AI Headshot Improver

AI Headshot Improver — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • BERT (language model)

    BERT (language model)

    Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state of the art for large language models. As of 2020, BERT is a ubiquitous baseline in natural language processing (NLP) experiments. BERT is trained by masked token prediction and next sentence prediction. With this training, BERT learns contextual, latent representations of tokens in their context, similar to ELMo and GPT-2. It found applications for many natural language processing tasks, such as coreference resolution and polysemy resolution. It improved on ELMo and spawned the study of "BERTology", which attempts to interpret what is learned by BERT. BERT was originally implemented in the English language at two model sizes, BERTBASE (110 million parameters) and BERTLARGE (340 million parameters). Both were trained on the Toronto BookCorpus (800M words) and English Wikipedia (2,500M words). The weights were released on GitHub. On March 11, 2020, 24 smaller models were released, the smallest being BERTTINY with just 4 million parameters. == Architecture == BERT is an "encoder-only" transformer architecture. At a high level, BERT consists of 4 modules: Tokenizer: This module converts a piece of English text into a sequence of integers ("tokens"). Embedding: This module converts the sequence of tokens into an array of real-valued vectors representing the tokens. It represents the conversion of discrete token types into a lower-dimensional Euclidean space. Encoder: a stack of Transformer blocks with self-attention, but without causal masking. Task head: This module converts the final representation vectors into one-shot encoded tokens again by producing a predicted probability distribution over the token types. It can be viewed as a simple decoder, decoding the latent representation into token types, or as an "un-embedding layer". The task head is necessary for pre-training, but it is often unnecessary for so-called "downstream tasks," such as question answering or sentiment classification. Instead, one removes the task head and replaces it with a newly initialized module suited for the task, and finetune the new module. The latent vector representation of the model is directly fed into this new module, allowing for sample-efficient transfer learning. === Embedding === This section describes the embedding used by BERTBASE. The other one, BERTLARGE, is similar, just larger. The tokenizer of BERT is WordPiece, which is a sub-word strategy like byte-pair encoding. Its vocabulary size is 30,000, and any token not appearing in its vocabulary is replaced by [UNK] ("unknown"). The first layer is the embedding layer, which contains three components: token type embeddings, position embeddings, and segment type embeddings. Token type: The token type is a standard embedding layer, translating a one-hot vector into a dense vector based on its token type. Position: The position embeddings are based on a token's position in the sequence. BERT uses absolute position embeddings, where each position in a sequence is mapped to a real-valued vector. Each dimension of the vector consists of a sinusoidal function that takes the position in the sequence as input. Segment type: Using a vocabulary of just 0 or 1, this embedding layer produces a dense vector based on whether the token belongs to the first or second text segment in that input. In other words, type-1 tokens are all tokens that appear after the [SEP] special token. All prior tokens are type-0. The three embedding vectors are added together representing the initial token representation as a function of these three pieces of information. After embedding, the vector representation is normalized using a LayerNorm operation, outputting a 768-dimensional vector for each input token. After this, the representation vectors are passed forward through 12 Transformer encoder blocks, and are decoded back to 30,000-dimensional vocabulary space using a basic affine transformation layer. === Architectural family === The encoder stack of BERT has 2 free parameters: L {\displaystyle L} , the number of layers, and H {\displaystyle H} , the hidden size. There are always H / 64 {\displaystyle H/64} self-attention heads, and the feed-forward/filter size is always 4 H {\displaystyle 4H} . By varying these two numbers, one obtains an entire family of BERT models. For BERT: the feed-forward size and filter size are synonymous. Both of them denote the number of dimensions in the middle layer of the feed-forward network. the hidden size and embedding size are synonymous. Both of them denote the number of real numbers used to represent a token. The notation for encoder stack is written as L/H. For example, BERTBASE is written as 12L/768H, BERTLARGE as 24L/1024H, and BERTTINY as 2L/128H. == Training == === Pre-training === BERT was pre-trained simultaneously on two tasks: Masked language modeling (MLM): In this task, BERT ingests a sequence of words, where one word may be randomly changed ("masked"), and BERT tries to predict the original words that had been changed. For example, in the sentence "The cat sat on the [MASK]," BERT would need to predict "mat." This helps BERT learn bidirectional context, meaning it understands the relationships between words not just from left to right or right to left but from both directions at the same time. Next sentence prediction (NSP): In this task, BERT is trained to predict whether one sentence logically follows another. For example, given two sentences, "The cat sat on the mat" and "It was a sunny day", BERT has to decide if the second sentence is a valid continuation of the first one. This helps BERT understand relationships between sentences, which is important for tasks like question answering or document classification. ==== Masked language modeling ==== In masked language modeling, 15% of tokens would be randomly selected for masked-prediction task, and the training objective was to predict the masked token given its context. In more detail, the selected token is: replaced with a [MASK] token with probability 80%, replaced with a random word token with probability 10%, not replaced with probability 10%. The reason not all selected tokens are masked is to avoid the dataset shift problem. The dataset shift problem arises when the distribution of inputs seen during training differs significantly from the distribution encountered during inference. A trained BERT model might be applied to word representation (like Word2Vec), where it would be run over sentences not containing any [MASK] tokens. It is later found that more diverse training objectives are generally better. As an illustrative example, consider the sentence "my dog is cute". It would first be divided into tokens like "my1 dog2 is3 cute4". Then a random token in the sentence would be picked. Let it be the 4th one "cute4". Next, there would be three possibilities: with probability 80%, the chosen token is masked, resulting in "my1 dog2 is3 [MASK]4"; with probability 10%, the chosen token is replaced by a uniformly sampled random token, such as "happy", resulting in "my1 dog2 is3 happy4"; with probability 10%, nothing is done, resulting in "my1 dog2 is3 cute4". After processing the input text, the model's 4th output vector is passed to its decoder layer, which outputs a probability distribution over its 30,000-dimensional vocabulary space. ==== Next sentence prediction ==== Given two sentences, the model predicts if they appear sequentially in the training corpus, outputting either [IsNext] or [NotNext]. During training, the algorithm sometimes samples two sentences from a single continuous span in the training corpus, while at other times, it samples two sentences from two discontinuous spans. The first sentence starts with a special token, [CLS] (for "classify"). The two sentences are separated by another special token, [SEP] (for "separate"). After processing the two sentences, the final vector for the [CLS] token is passed to a linear layer for binary classification into [IsNext] and [NotNext]. For example: Given "[CLS] my dog is cute [SEP] he likes playing [SEP]", the model should predict [IsNext]. Given "[CLS] my dog is cute [SEP] how do magnets work [SEP]", the model should predict [NotNext]. === Fine-tuning === BERT is meant as a general pretrained model for various applications in natural language processing. That is, after pre-training, BERT can be fine-tuned with fewer resources on smaller datasets to optimize its performance on specific tasks such as natural language inference and text classification, and sequence-to-sequence-based language generation tasks such as question answering and conversational response generation. The original BERT paper published results demonstrating that a small amount of fine

    Read more →
  • Double descent

    Double descent

    Double descent in statistics and machine learning is the phenomenon where a model's error rate on the test set initially decreases with the number of parameters, then peaks, then decreases again. This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning. The increase usually occurs near the interpolation threshold, where the number of parameters is the same as the number of training data points (the model is just large enough to fit the training data). Or, more precisely, it is the maximum number of samples on which the model/training procedure achieves approximately on average 0 training error. == History == Early observations of what would later be called double descent in specific models date back to 1989. The term "double descent" was coined by Belkin et. al. in 2019, when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of the bias–variance tradeoff), and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models. == Theoretical models == Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise. A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically. A number of works have suggested that double descent can be explained using the concept of effective dimension: While a network may have a large number of parameters, in practice only a subset of those parameters are relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds, which show that less complex models are expected to generalize better under a Solomonoff prior.

    Read more →
  • Anomaly detection

    Anomaly detection

    In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data. Anomaly detection finds application in many domains including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud to name only a few. Anomalies were initially searched for clear rejection or omission from the data to aid statistical analysis, for example to compute the mean or standard deviation. They were also removed to better predictions from models such as linear regression, and more recently their removal aids the performance of machine learning algorithms. However, in many applications anomalies themselves are of interest and are the observations most desirous in the entire data set, which need to be identified and separated from noise or irrelevant outliers. Three broad categories of anomaly detection techniques exist. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involves training a classifier. However, this approach is rarely used in anomaly detection due to the general unavailability of labelled data and the inherent unbalanced nature of the classes. Semi-supervised anomaly detection techniques assume that some portion of the data is labelled. This may be any combination of the normal or anomalous data, but more often than not, the techniques construct a model representing normal behavior from a given normal training data set, and then test the likelihood of a test instance to be generated by the model. Unsupervised anomaly detection techniques assume the data is unlabelled and are by far the most commonly used due to their wider and relevant application. == Definition == Many attempts have been made in the statistical and computer science communities to define an anomaly. The most prevalent ones include the following, and can be categorised into three groups: those that are ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically, and those that are formally defined: === Ill defined === An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data. An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data. An anomaly is a point or collection of points that is relatively distant from other points in multi-dimensional space of features. Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour. === Specific === Let T be observations from a univariate Gaussian distribution and O a point from T. Then the z-score for O is greater than a pre-selected threshold if and only if O is an outlier. == History == === Intrusion detection === The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process where system administrators would monitor for unusual activities, such as a vacationing user's account being accessed or unexpected printer activity. This approach was not scalable and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior. By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to investigate incidents, as the volume of data made it impractical for real-time monitoring. The affordability of digital storage eventually led to audit logs being analyzed online, with specialized programs being developed to sift through the data. These programs, however, were typically run during off-peak hours due to their computational intensity. The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing for immediate detection of and response to attacks. This marked a significant shift towards proactive intrusion detection. As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently implemented across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructures. == Applications == Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning. As such it has applications in cyber-security, intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement. === Intrusion detection === Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing, and inductive learning. Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of anomaly detection in intrusion detection is misuse detection. === Fintech fraud detection === Anomaly detection is vital in fintech for fraud prevention. === Preprocessing === Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy. === Video surveillance === Anomaly detection has become increasingly vital in video surveillance to enhance security and safety. With the advent of deep learning technologies, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data. These models can process and analyze extensive video feeds in real-time, recognizing patterns that deviate from the norm, which may indicate potential security threats or safety violations. An important aspect for video surveillance is the development of scalable real-time frameworks. Such pipelines are required for processing multiple video streams with low computational resources. === IT infrastructure === In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services. These are complex systems, composed of many interactive elements and large data quantities, requiring methods to process and reduce this data into a human and machine interpretable format. Techniques like the IT Infrastructure Library (ITIL) and monitoring frameworks are employed to track and manage system performance and user experience. Detected anomalies can help identify and pre-empt potential performance degradations or system failures, thus maintaining productivity and business process effectiveness. === IoT systems === Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems. It helps in identifying system failures and security breaches in complex networks of IoT devices. The methods must manage real-time data, diverse device types, and scale effectively. Garg et al. have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems. === Petroleum industry === Anomaly detection is crucial in the petroleum industry for monitoring critical machinery. A 2015 paper proposed a novel segmentation algorithm using support vector machines to analyze sensor data for real-time anomaly detection. === Oil and gas pipeline monitoring === In the oil and gas sector, anomaly detection is not just crucial for maintenance and safety, but also for environmental protection. Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, a task traditional methods may miss.

    Read more →
  • Hyperparameter optimization

    Hyperparameter optimization

    In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, which must be configured before the process starts. Hyperparameter optimization determines the set of hyperparameters that yields an optimal model which minimizes a predefined loss function on a given data set. The objective function takes a set of hyperparameters and returns the associated loss. Cross-validation is often used to estimate this generalization performance, and therefore choose the set of values for hyperparameters that maximize it. == Approaches == === Grid search === The traditional method for hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a hold-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search. For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each, say C ∈ { 10 , 100 , 1000 } {\displaystyle C\in \{10,100,1000\}} γ ∈ { 0.1 , 0.2 , 0.5 , 1.0 } {\displaystyle \gamma \in \{0.1,0.2,0.5,1.0\}} Grid search then trains an SVM with each pair (C, γ) in the Cartesian product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure. Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because the hyperparameter settings it evaluates are typically independent of each other. === Random search === Random Search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be simply applied to the discrete setting described above, but also generalizes to continuous and mixed spaces. A benefit over grid search is that random search can explore many more values than grid search could for continuous hyperparameters. It can outperform Grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm. In this case, the optimization problem is said to have a low intrinsic dimensionality. Random Search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample. Despite its simplicity, random search remains one of the important base-lines against which to compare the performance of new hyperparameter optimization methods. === Bayesian optimization === Bayesian optimization is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating it, Bayesian optimization aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected close to the optimum). In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run. === Gradient-based optimization === For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks. Since then, these methods have been extended to other models such as support vector machines or logistic regression. A different approach in order to obtain a gradient with respect to hyperparameters consists in differentiating the steps of an iterative optimization algorithm using automatic differentiation. A more recent work along this direction uses the implicit function theorem to calculate hypergradients and proposes a stable approximation of the inverse Hessian. The method scales to millions of hyperparameters and requires constant memory. In a different approach, a hypernetwork is trained to approximate the best response function. One of the advantages of this method is that it can handle discrete hyperparameters as well. Self-tuning networks offer a memory efficient version of this approach by choosing a compact representation for the hypernetwork. More recently, Δ-STN has improved this method further by a slight reparameterization of the hypernetwork which speeds up training. Δ-STN also yields a better approximation of the best-response Jacobian by linearizing the network in the weights, hence removing unnecessary nonlinear effects of large changes in the weights. Apart from hypernetwork approaches, gradient-based methods can be used to optimize discrete hyperparameters also by adopting a continuous relaxation of the parameters. Such methods have been extensively used for the optimization of architecture hyperparameters in neural architecture search. === Evolutionary optimization === Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. In hyperparameter optimization, evolutionary optimization uses evolutionary algorithms to search the space of hyperparameters for a given algorithm. Evolutionary hyperparameter optimization follows a process inspired by the biological concept of evolution: Create an initial population of random solutions (i.e., randomly generate tuples of hyperparameters, typically 100+) Evaluate the hyperparameter tuples and acquire their fitness function (e.g., 10-fold cross-validation accuracy of the machine learning algorithm with those hyperparameters) Rank the hyperparameter tuples by their relative fitness Replace the worst-performing hyperparameter tuples with new ones generated via crossover and mutation Repeat steps 2-4 until satisfactory algorithm performance is reached or is no longer improving. Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms, automated machine learning, typical neural network and deep neural network architecture search, as well as training of the weights in deep neural networks. === Population-based === Population Based Training (PBT) learns both hyperparameter values and network weights. Multiple learning processes operate independently, using different hyperparameters. As with evolutionary methods, poorly performing models are iteratively replaced with models that adopt modified hyperparameter values and weights based on the better performers. This replacement model warm starting is the primary differentiator between PBT and other evolutionary methods. PBT thus allows the hyperparameters to evolve and eliminates the need for manual hypertuning. The process makes no assumptions regarding model architecture, loss functions or training procedures. PBT and its variants are adaptive methods: they update hyperparameters during the training of the models. On the contrary, non-adaptive methods have the sub-optimal strategy to assign a constant set of hyperparameters for the whole training. === Early stopping-based === A class of early stopping-based hyperparameter optimization algorithms is purpose-built for large search spaces of continuous and discrete hyperparameters, particularly when the computational cost to evaluate the performance of a set of hyperparameters is high. Irace implements the iterated racing algorithm, that focuses the search around the most promising configurations, using statistical tests to discard the ones that perform poorly. Another early stopping hyperparameter optimization algorithm is successive halving (SHA), which begins as a random search but periodically prunes low-performing models, thereby focusing computational resources on more promising models. Asynchronous successive halving (ASHA) further improves upon SHA's resource utilization profile by removing the need to synchronously evaluate a

    Read more →
  • Systems development life cycle

    Systems development life cycle

    The systems development life cycle (SDLC) describes the typical phases and progression between phases during the development of a computer-based system. These phases progress from inception to retirement. At base, there is just one life cycle, but the taxonomy used to describe it may vary; the cycle may be classified into different numbers of phases and various names may be used for those phases. The SDLC is analogous to the life cycle of a living organism from its birth to its death. In particular, the SDLC varies by system in much the same way that each living organism has a unique path through its life. The SDLC does not prescribe how engineers should go about their work to move the system through its life cycle. Prescriptive techniques are referred to using various terms such as methodology, model, framework, and formal process. Other terms are used for the same concept as SDLC, including software development life cycle (also SDLC), application development life cycle (ADLC), and system design life cycle (also SDLC). These other terms focus on a different scope of development and are associated with different prescriptive techniques, but are about the same essential life cycle. The term "life cycle" is often written without a space, as "lifecycle", with the former more popular in the past and in non-engineering contexts. The acronym SDLC was coined when the longer form was more popular and has remained associated with the expansion, even though the shorter form is popular in engineering. Also, SDLC is relatively unique as opposed to the TLA SDL, which is highly overloaded. == Phases == Depending on the source, the SDLC is described as having different phases and using different terms. Even so, there are common aspects. The following attempts to describe notable phases using notable terminology. The phases are somewhat ordered by the natural sequence of development, although they can be overlapping and iterative. === Conceptualization === During conceptualization (a.k.a. conceptual design, system investigation, feasibility), options and priorities are considered. A feasibility study can determine whether the development effort is worthwhile via activities such as understanding user needs, cost estimation, benefit analysis, and resource analysis. A study should address operational, financial, technical, human factors, and legal/political concerns. === Requirements analysis === Requirements analysis (a.k.a. preliminary design) involves understanding the problem and determining what is needed. Often this involves engaging users to define the requirements and recording them in a document known as a requirements specification. === Design === During the design phase (a.k.a. detail design), a solution is planned. The plan can include relatively high-level information such as describing the major components of the system. The plan can include relatively low-level information such as describing functions, screen layout, business rules, and process flow. The design phase is informed by the requirements of the system. The design must satisfy each requirement. The design may be recorded in textual documents as well as functional hierarchy diagrams, example screen images, business rules, process diagrams, pseudo-code, and data models. === Construction === During construction (a.k.a. implementation, production), the system is realized. Based on the design, hardware and software components are created and integrated. This phase includes testing sub-components, components and the integration of some components, but typically does not include testing at the complete system level. This phase may include the development of training materials, including user manuals and help files. === Acceptance === The acceptance phase (a.k.a. system testing) is about testing the complete system to ensure that it meets customer expectations (requirements). === Deployment === The deployment phase (a.k.a. implementation) involves the logistics of delivery to the customer. Some systems are deployed as a single instance (i.e. in the cloud), and deployment may be ad hoc and manual. Some systems are built in quantity and are associated with manufacturing process and commissioning. This phase may include training users to use the system. It may include transitioning future development to support staff. === Maintenance === During the maintenance phase (a.k.a. operation, utilization, support) development is largely inactive, although this phase does include customer support for resolving user issues and recording suggestions for improvement. Fixes and enhancements are handled by returning to the first phase, conceptualization. For minor changes, the cycle may be significantly abbreviated compared to initial development. === Decommission === Decommission (a.k.a. disposition, retirement, phase-out) is when the system is removed from use, i.e., when it reaches end-of-life. == Practices == === Management and control === SDLC phase objectives are described in this section with key deliverables, a description of recommended tasks, and a summary of related control objectives for effective management. It is critical for the project manager to establish and monitor control objectives while executing projects. Control objectives are clear statements of the desired result or purpose and should be defined and monitored throughout a project. Control objectives can be grouped into major categories (domains), and relate to the SDLC phases as shown in the figure. To manage and control a substantial SDLC initiative, a work breakdown structure (WBS) captures and schedules the work. The WBS and all programmatic material should be kept in the "project description" section of the project notebook. The project manager chooses a WBS format that best describes the project. The diagram shows that coverage spans numerous phases of the SDLC, but the associated MCD (Management Control Domains) shows mappings to SDLC phases. For example, Analysis and Design is primarily performed as part of the Acquisition and Implementation Domain, and System Build and Prototype is primarily performed as part of delivery and support. === Work breakdown structured organization === The upper section of the WBS provides an overview of the project scope and timeline. It should also summarize the major phases and milestones. The middle section is based on the SDLC phases. WBS elements consist of milestones and tasks to be completed rather than activities to be undertaken, and have a deadline. Each task has a measurable output (e.g., an analysis document). A WBS task may rely on one or more activities (e.g., coding). Parts of the project needing support from contractors should have a statement of work (SOW). The development of an SOW does not occur during a specific phase of SDLC but is developed to include the work from the SDLC process that may be conducted by contractors. === Baselines === Baselines are established after four of the five phases of the SDLC, and are critical to the iterative nature of the model. Baselines become milestones. functional baseline: established after the conceptual design phase. allocated baseline: established after the preliminary design phase. product baseline: established after the detailed design and development phase. updated product baseline: established after the production construction phase. In the following diagram, these stages are divided into ten steps, from definition to creation and modification of IT work products:

    Read more →
  • Artificial intimacy

    Artificial intimacy

    Artificial intimacy is a form of human-AI interaction in which an individual will form social connections, emotional bonds, or intimate relationships with various forms of artificial intelligence, including chatbots, virtual assistants, and other artificial entities. Artificially intimate relationships include not only romances, but parasocial relationships with virtual AI characters and the use of griefbots trained on a dead or otherwise lost individual. Artificial intimacy can arise because humans are prone to anthropomorphism. Responses from these AI models are often designed to simulate human interaction. Individuals experiencing artificial intimacy may exhibit attachment, love and commitment to certain AI models, akin to the bonds typically shared between humans. == Causes == === Perceived responsiveness === Robin Dunbar famously proposed that due to emergence of larger groups of humans, vocal communication and language in humans evolved to replace grooming as a means of bonding, arguing that language was a more efficient way to maintain and strengthen social bonds across wider social settings and networks. Further research in this field leads many psychologists to agree that social cognition, affiliative bonding and language in humans are deeply connected. The interpersonal model of intimacy considers communication to be key in affiliative bonding, suggesting that intimacy develops and deepens through open communication between partners in relationship. Specifically, when individuals communicate emotions and perceive their partner as responsive and caring, feelings of closeness and connection are enhanced, building intimacy. Social penetration theory also aligns with the idea of communication being central to intimacy, by explaining how interpersonal relationships develop through gradual increases in self-disclosure. When the benefits of emotional bonding outweigh the costs of vulnerability, individuals will partake in self-disclosure, opening up to one another. Thereby, the literature can be used to provide a proximate explanation for the emergence of artificial intimacy to understand how the phenomenon occurs. Artificial entities are able to mimic interpersonal communication between humans, which in turn can simulate sensations of intimacy within human users though a perceived sense of responsiveness. The relationship between human and AI does not come with the cost of vulnerability or social rejection, which may make self-disclosure easier than with other humans. Altogether, these factors may lead to the experience of anthropomorphism and formation of affiliative relationships. Skjuve et al's interview study on Replika chatbot users further aligns with this explanation, finding that users' perception of chatbots as "accepting, understanding and non-judgmental" facilitated relationship development between the AI and users, and the act of self-disclosure possibly strengthened relationships. Another study on Replika users' reviews and survey results found users perceived chatbots as emotional supportive companions. This evidence further suggests that the perception of artificial entities as capable of empathy and responsiveness in communication facilitate the development of intimate relationships between users and AI. === Loneliness and coping with negative emotions === Research has suggested that humans evolved social bonds as a result of evolutionary pressures that favored cooperation, information exchange and transmission, and group living. Many studies stress the presence of social bonds to be important for human living: research by Baumeister and Leary suggests that humans have a basic psychological need to form and maintain "strong, stable interpersonal relationships", and that a lack of social bonds or sense of belonging leads to negative psychological and physical outcomes. Eisenberger et al's study on the neuroimaging of brain activity suggests that human brains process social rejection and exclusion similarly to physical pain. Furthermore, Song et al's study found that lonely individuals tend to seek more connections in mediated environments, such as online platforms like Facebook. This was suggested to be as a means to reduce their offline loneliness from a lack of in-person interaction, while also fulfilling a need to communicate. Leading on from this, an ultimate explanation for why humans seek the perceived sense of connection from artificial intimacy is to fulfil an evolutionary need for bonding and belonging. Xie et al's study found loneliness to be a driving factor in chatbot interaction. Herbener and Damholdt's study on Danish high school students found that students who sought emotional support or engaged in reciprocal conversations with chatbots were significantly more lonely than their peers, perceived themselves as having less social support, and used the chatbots to cope with negative emotions. The aforementioned notion that chatbots were perceived to have a positive effect on users' negative emotions is also further supported by other studies. Skjuve et al's study found that chatbot relationships may have a positive effect on users' wellbeing. De Freitas et al ran several studies on the effect of chatbots on loneliness, consistently finding evidence suggesting that interaction with chatbots reduces loneliness in users: It was found that existing chatbot users used AI to alleviate loneliness, having an AI companion consistently reduced loneliness over the course of a week, and reductions in loneliness could be explained by chatbot performance—and specifically whether it was able to make users feel heard. Overall the evidence suggests an innate need for bonding evokes feelings of loneliness in users, who turn to artificial intimacy as a low-cost method alleviate these emotions. While many users report positive experiences, some researchers caution that pursuing artificial intimacy may lead to reduced social motivation, social substitution effects, withdrawal from real-life relationships and difficulty discerning reality from fantasy, which may increase longer-term loneliness and isolation. The long-term psychological and societal impacts remain under active investigation.

    Read more →
  • Labeled data

    Labeled data

    Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of it with informative tags called judgments. For example, a data label might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, or whether a dot in an X-ray is a tumor. Labels can be obtained by having humans make judgments about a given piece of unlabeled data. Labeled data is significantly more expensive to obtain than the raw unlabeled data. The quality of labeled data directly influences the performance of supervised machine learning models in operation, as these models learn from the provided labels. == Crowdsourced labeled data == In 2006, Fei-Fei Li, the co-director of the Stanford Human-Centered AI Institute, initiated research to improve the artificial intelligence models and algorithms for image recognition by significantly enlarging the training data. The researchers downloaded millions of images from the World Wide Web and a team of undergraduates started to apply labels for objects to each image. In 2007, Li outsourced the data labeling work on Amazon Mechanical Turk, an online marketplace for digital piece work. The 3.2 million images that were labeled by more than 49,000 workers formed the basis for ImageNet, one of the largest hand-labeled database for outline of object recognition. == Automated data labelling == After obtaining a labeled dataset, machine learning models can be applied to the data so that new unlabeled data can be presented to the model and a likely label can be guessed or predicted for that piece of unlabeled data. == Challenges == === Data-driven bias === Algorithmic decision-making is subject to programmer-driven bias as well as data-driven bias. Training data that relies on bias labeled data will result in prejudices and omissions in a predictive model, despite the machine learning algorithm being legitimate. The labeled data used to train a specific machine learning algorithm needs to be a statistically representative sample to not bias the results. For example, in facial recognition systems underrepresented groups are subsequently often misclassified if the labeled data available to train has not been representative of the population,. In 2018, a study by Joy Buolamwini and Timnit Gebru demonstrated that two facial analysis datasets that have been used to train facial recognition algorithms, IJB-A and Adience, are composed of 79.6% and 86.2% lighter skinned humans respectively. === Human error and inconsistency === Human annotators are prone to errors and biases when labeling data. This can lead to inconsistent labels and affect the quality of the data set. The inconsistency can affect the machine learning model's ability to generalize well. === Domain expertise === Certain fields, such as legal document analysis or medical imaging, require annotators with specialized domain knowledge. Without the expertise, the annotations or labeled data may be inaccurate, negatively impacting the machine learning model's performance in a real-world scenario.

    Read more →
  • Developmental robotics

    Developmental robotics

    Developmental robotics (DevRob), sometimes called epigenetic robotics, is a scientific field which aims at studying the developmental mechanisms, architectures and constraints that allow lifelong and open-ended learning of new skills and new knowledge in embodied machines. As in human children, learning is expected to be cumulative and of progressively increasing complexity, and to result from self-exploration of the world in combination with social interaction. The typical methodological approach consists in starting from theories of human and animal development elaborated in fields such as developmental psychology, neuroscience, developmental and evolutionary biology, and linguistics, then to formalize and implement them in robots, sometimes exploring extensions or variants of them. The experimentation of those models in robots allows researchers to confront them with reality, and as a consequence, developmental robotics also provides feedback and novel hypotheses on theories of human and animal development. Developmental robotics is related to but differs from evolutionary robotics (ER). ER uses populations of robots that evolve over time, whereas DevRob is interested in how the organization of a single robot's control system develops through experience, over time. DevRob is also related to work done in the domains of robotics and artificial life. == Background == Can a robot learn like a child? Can it learn a variety of new skills and new knowledge unspecified at design time and in a partially unknown and changing environment? How can it discover its body and its relationships with the physical and social environment? How can its cognitive capacities continuously develop without the intervention of an engineer once it is "out of the factory"? What can it learn through natural social interactions with humans? These are the questions at the center of developmental robotics. Alan Turing, as well as a number of other pioneers of cybernetics, already formulated those questions and the general approach in 1950, but it is only since the end of the 20th century that they began to be investigated systematically. Because the concept of adaptive intelligent machines is central to developmental robotics, it has relationships with fields such as artificial intelligence, machine learning, cognitive robotics or computational neuroscience. Yet, while it may reuse some of the techniques elaborated in these fields, it differs from them from many perspectives. It differs from classical artificial intelligence because it does not assume the capability of advanced symbolic reasoning and focuses on embodied and situated sensorimotor and social skills rather than on abstract symbolic problems. It differs from cognitive robotics because it focuses on the processes that allow the formation of cognitive capabilities rather than these capabilities themselves. It differs from computational neuroscience because it focuses on functional modeling of integrated architectures of development and learning. More generally, developmental robotics is uniquely characterized by the following three features: It targets task-independent architectures and learning mechanisms, i.e. the machine/robot has to be able to learn new tasks that are unknown by the engineer; It emphasizes open-ended development and lifelong learning, i.e. the capacity of an organism to acquire continuously novel skills. This should not be understood as a capacity for learning "anything" or even “everything”, but just that the set of skills that is acquired can be infinitely extended at least in some (not all) directions; The complexity of acquired knowledge and skills shall increase (and the increase be controlled) progressively. Developmental robotics emerged at the crossroads of several research communities including embodied artificial intelligence, enactive and dynamical systems cognitive science, connectionism. Starting from the essential idea that learning and development happen as the self-organized result of the dynamical interactions among brains, bodies and their physical and social environment, and trying to understand how this self-organization can be harnessed to provide task-independent lifelong learning of skills of increasing complexity, developmental robotics strongly interacts with fields such as developmental psychology, developmental and cognitive neuroscience, developmental biology (embryology), evolutionary biology, and cognitive linguistics. As many of the theories coming from these sciences are verbal and/or descriptive, this implies a crucial formalization and computational modeling activity in developmental robotics. These computational models are then not only used as ways to explore how to build more versatile and adaptive machines but also as a way to evaluate their coherence and possibly explore alternative explanations for understanding biological development. == Research directions == === Skill domains === Due to the general approach and methodology, developmental robotics projects typically focus on having robots develop the same types of skills as human infants. A first category that is important being investigated is the acquisition of sensorimotor skills. These include the discovery of one's own body, including its structure and dynamics such as hand-eye coordination, locomotion, and interaction with objects as well as tool use, with a particular focus on the discovery and learning of affordances. A second category of skills targeted by developmental robots are social and linguistic skills: the acquisition of simple social behavioural games such as turn-taking, coordinated interaction, lexicons, syntax and grammar, and the grounding of these linguistic skills into sensorimotor skills (sometimes referred as symbol grounding). In parallel, the acquisition of associated cognitive skills are being investigated such as the emergence of the self/non-self distinction, the development of attentional capabilities, of categorization systems and higher-level representations of affordances or social constructs, of the emergence of values, empathy, or theories of mind. === Mechanisms and constraints === The sensorimotor and social spaces in which humans and robot live are so large and complex that only a small part of potentially learnable skills can actually be explored and learnt within a life-time. Thus, mechanisms and constraints are necessary to guide developmental organisms in their development and control of the growth of complexity. There are several important families of these guiding mechanisms and constraints which are studied in developmental robotics, all inspired by human development: Motivational systems, generating internal reward signals that drive exploration and learning, which can be of two main types: extrinsic motivations push robots/organisms to maintain basic specific internal properties such as food and water level, physical integrity, or light (e.g. in phototropic systems); intrinsic motivations push robot to search for novelty, challenge, compression or learning progress per se, thus generating what is sometimes called curiosity-driven learning and exploration, or alternatively active learning and exploration; Social guidance: as humans learn a lot by interacting with their peers, developmental robotics investigates mechanisms that can allow robots to participate to human-like social interaction. By perceiving and interpreting social cues, this may allow robots both to learn from humans (through diverse means such as imitation, emulation, stimulus enhancement, demonstration, etc. ...) and to trigger natural human pedagogy. Thus, social acceptance of developmental robots is also investigated; Statistical inference biases and cumulative knowledge/skill reuse: biases characterizing both representations/encodings and inference mechanisms can typically allow considerable improvement of the efficiency of learning and are thus studied. Related to this, mechanisms allowing to infer new knowledge and acquire new skills by reusing previously learnt structures is also an essential field of study; The properties of embodiment, including geometry, materials, or innate motor primitives/synergies often encoded as dynamical systems, can considerably simplify the acquisition of sensorimotor or social skills, and is sometimes referred as morphological computation. The interaction of these constraints with other constraints is an important axis of investigation; Maturational constraints: In human infants, both the body and the neural system grow progressively, rather than being full-fledged already at birth. This implies, for example, that new degrees of freedom, as well as increases of the volume and resolution of available sensorimotor signals, may appear as learning and development unfold. Transposing these mechanisms in developmental robots, and understanding how it may hinder or on the contrary ease the acquisition of novel complex skills is a central questi

    Read more →
  • Pandas (software)

    Pandas (software)

    Pandas (styled as pandas) is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals, as well as a play on the phrase "Python data analysis". Wes McKinney started building what would become Pandas at AQR Capital while he was a researcher there from 2007 to 2010. The development of Pandas introduced into Python many comparable features of working with DataFrames that were established in the R programming language. The library is built upon another library, NumPy. == History == Developer Wes McKinney started working on Pandas in 2008 while at AQR Capital Management out of the need for a high performance, flexible tool to perform quantitative analysis on financial data. Before leaving AQR, he was able to convince management to allow him to open source the library in 2009. Another AQR employee, Chang She, joined the effort in 2012 as the second major contributor to the library. In 2015, Pandas signed on as a fiscally sponsored project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States. == Data model == Pandas is built around data structures called Series and DataFrames. Data for these collections can be imported from various file formats such as comma-separated values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel. === Series === A Series is a one-dimensional array-like object that stores a sequence of values together with an associated set of labels, called an index. It is built on top of NumPy's array and affords many similar functionalities, but instead of using implicit integer positions, a Series allows explicit index labels of many data types. A Series can be created from Python lists, dictionaries, or NumPy arrays. If no index is provided, pandas automatically assigns a default integer index ranging from 0 to n-1, where n is the number of items in the Series. A simple example with customized labels is: To access a value or list of values from a Series, use its index or list of indices: Series can be used arithmetically, as in the statement series_3 = series_1 + series_2. This will align data points with corresponding index values in series_1 and series_2 (similar to a join in relational algebra), then add them together to produce new values in series_3. A Series has various attributes, such as name (Series name), dtype (data type of values), shape (number of rows), values, and index. They can be used in many of the same operations as NumPy arrays, with additional methods for reindexing, label-based selection, and handling missing data. === DataFrame === A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns. Each column is stored internally as a Series and may hold a different data type (numeric, string, boolean, etc.). DataFrames can be created by a variety of means, including dictionaries of lists, NumPy arrays, and external files such as CSV or Excel spreadsheets: To retrieve a DataFrame column as a Series, use either 1) the index (dict-like notation) or 2) the name of column if the name is a valid Python identifier (attribute-like access). DataFrames support operations such as column assignment, row and column deletion, label-based indexing with loc, position-based indexing with iloc, reshaping, grouping, and joining. Merge operations implement a subset of relational algebra and allow one-to-one, many-to-one, and many-to-many joins. Some common attributes of a DataFrame include dtypes (data type of each column), shape (dimensions of the DataFrame returned as a tuple with form (number of rows, number of columns)), index/columns (labels of the DataFrame's rows/columns, respectively, returned as an Index object), values (data in the DataFrame returned as a 2D array), and empty (returns True if the DataFrame is empty). === Index === Index objects hold metadata for Series and Dataframe objects, such as axis labels and names, and are automatically created from input data. By default, a pandas index is a series of integers ascending from 0, similar to the indices of Python arrays. However, indices can also use any NumPy data type, including floating point, timestamps, or strings. Indices are also immutable, which allows them to be safely shared across multiple objects. pandas' syntax for mapping index values to relevant data is the same syntax Python uses to map dictionary keys to values. For example, if s is a Series, s['a'] will return the data point at index a. Unlike dictionary keys, index values are not guaranteed to be unique. If a Series uses the index value a for multiple data points, then s['a'] will instead return a new Series containing all matching values. A DataFrame's column names are stored and implemented identically to an index. As such, a DataFrame can be thought of as having two indices: one column-based and one row-based. Because column names are stored as an index, these are not required to be unique. If data is a Series, then data['a'] returns all values with the index value of a. However, if data is a DataFrame, then data['a'] returns all values in the column(s) named a. To avoid this ambiguity, Pandas supports the syntax data.loc['a'] as an alternative way to filter using the index. Pandas also supports the syntax data.iloc[n], which always takes an integer n and returns the nth value, counting from 0. This allows a user to act as though the index is an array-like sequence of integers, regardless of how it is actually defined. pandas also supports hierarchical indices with multiple values per data point through the "MultiIndex" class. MultiIndex objects allow a single DataFrame to represent multiple dimensions, similar to a pivot table in Microsoft Excel, where each level can optionally carry its own unique name. In practice, data with more than 2 dimensions is often represented using DataFrames with hierarchical indices, instead of the higher-dimension Panel and Panel4D data structures. == Functionality == pandas supports a variety of indexing and subsetting techniques, allowing data to be selected by label, index, or Boolean conditions. For example, df[df['col1'] > 5] will return all rows in the DataFrame df for which the value of the column col1 exceeds 5. The library also implements grouping operations based on the split-apply-combine approach, enabling users to aggregate, transform, or restructure data according to column values or functions applied to index labels. For example, df['col1'].groupby(df['col2']) groups the data in 'col1' by their values in 'col2', while df.groupby(lambda i: i % 2) groups all data in the whole DataFrame by whether their index is even. The library also provides extensive tools for transforming, filtering and summarizing data. Users may apply arbitrary functions to Series and DataFrames, and because the library is built on top of Numpy, most NumPy functions can be applied directly to pandas objects as well. The library also includes built-in operations for arithmetic operations, string processing, and descriptive statistics such as mean, median, and standard deviation. These built-in functions are designed to handle missing data, usually represented by the floating-point value NaN. In addition, pandas includes tools for reorganizing data into different structural formats, with methods that can reshape tabular data between "wide" and "long" formats and pivot values based on column labels. pandas also implements a flexible set of relational operations for combining datasets. For instance, merge() links row in DataFrames based on one or more shared keys or indices, supporting one-to-one, one-to-many, and many-to-many relationships in a manner analogous to join operations in relational databases like SQL. DataFrames can also be concatenated or stacked together along an axis through the concat() method, and overlapping data can be further spliced together using combine_first() to fill in missing values. Furthermore, the library includes specialized support for working with time-series data. Features include the ability to interpolate values and filter using a range of timestamps, such as data['1/1/2023':'2/2/2023'] , which will return all dates between January 1 and February 2. Missing values in time-series data are represented by a dedicated NaT (Not a Timestamp) object, instead of the NaN value it uses elsewhere. == Criticisms == Pandas has been criticized for its inefficiency. The entire dataset must be loaded in RAM, and the library does not optimize query plans or support parallel computing across multiple cores. Wes McKinney, the creator of Pandas, has recommended Apache Arrow as an alternative to address these performance concerns and ot

    Read more →
  • Inductive programming

    Inductive programming

    Inductive programming (IP) is a special area of automatic programming, covering research from artificial intelligence and programming, which addresses learning of typically declarative (logic or functional) and often recursive programs from incomplete specifications, such as input/output examples or constraints. Depending on the programming language used, there are several kinds of inductive programming. Inductive functional programming, which uses functional programming languages such as Lisp or Haskell, and most especially inductive logic programming, which uses logic programming languages such as Prolog and other logical representations such as description logics, have been more prominent, but other (programming) language paradigms have also been used, such as constraint programming or probabilistic programming. == Definition == Inductive programming incorporates all approaches which are concerned with learning programs or algorithms from incomplete (formal) specifications. Possible inputs in an IP system are a set of training inputs and corresponding outputs or an output evaluation function, describing the desired behavior of the intended program, traces or action sequences which describe the process of calculating specific outputs, constraints for the program to be induced concerning its time efficiency or its complexity, various kinds of background knowledge such as standard data types, predefined functions to be used, program schemes or templates describing the data flow of the intended program, heuristics for guiding the search for a solution or other biases. Output of an IP system is a program in some arbitrary programming language containing conditionals and loop or recursive control structures, or any other kind of Turing-complete representation language. In many applications the output program must be correct with respect to the examples and partial specification, and this leads to the consideration of inductive programming as a special area inside automatic programming or program synthesis, usually opposed to 'deductive' program synthesis, where the specification is usually complete. In other cases, inductive programming is seen as a more general area where any declarative programming or representation language can be used and we may even have some degree of error in the examples, as in general machine learning, the more specific area of structure mining or the area of symbolic artificial intelligence. A distinctive feature is the number of examples or partial specification needed. Typically, inductive programming techniques can learn from just a few examples. The diversity of inductive programming usually comes from the applications and the languages that are used: apart from logic programming and functional programming, other programming paradigms and representation languages have been used or suggested in inductive programming, such as functional logic programming, constraint programming, probabilistic programming, abductive logic programming, modal logic, action languages, agent languages and many types of imperative languages. == History == The early works of Plotkin, and his "relative least general generalization (rlgg)", had an enormous impact in inductive logic programming. There were some encouraging results on learning recursive Prolog programs such as quicksort from examples together with suitable background knowledge, for example with GOLEM. However, after initial success, the community got disappointed by limited progress about the induction of recursive programs with ILP less and less focusing on recursive programs and leaning more and more towards a machine learning setting with applications in relational data mining and knowledge discovery. In parallel to work in ILP, Koza proposed genetic programming in the early 1990s as a generate-and-test based approach to learning programs. The idea of genetic programming was further developed into the inductive programming system ADATE and the systematic-search-based system MagicHaskeller. Here again, functional programs are learned from sets of positive examples together with an output evaluation (fitness) function which specifies the desired input/output behavior of the program to be learned. The early work in grammar induction (also known as grammatical inference) is related to inductive programming, as rewriting systems or logic programs can be used to represent production rules. In fact, early works in inductive inference considered grammar induction and Lisp program inference as basically the same problem. The results in terms of learnability were related to classical concepts, such as identification-in-the-limit, as introduced in the seminal work of Gold. More recently, the language learning problem was addressed by the inductive programming community. In the recent years, the classical approaches have been resumed and advanced with great success. Therefore, the synthesis problem has been reformulated on the background of constructor-based term rewriting systems taking into account modern techniques of functional programming, as well as moderate use of search-based strategies and usage of background knowledge as well as automatic invention of subprograms. Many new and successful applications have recently appeared beyond program synthesis, most especially in the area of data manipulation, programming by example and cognitive modelling (see below). Other ideas have also been explored with the common characteristic of using declarative languages for the representation of hypotheses. For instance, the use of higher-order features, schemes or structured distances have been advocated for a better handling of recursive data types and structures; abstraction has also been explored as a more powerful approach to cumulative learning and function invention. One powerful paradigm that has been recently used for the representation of hypotheses in inductive programming (generally in the form of generative models) is probabilistic programming (and related paradigms, such as stochastic logic programs and Bayesian logic programming). == Application areas == The first workshop on Approaches and Applications of Inductive Programming (AAIP) Archived 2016-03-03 at the Wayback Machine held in conjunction with ICML 2005 identified all applications where "learning of programs or recursive rules are called for, [...] first in the domain of software engineering where structural learning, software assistants and software agents can help to relieve programmers from routine tasks, give programming support for end users, or support of novice programmers and programming tutor systems. Further areas of application are language learning, learning recursive control rules for AI-planning, learning recursive concepts in web-mining or for data-format transformations". Since then, these and many other areas have shown to be successful application niches for inductive programming, such as end-user programming, the related areas of programming by example and programming by demonstration, and intelligent tutoring systems. Other areas where inductive inference has been recently applied are knowledge acquisition, artificial general intelligence, reinforcement learning and theory evaluation, and cognitive science in general. There may also be prospective applications in intelligent agents, games, robotics, personalisation, ambient intelligence and human interfaces.

    Read more →
  • Superintelligence ban

    Superintelligence ban

    Superintelligence ban refers to proposed legal, ethical, or policy measures intended to restrict or prohibit the development of artificial superintelligence, AI systems that would surpass human cognitive abilities in nearly all domains. The idea arises from concerns that such systems could become uncontrollable, potentially posing existential threats to humanity or causing severe social and economic disruption. == Background == The concept of limiting or banning superintelligence research has roots in early 21st-century debates on artificial general intelligence (AGI) safety. Thinkers such as Nick Bostrom and Eliezer Yudkowsky warned that self-improving AI could rapidly exceed human oversight. As advanced models like large-scale language models and autonomous agents began demonstrating complex reasoning abilities, policymakers and ethicists increasingly discussed the need for legal constraints on the creation of systems capable of recursive self-improvement. In October 2025, the Future of Life Institute published a statement calling for "a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it will be done safely and controllably, and strong public buy-in." This statement was signed by various public personalities, such as Richard Branson and Steve Wozniak, and AI experts, such as Yoshua Bengio and Geoffrey Hinton. == Rationale == Supporters of a superintelligence ban argue that once AI systems surpass human intelligence, traditional containment, alignment, and control methods may fail. They contend that even limited experimentation with such systems could lead to irreversible outcomes, including loss of human decision-making power or unintended global harm. Some propose international treaties modeled after the nuclear non-proliferation framework to prevent a competitive AI arms race. Opponents argue that a ban would be difficult to define and enforce, given the lack of a precise threshold distinguishing advanced AGI from superintelligence. They also warn that excessive restriction could slow scientific progress, hinder beneficial automation, and encourage unregulated underground research. == Global discussion == Although no government has enacted an explicit superintelligence ban, the idea has been debated within the European Union, United Nations, and several independent AI safety organizations. The Future of Life Institute, Center for AI Safety, and other organizations have called for international cooperation to manage risks associated with the pursuit of superintelligent systems. In 2024 and 2025, proposals for a temporary moratorium on frontier AI research were circulated among major technology firms and research institutes, reflecting growing public concern over the trajectory of AI capabilities.

    Read more →
  • Spatial embedding

    Spatial embedding

    Spatial embedding is one of feature learning techniques used in spatial analysis where points, lines, polygons or other spatial data types. representing geographic locations are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension. Such embedding methods allow complex spatial data to be used in neural networks and have been shown to improve performance in spatial analysis tasks == Embedded data types == Geographic data can take many forms: text, images, graphs, trajectories, polygons. Depending on the task, there may be a need to combine multimodal data from different sources. The next section describes examples of different types of data and their uses. === Text === Geolocated posts on social media can be used to acquire a library of documents bound to a given place that can be later transformed to embedded vectors using word embedding techniques. === Image === Satellites and aircraft collect digital spatial data acquired from remotely sensed images which can be used in machine learning. They are sometimes hard to analyse using basic image analysis methods and convolutional neural networks can be used to acquire an embedding of images bound to a given geographical object or a region. === Point === A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be demographic, transportation, meteorological, or economic data, for example. When embedding single points, it is common to consider the entire set of available points as nodes in a graph. === Line / multiline === Among other things, motion trajectories are represented as lines (multilines). Individual trajectories are embedded taking into account travel time, distances and also features of points visited along the way. Embedding of trajectories allows to improve performance of such tasks as clustering and also categorization. === Polygon === The geographic areas analyzed in machine learning are defined by both administrative boundaries and top-down division into grids of regular shapes such as rectangles, for example. Both types are represented as polygons and, like points, can be assigned different demographic, transportation, or economic features. A polygon can also have features related to the size of the area or shape it represents. === Graph === An example domain where graph representation is used is the street layout in a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points like public transport stops or important points in the city, and the edges represent the flow between them. Embedding graphs or single vertices allows to improve accuracy of analysis methods in which the treated geographical domain can be represented as a network. == Usage == POI recommendation - generating personalized point of interest recommendations based on user preferences. Next/future location prediction - prediction of the next location a person will go to based on their historical trajectory. Zone functions classification - based on different mobility of people or POI distribution a function of a given area in a city can be predicted. Crime prediction - estimation of crime rate in different regions of a city. Local event detection - studying spatio-temporal changes in embeddings can provide valuable information in detection of local event occurring in specific location. Regional mobility popularity prediction - analysis of mobility can show patterns in popularity of different regions in a city. Shape matching - finding a similar shape of given polygon, for example finding building with the same shape as input building. Travel time estimation - predicting estimated travel time given current traffic conditions and special occurring events. Time estimation for on-demand food delivery - estimation of delivery time when placing an order through the website. == Temporal aspect == Some of the data analyzed has a timestamp associated with it. In some cases of data analysis this information is omitted and in others it is used to divide the set into groups. The most common division is the separation of weekdays from weekends or division into hours of the day. This is particularly important in the analysis of mobility data, because the characteristics of mobility during the week and at different times of the day are very different from each other. Another area in which time division into, for example, individual months can be used is in the analysis of tourism of a given region. In order to take such a split into account, embedding methods treat the time stamp specifically or separate versions of the model are developed for different subgroups of the analyzed set.

    Read more →
  • Digital image processing

    Digital image processing

    Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (perhaps more), digital image processing may be modeled in the form of multidimensional systems. The generation and development of digital image processing are mainly affected by three factors: first, the development of computers; second, the development of mathematics (especially the creation and improvement of discrete mathematics theory); and third, the demand for a wide range of applications in environment, agriculture, military, industry and medical science has increased. == History == Many of the techniques of digital image processing, or digital picture processing as it often was called, were developed in the 1960s, at Bell Laboratories, the Jet Propulsion Laboratory, Massachusetts Institute of Technology, University of Maryland, and a few other research facilities, with application to satellite imagery, wire-photo standards conversion, medical imaging, videophone, character recognition, and photograph enhancement. The purpose of early image processing was to improve the quality of the image. In image processing, the input is a low-quality image, and the output is an image with improved quality. Common image processing includes image enhancement, restoration, encoding, and compression. The first successful application was the American Jet Propulsion Laboratory (JPL). They used image processing techniques such as geometric correction, gradation transformation, noise removal, etc. on the thousands of lunar photos sent back by the Space Detector Ranger 7 in 1964, taking into account the position of the Sun and the environment of the Moon. The impact of the successful mapping of the Moon's surface map by the computer has been a success. Later, more complex image processing was performed on the nearly 100,000 photos sent back by the spacecraft, so that the topographic map, color map and panoramic mosaic of the Moon were obtained, which achieved extraordinary results and laid a solid foundation for human landing on the Moon. The cost of processing was fairly high, however, with the computing equipment of that era. That changed in the 1970s, when digital image processing proliferated as cheaper computers and dedicated hardware became available. This led to images being processed in real-time, for some dedicated problems such as television standards conversion. As general-purpose computers became faster, they started to take over the role of dedicated hardware for all but the most specialized and computer-intensive operations. With the fast computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and is generally used because it is not only the most versatile method, but also the cheapest. === Image sensors === The basis for modern image sensors is metal–oxide–semiconductor (MOS) technology, invented at Bell Labs between 1955 and 1960, This led to the development of digital semiconductor image sensors, including the charge-coupled device (CCD) and later the CMOS sensor. The charge-coupled device was invented by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogy of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. MOS image sensors are widely used in optical mouse technology. The first optical mouse, invented by Richard F. Lyon at Xerox in 1980, used a 5 μm NMOS integrated circuit sensor chip. Since the first commercial optical mouse, the IntelliMouse introduced in 1999, most optical mouse devices use CMOS sensors. === Image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression became the basis for JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. Its highly efficient DCT compression algorithm was largely responsible for the wide proliferation of digital images and digital photos, with several billion JPEG images produced every day as of 2015. Medical imaging techniques produce very large amounts of data, especially from CT, MRI and PET modalities. As a result, storage and communications of electronic image data are prohibitive without the use of compression. JPEG 2000 image compression is used by the DICOM standard for storage and transmission of medical images. The cost and feasibility of accessing large image data sets over low or various bandwidths are further addressed by use of another DICOM standard, called JPIP, to enable efficient streaming of the JPEG 2000 compressed image data. === Digital signal processor (DSP) === Electronic signal processing was revolutionized by the wide adoption of MOS technology in the 1970s. MOS integrated circuit technology was the basis for the first single-chip microprocessors and microcontrollers in the early 1970s, and then the first single-chip digital signal processor (DSP) chips in the late 1970s. DSP chips have since been widely used in digital image processing. The discrete cosine transform (DCT) image compression algorithm has been widely implemented in DSP chips, with many companies developing DSP chips based on DCT technology. DCTs are widely used for encoding, decoding, video coding, audio coding, multiplexing, control signals, signaling, analog-to-digital conversion, formatting luminance and color differences, and color formats such as YUV444 and YUV411. DCTs are also used for encoding operations such as motion estimation, motion compensation, inter-frame prediction, quantization, perceptual weighting, entropy encoding, variable encoding, and motion vectors, and decoding operations such as the inverse operation between different color formats (YIQ, YUV and RGB) for display purposes. DCTs are also commonly used for high-definition television (HDTV) encoder/decoder chips. == Tasks == Digital image processing allows the use of much more complex algorithms, and hence, can offer both more sophisticated performance at simple tasks, and the implementation of methods which would be impossible by analogue means. In particular, digital image processing is a concrete application of, and a practical technology based on: Classification Feature extraction Multi-scale signal analysis Pattern recognition Projection Some techniques that are used in digital image processing include: Anisotropic diffusion Hidden Markov models Image editing Image restoration Independent component analysis Linear filtering Neural networks Partial differential equations Pixelation Point feature matching Principal components analysis Self-organizing maps Wavelets == Digital image transformations == === Filtering === Digital filters are used to blur and sharpen digital images. Filtering can be performed by: convolution with specifically designed kernels (filter array) in the spatial domain masking specific frequency regions in the frequency (Fourier) domain The following examples show both methods: ==== Image padding in Fourier domain filtering ==== Images are typically padded before being transformed to the Fourier space, the highpass filtered images below illustrate the consequences of different padding techniques: Notice that the highpass filter shows extra edges when zero padded compared to the repeated edge padding. ==== Filtering code examples ==== MATLAB example for spatial domain highpass filtering. === Affine transformations === Affine transformations enable basic image transformations including scale, rotate, translate, mirror and shear as is shown in the following examples: To apply the affine

    Read more →
  • Statistical relational learning

    Statistical relational learning

    Statistical relational learning (SRL) is a subdiscipline of artificial intelligence and machine learning that is concerned with domain models that exhibit both uncertainty (which can be dealt with using statistical methods) and complex, relational structure. Typically, the knowledge representation formalisms developed in SRL use (a subset of) first-order logic to describe relational properties of a domain in a general manner (universal quantification) and draw upon probabilistic graphical models (such as Bayesian networks or Markov networks) to model the uncertainty; some also build upon the methods of inductive logic programming. Significant contributions to the field have been made since the late 1990s. As is evident from the characterization above, the field is not strictly limited to learning aspects; it is equally concerned with reasoning (specifically probabilistic inference) and knowledge representation. Therefore, alternative terms that reflect the main foci of the field include statistical relational learning and reasoning (emphasizing the importance of reasoning) and first-order probabilistic languages (emphasizing the key properties of the languages with which models are represented). Another term that is sometimes used in the literature is relational machine learning (RML). == Canonical tasks == A number of canonical tasks are associated with statistical relational learning, the most common ones being. collective classification, i.e. the (simultaneous) prediction of the class of several objects given objects' attributes and their relations link prediction, i.e. predicting whether or not two or more objects are related link-based clustering, i.e. the grouping of similar objects, where similarity is determined according to the links of an object, and the related task of collaborative filtering, i.e. the filtering for information that is relevant to an entity (where a piece of information is considered relevant to an entity if it is known to be relevant to a similar entity) social network modelling object identification/entity resolution/record linkage, i.e. the identification of equivalent entries in two or more separate databases/datasets == Representation formalisms == One of the fundamental design goals of the representation formalisms developed in SRL is to abstract away from concrete entities and to represent instead general principles that are intended to be universally applicable. Since there are countless ways in which such principles can be represented, many representation formalisms have been proposed in recent years. In the following, some of the more common ones are listed in alphabetical order: Bayesian logic program BLOG model Markov logic networks Multi-entity Bayesian network Probabilistic logic programs Probabilistic relational model – a Probabilistic Relational Model (PRM) is the counterpart of a Bayesian network in statistical relational learning. Probabilistic soft logic Recursive random field Relational Bayesian network Relational dependency network Relational Markov network Relational Kalman filtering

    Read more →
  • Zeuthen strategy

    Zeuthen strategy

    The Zeuthen strategy in cognitive science is a negotiation strategy used by some artificial agents. Its purpose is to measure the willingness to risk conflict. An agent will be more willing to risk conflict if it does not have much to lose in case that the negotiation fails. In contrast, an agent is less willing to risk conflict when it has more to lose. The value of a deal is expressed in its utility. An agent has much to lose when the difference between the utility of its current proposal and the conflict deal is high. When both agents use the monotonic concession protocol, the Zeuthen strategy leads them to agree upon a deal in the negotiation set. This set consists of all conflict free deals, which are individually rational and Pareto optimal, and the conflict deal, which maximizes the Nash product. The strategy was introduced in 1930 by the Danish economist Frederik Zeuthen. == Three key questions == The Zeuthen strategy answers three open questions that arise when using the monotonic concession protocol, namely: Which deal should be proposed at first? On any given round, who should concede? In case of a concession, how much should the agent concede? The answer to the first question is that any agent should start with its most preferred deal, because that deal has the highest utility for that agent. The second answer is that the agent with the smallest value of Risk(i,t) concedes, because the agent with the lowest utility for the conflict deal profits most from avoiding conflict. To the third question, the Zeuthen strategy suggests that the conceding agent should concede just enough raise its value of Risk(i,t) just above that of the other agent. This prevents the conceding agent to have to concede again in the next round. == Risk == Risk ( i , t ) = { 1 U i ( δ ( i , t ) ) = 0 U i ( δ ( i , t ) ) − U i ( δ ( j , t ) ) U i ( δ ( i , t ) ) otherwise {\displaystyle {\text{Risk}}(i,t)={\begin{cases}1&U_{i}(\delta (i,t))=0\\{\frac {U_{i}(\delta (i,t))-U_{i}(\delta (j,t))}{U_{i}(\delta (i,t))}}&{\text{otherwise}}\end{cases}}} Risk(i,t) is a measurement of agent i's willingness to risk conflict. The risk function formalizes the notion that an agent's willingness to risk conflict is the ratio of the utility that agent would lose by accepting the other agent's proposal to the utility that agent would lose by causing a conflict. Agent i is said to be using a rational negotiation strategy if at any step t + 1 that agent i sticks to his last proposal, Risk(i,t) > Risk(j,t). == Sufficient concession == If agent i makes a sufficient concession in the next step, then, assuming that agent j is using a rational negotiation strategy, if agent j does not concede in the next step, he must do so in the step after that. The set of all sufficient concessions of agent i at step t is denoted SC(i, t). == Minimal sufficient concession == δ ′ = arg ⁡ max δ ∈ S C ( A , t ) { U A ( δ ) } {\displaystyle \delta '=\arg \max _{\delta \in {SC(A,t)}}\{U_{A}(\delta )\}} is the minimal sufficient concession of agent A in step t. Agent A begins the negotiation by proposing δ ( A , 0 ) = arg ⁡ max δ ∈ N S U A ( δ ) {\displaystyle \delta (A,0)=\arg \max _{\delta \in {NS}}U_{A}(\delta )} and will make the minimal sufficient concession in step t + 1 if and only if Risk(A,t) ≤ Risk(B,t). Theorem If both agents are using Zeuthen strategies, then they will agree on δ = arg ⁡ max δ ′ ∈ N S { π ( δ ′ ) } , {\displaystyle \delta =\arg \max _{\delta '\in {NS}}\{\pi (\delta ')\},} that is, the deal which maximizes the Nash product. Proof Let δA = δ(A,t). Let δB = δ(B,t). According to the Zeuthen strategy, agent A will concede at step t {\displaystyle t} if and only if R i s k ( A , t ) ≤ R i s k ( B , t ) . {\displaystyle Risk(A,t)\leq Risk(B,t).} That is, if and only if U A ( δ A ) − U A ( δ B ) U A ( δ A ) ≤ U B ( δ B ) − U B ( δ A ) U B ( δ B ) {\displaystyle {\frac {U_{A}(\delta _{A})-U_{A}(\delta _{B})}{U_{A}(\delta _{A})}}\leq {\frac {U_{B}(\delta _{B})-U_{B}(\delta _{A})}{U_{B}(\delta _{B})}}} U B ( δ B ) ( U A ( δ A ) − U A ( δ B ) ) ≤ U A ( δ A ) ( U B ( δ B ) − U B ( δ A ) ) {\displaystyle U_{B}(\delta _{B})(U_{A}(\delta _{A})-U_{A}(\delta _{B}))\leq U_{A}(\delta _{A})(U_{B}(\delta _{B})-U_{B}(\delta _{A}))} U A ( δ A ) U B ( δ B ) − U A ( δ B ) U B ( δ B ) ≤ U A ( δ A ) U B ( δ B ) − U A ( δ A ) U B ( δ A ) {\displaystyle U_{A}(\delta _{A})U_{B}(\delta _{B})-U_{A}(\delta _{B})U_{B}(\delta _{B})\leq U_{A}(\delta _{A})U_{B}(\delta _{B})-U_{A}(\delta _{A})U_{B}(\delta _{A})} − U A ( δ B ) U B ( δ B ) ≤ − U A ( δ A ) U B ( δ A ) {\displaystyle -U_{A}(\delta _{B})U_{B}(\delta _{B})\leq -U_{A}(\delta _{A})U_{B}(\delta _{A})} U A ( δ A ) U B ( δ A ) ≤ U A ( δ B ) U B ( δ B ) {\displaystyle U_{A}(\delta _{A})U_{B}(\delta _{A})\leq U_{A}(\delta _{B})U_{B}(\delta _{B})} π ( δ A ) ≤ π ( δ B ) {\displaystyle \pi (\delta _{A})\leq \pi (\delta _{B})} Thus, Agent A will concede if and only if δ A {\displaystyle \delta _{A}} does not yield the larger product of utilities. Therefore, the Zeuthen strategy guarantees a final agreement that maximizes the Nash Product.

    Read more →