AI For Business Writing

AI For Business Writing — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Outline of machine learning

    Outline of machine learning

    The following outline is provided as an overview of, and topical guide to, machine learning: Machine learning (ML) is a subfield of artificial intelligence within computer science that evolved from the study of pattern recognition and computational learning theory. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". ML involves the study and construction of algorithms that can learn from and make predictions on data. These algorithms operate by building a model from a training set of example observations to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions. == How can machine learning be categorized? == An academic discipline A branch of science An applied science A subfield of computer science A branch of artificial intelligence A subfield of soft computing Application of statistics === Paradigms of machine learning === Supervised learning, where the model is trained on labeled data Unsupervised learning, where the model tries to identify patterns in unlabeled data Reinforcement learning, where the model learns to make decisions by receiving rewards or penalties. == Applications of machine learning == Applications of machine learning Bioinformatics Biomedical informatics Computer vision Customer relationship management Data mining Earth sciences Email filtering Inverted pendulum (balance and equilibrium system) Natural language processing Named Entity Recognition Automatic summarization Automatic taxonomy construction Dialog system Grammar checker Language recognition Handwriting recognition Optical character recognition Speech recognition Text to Speech Synthesis Speech Emotion Recognition Machine translation Question answering Speech synthesis Text mining Term frequency–inverse document frequency Text simplification Pattern recognition Facial recognition system Handwriting recognition Image recognition Optical character recognition Speech recognition Recommendation system Collaborative filtering Content-based filtering Hybrid recommender systems Search engine Search engine optimization Social engineering == Machine learning hardware == Graphics processing unit Tensor processing unit Vision processing unit == Machine learning tools == Comparison of machine learning software Comparison of deep learning software === Machine learning frameworks === ==== Proprietary machine learning frameworks ==== Amazon Machine Learning Microsoft Azure Machine Learning Studio DistBelief (replaced by TensorFlow) ==== Open source machine learning frameworks ==== Apache Singa Apache MXNet Caffe PyTorch mlpack TensorFlow Torch CNTK Accord.Net Jax MLJ.jl – A machine learning framework for Julia === Machine learning libraries === Deeplearning4j Theano scikit-learn Keras === Machine learning algorithms === == Machine learning methods == === Instance-based algorithm === K-nearest neighbors algorithm (KNN) Learning vector quantization (LVQ) Self-organizing map (SOM) === Regression analysis === Logistic regression Ordinary least squares regression (OLSR) Linear regression Stepwise regression Multivariate adaptive regression splines (MARS) Regularization algorithm Ridge regression Least Absolute Shrinkage and Selection Operator (LASSO) Elastic net Least-angle regression (LARS) Classifiers Probabilistic classifier Naive Bayes classifier Binary classifier Linear classifier Hierarchical classifier === Dimensionality reduction === Dimensionality reduction Canonical correlation analysis (CCA) Factor analysis Feature extraction Feature selection Independent component analysis (ICA) Linear discriminant analysis (LDA) Multidimensional scaling (MDS) Non-negative matrix factorization (NMF) Partial least squares regression (PLSR) Principal component analysis (PCA) Principal component regression (PCR) Projection pursuit Sammon mapping t-distributed stochastic neighbor embedding (t-SNE) === Ensemble learning === Ensemble learning AdaBoost Boosting Bootstrap aggregating (also "bagging" or "bootstrapping") Ensemble averaging Gradient boosted decision tree (GBDT) Gradient boosting Random Forest Stacked Generalization === Meta-learning === Meta-learning Inductive bias Metadata === Reinforcement learning === Reinforcement learning Q-learning State–action–reward–state–action (SARSA) Temporal difference learning (TD) Learning Automata === Supervised learning === Supervised learning Averaged one-dependence estimators (AODE) Artificial neural network Case-based reasoning Gaussian process regression Gene expression programming Group method of data handling (GMDH) Inductive logic programming Instance-based learning Lazy learning Learning Automata Learning Vector Quantization Logistic Model Tree Minimum message length (decision trees, decision graphs, etc.) Nearest Neighbor Algorithm Analogical modeling Probably approximately correct learning (PAC) learning Ripple down rules, a knowledge acquisition methodology Symbolic machine learning algorithms Support vector machines Random Forests Ensembles of classifiers Bootstrap aggregating (bagging) Boosting (meta-algorithm) Ordinal classification Conditional Random Field ANOVA Quadratic classifiers k-nearest neighbor Boosting SPRINT Bayesian networks Naive Bayes Hidden Markov models Hierarchical hidden Markov model ==== Bayesian ==== Bayesian statistics Bayesian knowledge base Naive Bayes Gaussian Naive Bayes Multinomial Naive Bayes Averaged One-Dependence Estimators (AODE) Bayesian Belief Network (BBN) Bayesian Network (BN) ==== Decision tree algorithms ==== Decision tree algorithm Decision tree Classification and regression tree (CART) Iterative Dichotomiser 3 (ID3) C4.5 algorithm C5.0 algorithm Chi-squared Automatic Interaction Detection (CHAID) Decision stump Conditional decision tree ID3 algorithm Random forest SLIQ ==== Linear classifier ==== Linear classifier Fisher's linear discriminant Linear regression Logistic regression Multinomial logistic regression Naive Bayes classifier Perceptron Support vector machine === Unsupervised learning === Unsupervised learning Expectation-maximization algorithm Vector Quantization Generative topographic map Information bottleneck method Association rule learning algorithms Apriori algorithm Eclat algorithm ==== Artificial neural networks ==== Artificial neural network Feedforward neural network Extreme learning machine Convolutional neural network Recurrent neural network Long short-term memory (LSTM) Logic learning machine Self-organizing map ==== Association rule learning ==== Association rule learning Apriori algorithm Eclat algorithm FP-growth algorithm ==== Hierarchical clustering ==== Hierarchical clustering Single-linkage clustering Conceptual clustering ==== Cluster analysis ==== Cluster analysis BIRCH DBSCAN Expectation–maximization (EM) Fuzzy clustering Hierarchical clustering k-means clustering k-medians Mean-shift OPTICS algorithm ==== Anomaly detection ==== Anomaly detection k-nearest neighbors algorithm (k-NN) Local outlier factor === Semi-supervised learning === Semi-supervised learning Active learning Generative models Low-density separation Graph-based methods Co-training Transduction === Deep learning === Deep learning Deep belief networks Deep Boltzmann machines Deep Convolutional neural networks Deep Recurrent neural networks Hierarchical temporal memory Generative Adversarial Network Style transfer Transformer Stacked Auto-Encoders === Other machine learning methods and problems === Anomaly detection Association rules Bias-variance dilemma Classification Multi-label classification Clustering Data Pre-processing Empirical risk minimization Feature engineering Feature learning Learning to rank Occam learning Online machine learning PAC learning Regression Reinforcement Learning Semi-supervised learning Statistical learning Structured prediction Graphical models Bayesian network Conditional random field (CRF) Hidden Markov model (HMM) Unsupervised learning VC theory == Machine learning research == List of artificial intelligence projects List of datasets for machine learning research == History of machine learning == History of machine learning Timeline of machine learning == Machine learning projects == Machine learning projects: DeepMind Google Brain OpenAI Meta AI Hugging Face == Machine learning organizations == === Machine learning conferences and workshops === Artificial Intelligence and Security (AISec) (co-located workshop with CCS) Conference on Neural Information Processing Systems (NIPS) ECML PKDD International Conference on Machine Learning (ICML) ML4ALL (Machine Learning For All) == Machine learning publications == === Books on machine learning === Mathematics for Machine Learning Hands-On Machine Learning Scikit-Learn, Keras, and TensorFlow The Hundred-Page Machine Learning Book === Machine learning journals === Machine Learning Journal of Machine Learning Research (JMLR) Neural Computation == Pe

    Read more →
  • Operation Serenata de Amor

    Operation Serenata de Amor

    Operation Serenata de Amor is an artificial intelligence project designed to analyze public spending in Brazil. The project has been funded by a recurrent financing campaign since September 7, 2016, and came in the wake of major scandals of misappropriation of public funds in Brazil, such as the Mensalão scandal and what was revealed in the Operation Car Wash investigations. The analysis began with data from the National Congress then expanded to other types of budget and instances of government, such as the Federal Senate. The project is built through collaboration on GitHub and using a public group with more than 600 participants on Telegram. The name "Serenata de Amor," which means "serenade of love," was taken from a popular cashew cream bonbon produced by Chocolates Garoto in Brazil. == Modules == Throughout development of the project, new modules have been newly introduced in addition to the main repository: The main repository, serenata-de-amor, serves as the starting point for investigative work. Rosie is the robot programmed to identify public funds expenses with discrepancies, starting with CEAP (Quota for Exercise of Parliamentary Activity); it analyzes each of the reimbursements requested by the deputies and senators, indicating the reasons that lead it to believe they are suspicious. From Rosie was born whistleblower, which tweets under the name of @RosieDaSerenata, distributing the results found on social media. Jarbas (Github repository) is a data visualization tool which shows a complete list of reimbursements made available by the Chamber of Deputies and mined by Rosie. Toolbox is a Python installable package that supports the development of Serenata de Amor and Rosie. == History == Operation Serenata de Amor is an Artificial intelligence project for analysis of public expenditures. It was conceived in March 2016 by data scientist Irio Musskopf, sociologist Eduardo Cuducos and entrepreneur Felipe Cabral. The project was financed collectively in the Catarse platform, where it reached 131% of the collection goal paying 3 months of project development. Ana Schwendler, also a data scientist, Pedro Vilanova "Tonny", data journalist, Bruno Pazzim, software engineer, Filipe Linhares, a frontend engineer, Leandro Devegili, an entrepreneur and André Pinho took the first steps towards constructing the platform, such as collecting and structuring the first datasets. Jessica Temporal, data scientist and Yasodara Córdova "Yaso", researcher, Tatiana Balachova "Russa", UX designer, joined the project after the financing took place. The members created a recurring financing campaign, expanding the analysis of public spending to the Federal Senate. Donors make monthly payments ranging from 5 BRL to 200 BRL to maintain group activities. The monthly amount collected is around 10,000 BRL. == Results == In January 2017, concluding the period financed by the initial campaign, the group carried out an investigation into the suspicious activities found by the data analysis system. 629 complaints were made to the Ombudsman's Office of the Chamber of Deputies, questioning expenses of 216 federal deputies. In addition, the Facebook project page has more than 25,000 followers, and users frequently cite the operation as a benchmark in transparency in the Brazilian government. One of the examples of results obtained by the operation is the case of the Deputy who had to return about 700 BRL to the House after his expenses were analyzed by the platform. The platform was able to analyze more than 3 million notes, raising about 8,000 suspected cases in public spending. The community that supports the work of the team benefits from open source repositories, with licenses open for the collaboration. So much so that the two main data scientists of the project presented it at the CivicTechFest in Taipei, obtaining several mentions even in the international press. The technical leader presented the project in Poland during DevConf2017 in Kraków. It was also presented in the Google News Lab in 2017. It was presented by Yaso, when she was the Director of the initiative, at the MIT Media Lab/Berkman Klein Center Initiative for Artificial Intelligence ethics, and at the Artificial Intelligence and Inclusion Symposium, an initiative of the Global Network of Internet & Society Centers (NoC). It was also presented both by Irio and Yaso at the Digital Harvard Kennedy School, over a lunch seminar, where the transparency of the platform and the main solutions found were discussed, so that the code and data are always available to verify its suitability. This infographic provides information about the first results of Operation Serenata de Amor, a project that analyzes open data on public spending to find discrepancies. The project was presented by Yaso to the House Audit and Control Committee of the Chamber of Deputies in August 2017, and raised the interest of House officials who work with open data. The operation has been a source of inspiration for other civic projects that aim to work with similar goals, demonstrating the broader impact of artificial intelligence also in industry in Brazil. Participation of several team members in events throughout Brazil and abroad can be found on the Internet, such as presentation at OpenDataDay, held at Calango Hackerspace in the Federal District, Campus Party Bahia, Campus Party Brasilia, Friends of Tomorrow, XIII National Meeting of Internal Control, in the event USP Talks Hackfest against corruption in João Pessoa, the latter being also highlighted in the National Press.

    Read more →
  • Embodied agent

    Embodied agent

    In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents, although they have only virtual, not physical, embodiment. A branch of artificial intelligence focuses on empowering such agents to interact autonomously with human beings and the environment. Mobile robots are one example of physically embodied agents; Ananova and Microsoft Agent are examples of graphically embodied agents. Embodied conversational agents are embodied agents (usually with a graphical front-end as opposed to a robotic body) that are capable of engaging in conversation with one another and with humans employing the same verbal and nonverbal means that humans do (such as gesture, facial expression, and so forth). == Embodied conversational agents == Embodied conversational agents are a form of intelligent user interface. Graphically embodied agents aim to unite gesture, facial expression and speech to enable face-to-face communication with users, providing a powerful means of human-computer interaction. == Advantages == Face-to-face communication allows communication protocols that give a much richer communication channel than other means of communicating. It enables pragmatic communication acts such as conversational turn-taking, facial expression of emotions, information structure and emphasis, visualization and iconic gestures, and orientation in a three-dimensional environment. This communication takes place through both verbal and non-verbal channels such as gaze, gesture, spoken intonation and body posture. Research has found that users prefer a non-verbal visual indication of an embodied system's internal state to a verbal indication, demonstrating the value of additional non-verbal communication channels. As well as this, the face-to-face communication involved in interacting with an embodied agent can be conducted alongside another task without distracting the human participants, instead improving the enjoyment of such an interaction. Furthermore, the use of an embodied presentation agent results in improved recall of the presented information. Embodied agents also provide a social dimension to the interaction. Humans willingly ascribe social awareness to computers, and thus interaction with embodied agents follows social conventions, similar to human to human interactions. This social interaction both raises the believably and perceived trustworthiness of agents, and increases the user's engagement with the system. Rickenberg and Reeves found that the presence of an embodied agent on a website increased the level of user trust in that website, but also increased users' anxiety and affected their performance, as if they were being watched by a real human. Another effect of the social aspect of agents is that presentations given by an embodied agent are perceived as being more entertaining and less difficult than similar presentations given without an agent. Research shows that perceived enjoyment, followed by perceived usefulness and ease of use, is the major factor influencing user adoption of embodied agents. A study in January 2004 by Byron Reeves at Stanford demonstrated how digital characters could "enhance online experiences" through explaining how virtual characters essentially add a sense of familiarity to the user experience and make it more approachable. This increase in likability in turn helps make the products better, which benefits both the end users and those creating the product. === Applications === The rich style of communication that characterizes human conversation makes conversational interaction with embodied conversational agents ideal for many non-traditional interaction tasks. A familiar application of graphically embodied agents is computer games; embodied agents are ideal for this setting because the richer communication style makes interacting with the agent enjoyable. Embodied conversational agents have also been used in virtual training environments, portable personal navigation guides, interactive fiction and storytelling systems, interactive online characters and automated presenters and commentators. Major virtual assistants like Siri, Amazon Alexa and Google Assistant do not come with any visual embodied representation, which is believed to limit the sense of human presence by users. The U.S. Department of Defense utilizes a software agent called SGT STAR on U.S. Army-run Web sites and Web applications for site navigation, recruitment and propaganda purposes. Sgt. Star is run by the Army Marketing and Research Group, a division operated directly from The Pentagon. Sgt. Star is based upon the ActiveSentry technology developed by Next IT, a Washington-based information technology services company. Other such bots in the Sgt. Star "family" are utilized by the Federal Bureau of Investigation and the Central Intelligence Agency for intelligence gathering purposes.

    Read more →
  • Cognitive robotics

    Cognitive robotics

    Cognitive robotics or cognitive technology is a subfield of robotics concerned with endowing a robot with intelligent behavior by providing it with a processing architecture that will allow it to learn and reason about how to behave in response to complex goals in a complex world. Cognitive robotics may be considered the engineering branch of embodied cognitive science and embodied embedded cognition, consisting of robotic process automation, artificial intelligence, machine learning, deep learning, optical character recognition, image processing, process mining, analytics, software development and system integration. == Core issues == While traditional cognitive modeling approaches have assumed symbolic coding schemes as a means for depicting the world, translating the world into these kinds of symbolic representations has proven to be problematic if not untenable. Perception and action and the notion of symbolic representation are therefore core issues to be addressed in cognitive robotics. == Starting point == Cognitive robotics views human or animal cognition as a starting point for the development of robotic information processing, as opposed to more traditional artificial intelligence techniques. Target robotic cognitive capabilities include perception processing, attention allocation, anticipation, planning, complex motor coordination, reasoning about other agents and perhaps even about their own mental states. Robotic cognition embodies the behavior of intelligent agents in the physical world (or a virtual world, in the case of simulated cognitive robotics). Ultimately, the robot must be able to act in the real world. == Learning techniques == === Motor Babble === A preliminary robot learning technique called motor babbling involves correlating pseudo-random complex motor movements by the robot with resulting visual and/or auditory feedback such that the robot may begin to expect a pattern of sensory feedback given a pattern of motor output. Desired sensory feedback may then be used to inform a motor control signal. This is thought to be analogous to how a baby learns to reach for objects or learns to produce speech sounds. For simpler robot systems, where, for instance, inverse kinematics may feasibly be used to transform anticipated feedback (desired motor result) into motor output, this step may be skipped. === Imitation === Once a robot can coordinate its motors to produce a desired result, the technique of learning by imitation may be used. The robot monitors the performance of another agent and then the robot tries to imitate that agent. It is often a challenge to transform imitation information from a complex scene into a desired motor result for the robot. Note that imitation is a high-level form of cognitive behavior and imitation is not necessarily required in a basic model of embodied animal cognition. === Knowledge acquisition === A more complex learning approach is "autonomous knowledge acquisition": the robot is left to explore the environment on its own. A system of goals and beliefs is typically assumed. A somewhat more directed mode of exploration can be achieved by "curiosity" algorithms, such as Intelligent Adaptive Curiosity or Category-Based Intrinsic Motivation. These algorithms generally involve breaking sensory input into a finite number of categories and assigning some sort of prediction system (such as an artificial neural network) to each. The prediction system keeps track of the error in its predictions over time. Reduction in prediction error is considered learning. The robot then preferentially explores categories in which it is learning (or reducing prediction error) the fastest. == Other architectures == Some researchers in cognitive robotics have tried using architectures such as (ACT-R and Soar (cognitive architecture)) as a basis of their cognitive robotics programs. These highly modular symbol-processing architectures have been used to simulate operator performance and human performance when modeling simplistic and symbolized laboratory data. The idea is to extend these architectures to handle real-world sensory input as that input continuously unfolds through time. What is needed is a way to somehow translate the world into a set of symbols and their relationships. == Questions == Some of the fundamental questions to be answered in cognitive robotics are: How much human programming should or can be involved to support the learning processes? How can one quantify progress? Some of the adopted ways are reward and punishment. But what kind of reward and what kind of punishment? In humans, when teaching a child, for example, the reward would be candy or some encouragement, and the punishment can take many forms. But what is an effective way with robots?

    Read more →
  • Decorrelation

    Decorrelation

    Decorrelation is a general term for any process that is used to reduce autocorrelation within a signal, or cross-correlation within a set of signals, while preserving other aspects of the signal. A frequently used method of decorrelation is the use of a matched linear filter to reduce the autocorrelation of a signal as far as possible. Since the minimum possible autocorrelation for a given signal energy is achieved by equalising the power spectrum of the signal to be similar to that of a white noise signal, this is often referred to as signal whitening. == Process == === Signal processing === Most decorrelation algorithms are linear, but there are also non-linear decorrelation algorithms. Many data compression algorithms incorporate a decorrelation stage. For example, many transform coders first apply a fixed linear transformation that would, on average, have the effect of decorrelating a typical signal of the class to be coded, prior to any later processing. This is typically a Karhunen–Loève transform, or a simplified approximation such as the discrete cosine transform. By comparison, sub-band coders do not generally have an explicit decorrelation step, but instead exploit the already-existing reduced correlation within each of the sub-bands of the signal, due to the relative flatness of each sub-band of the power spectrum in many classes of signals. Linear predictive coders can be modelled as an attempt to decorrelate signals by subtracting the best possible linear prediction from the input signal, leaving a whitened residual signal. Decorrelation techniques can also be used for many other purposes, such as reducing crosstalk in a multi-channel signal, or in the design of echo cancellers. In image processing decorrelation techniques can be used to enhance or stretch, colour differences found in each pixel of an image. This is generally termed as 'decorrelation stretching'. === Neuroscience === In neuroscience, decorrelation is used in the analysis of the neural networks in the human visual system. The raw inputs from cone cells and rod cells under go many steps of processing before it is handled by the visual cortex. These steps generally perform decorrelation, both spatial (surround suppression in the retina) and temporal (handling of movement in the lateral geniculate nucleus). === Cryptography === In cryptography, decorrelation is used in cipher design (see Decorrelation theory) and in the design of hardware random number generators.

    Read more →
  • Machine learning in video games

    Machine learning in video games

    Artificial intelligence and machine learning techniques are used in video games for a wide variety of applications such as non-player character (NPC) control, procedural content generation (PCG) and deep learning-based content generation. Machine learning is a subset of artificial intelligence that uses historical data to build predictive and analytical models. This is in sharp contrast to traditional methods of artificial intelligence such as search trees and expert systems. Information on machine learning techniques in the field of games is mostly known to public through research projects as most gaming companies choose not to publish specific information about their intellectual property. The most publicly known application of machine learning in games is likely the use of deep learning agents that compete with professional human players in complex strategy games. There has been a significant application of machine learning on games such as Atari/ALE, Doom, Minecraft, StarCraft, and car racing. Other games that did not originally exists as video games, such as chess and Go have also been affected by the machine learning. == Overview of relevant machine learning techniques == === Deep learning === Deep learning is a subset of machine learning which focuses heavily on the use of artificial neural networks (ANN) that learn to solve complex tasks. Deep learning uses multiple layers of ANN and other techniques to progressively extract information from an input. Due to this complex layered approach, deep learning models often require powerful machines to train and run on. ==== Convolutional neural networks ==== Convolutional neural networks (CNN) are specialized ANNs that are often used to analyze image data. These types of networks are able to learn translation invariant patterns, which are patterns that are not dependent on location. CNNs are able to learn these patterns in a hierarchy, meaning that earlier convolutional layers will learn smaller local patterns while later layers will learn larger patterns based on the previous patterns. A CNN's ability to learn visual data has made it a commonly used tool for deep learning in games. === Recurrent neural network === Recurrent neural networks are a type of ANN that are designed to process sequences of data in order, one part at a time rather than all at once. An RNN runs over each part of a sequence, using the current part of the sequence along with memory of previous parts of the current sequence to produce an output. These types of ANN are highly effective at tasks such as speech recognition and other problems that depend heavily on temporal order. There are several types of RNNs with different internal configurations; the basic implementation suffers from a lack of long term memory due to the vanishing gradient problem, thus it is rarely used over newer implementations. ==== Long short-term memory ==== A long short-term memory (LSTM) network is a specific implementation of a RNN that is designed to deal with the vanishing gradient problem seen in simple RNNs, which would lead to them gradually "forgetting" about previous parts of an inputted sequence when calculating the output of a current part. LSTMs solve this problem with the addition of an elaborate system that uses an additional input/output to keep track of long term data. LSTMs have achieved very strong results across various fields, and were used by several monumental deep learning agents in games. === Reinforcement learning === Reinforcement learning is the process of training an agent using rewards and/or punishments. The way an agent is rewarded or punished depends heavily on the problem; such as giving an agent a positive reward for winning a game or a negative one for losing. Reinforcement learning is used heavily in the field of machine learning and can be seen in methods such as Q-learning, policy search, Deep Q-networks and others. It has seen strong performance in both the field of games and robotics. === Neuroevolution === Neuroevolution involves the use of both neural networks and evolutionary algorithms. Instead of using gradient descent like most neural networks, neuroevolution models make use of evolutionary algorithms to update neurons in the network. Researchers claim that this process is less likely to get stuck in a local minimum and is potentially faster than state of the art deep learning techniques. == Deep learning agents == Machine learning agents have been used to take the place of a human player rather than function as NPCs, which are deliberately added into video games as part of designed gameplay. Deep learning agents have achieved impressive results when used in competition with both humans and other artificial intelligence agents. === Chess === Chess is a turn-based strategy game that is considered a difficult AI problem due to the computational complexity of its board space. Similar strategy games are often solved with some form of a Minimax Tree Search. These types of AI agents have been known to beat professional human players, such as the historic 1997 Deep Blue versus Garry Kasparov match. Since then, machine learning agents have shown ever greater success than previous AI agents. === Go === Go is another turn-based strategy game which is considered an even more difficult AI problem than chess. The state space of is Go is around 10^170 possible board states compared to the 10^120 board states for Chess. Prior to recent deep learning models, AI Go agents were only able to play at the level of a human amateur. ==== AlphaGo ==== Google's 2015 AlphaGo was the first AI agent to beat a professional Go player. AlphaGo used a deep learning model to train the weights of a Monte Carlo tree search (MCTS). The deep learning model consisted of 2 ANN, a policy network to predict the probabilities of potential moves by opponents, and a value network to predict the win chance of a given state. The deep learning model allows the agent to explore potential game states more efficiently than a vanilla MCTS. The network were initially trained on games of humans players and then were further trained by games against itself. ==== AlphaGo Zero ==== AlphaGo Zero, another implementation of AlphaGo, was able to train entirely by playing against itself. It was able to quickly train up to the capabilities of the previous agent. === StarCraft series === StarCraft and its sequel StarCraft II are real-time strategy (RTS) video games that have become popular environments for AI research. Blizzard and DeepMind have worked together to release a public StarCraft 2 environment for AI research to be done on. Various deep learning methods have been tested on both games, though most agents usually have trouble outperforming the default AI with cheats enabled or skilled players of the game. ==== Alphastar ==== Alphastar was the first AI agent to beat professional StarCraft 2 players without any in-game advantages. The deep learning network of the agent initially received input from a simplified zoomed out version of the gamestate, but was later updated to play using a camera like other human players. The developers have not publicly released the code or architecture of their model, but have listed several state of the art machine learning techniques such as relational deep reinforcement learning, long short-term memory, auto-regressive policy heads, pointer networks, and centralized value baseline. Alphastar was initially trained with supervised learning, it watched replays of many human games in order to learn basic strategies. It then trained against different versions of itself and was improved through reinforcement learning. The final version was hugely successful, but only trained to play on a specific map in a protoss mirror matchup. === Dota 2 === Dota 2 is a multiplayer online battle arena (MOBA) game. Like other complex games, traditional AI agents have not been able to compete on the same level as professional human player. The only widely published information on AI agents attempted on Dota 2 is OpenAI's deep learning Five agent. ==== OpenAI Five ==== OpenAI Five utilized separate long short-term memory networks to learn each hero. It trained using a reinforcement learning technique known as Proximal Policy Learning running on a system containing 256 GPUs and 128,000 CPU cores. Five trained for months, accumulating 180 years of game experience each day, before facing off with professional players. It was eventually able to beat the 2018 Dota 2 esports champion team in a 2019 series of games. === Planetary Annihilation === Planetary Annihilation is a real-time strategy game which focuses on massive scale war. The developers use ANNs in their default AI agent. === Supreme Commander 2 === Supreme Commander 2 is a real-time strategy (RTS) video game. The game uses Multilayer Perceptrons (MLPs) to control a platoon’s reaction to encountered enemy units. Total of four MLPs are used, one for each platoon type: land, naval

    Read more →
  • Lazy learning

    Lazy learning

    (Not to be confused with the lazy learning regime, see Neural tangent kernel). In machine learning, lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize the training data before receiving queries. The primary motivation for employing lazy learning, as in the K-nearest neighbors algorithm, used by online recommendation systems ("people who viewed/purchased/listened to this movie/item/tune also ...") is that the data set is continuously updated with new entries (e.g., new items for sale at Amazon, new movies to view at Netflix, new clips at YouTube, new music at Spotify or Pandora). Because of the continuous update, the "training data" would be rendered obsolete in a relatively short time especially in areas like books and movies, where new best-sellers or hit movies/music are published/released continuously. Therefore, one cannot really talk of a "training phase". Lazy classifiers are most useful for large, continuously changing datasets with few attributes that are commonly queried. Specifically, even if a large set of attributes exist - for example, books have a year of publication, author/s, publisher, title, edition, ISBN, selling price, etc. - recommendation queries rely on far fewer attributes - e.g., purchase or viewing co-occurrence data, and user ratings of items purchased/viewed. == Advantages == The main advantage gained in employing a lazy learning method is that the target function will be approximated locally, such as in the k-nearest neighbor algorithm. Because the target function is approximated locally for each query to the system, lazy learning systems can simultaneously solve multiple problems and deal successfully with changes in the problem domain. At the same time they can reuse a lot of theoretical and applied results from linear regression modelling (notably PRESS statistic) and control. It is said that the advantage of this system is achieved if the predictions using a single training set are only developed for few objects. This can be demonstrated in the case of the k-NN technique, which is instance-based and function is only estimated locally. == Disadvantages == Theoretical disadvantages with lazy learning include: The large space requirement to store the entire training dataset. In practice, this is not an issue because of advances in hardware and the relatively small number of attributes (e.g., as co-occurrence frequency) that need to be stored. Particularly noisy training data increases the case base unnecessarily, because no abstraction is made during the training phase. In practice, as stated earlier, lazy learning is applied to situations where any learning performed in advance soon becomes obsolete because of changes in the data. Also, for the problems for which lazy learning is optimal, "noisy" data does not really occur - the purchaser of a book has either bought another book or hasn't. Lazy learning methods are usually slower to evaluate. In practice, for very large databases with high concurrency loads, the queries are not postponed until actual query time, but recomputed in advance on a periodic basis - e.g., nightly, in anticipation of future queries, and the answers stored. This way, the next time new queries are asked about existing entries in the database, the answers are merely looked up rapidly instead of having to be computed on the fly, which would almost certainly bring a high-concurrency multi-user system to its knees. Larger training data also entail increased cost. Particularly, there is the fixed amount of computational cost, where a processor can only process a limited amount of training data points. There are standard techniques to improve re-computation efficiency so that a particular answer is not recomputed unless the data that impact this answer has changed (e.g., new items, new purchases, new views). In other words, the stored answers are updated incrementally. This approach, used by large e-commerce or media sites, has long been used in the Entrez portal of the National Center for Biotechnology Information (NCBI) to precompute similarities between the different items in its large datasets: biological sequences, 3-D protein structures, published-article abstracts, etc. Because "find similar" queries are asked so frequently, the NCBI uses highly parallel hardware to perform nightly recomputation. The recomputation is performed only for new entries in the datasets against each other and against existing entries: the similarity between two existing entries need not be recomputed. == Examples of Lazy Learning Methods == K-nearest neighbors, which is a special case of instance-based learning. Local regression. Lazy naive Bayes rules, which are extensively used in commercial spam detection software. Here, the spammers keep getting smarter and revising their spamming strategies, and therefore the learning rules must also be continually updated.

    Read more →
  • Aporia (company)

    Aporia (company)

    Aporia is a machine learning observability platform based in Tel Aviv, Israel. The company has a US office located in San Jose, California. Aporia has developed software for monitoring and controlling undetected defects and failures used by other companies to detect and report anomalies, and warn in the early stages of faults. == History == Aporia was founded in 2019 by Liran Hason and Alon Gubkin. In April 2021, the company raised a $5 million seed round for its monitoring platform for ML models. In February 2022, the company closed a Series A round of $25 million for its ML observability platform. Aporia was named by Forbes as the Next Billion-Dollar Company in June 2022. In November, the company partnered with ClearML, an MLOPs platform, to improve ML pipeline optimization. In January 2023, Aporia launched Direct Data Connectors, a novel technology allowing organizations to monitor their ML models in minutes (previously the process of integrating ML monitoring into a customer’s cloud environment took weeks or more.) DDC (Direct Data Connectors) enables users to connect Aporia to their preferred data source and monitor all of their data at once, without data sampling or data duplication (which is a huge security risk for major organizations. In April 2023, Aporia announced the company partnered with Amazon Web Services (AWS) to provide more reliable ML observability to AWS consumers by deploying Aporia's architecture to their AWS environment, this will allow customers to monitor their models in production regardless of platform.

    Read more →
  • VieON

    VieON

    VieON is an mobile application for television and video on demand provided by VieON Joint Stock Company (formerly Dzones), a subsidiary of DatVietVAC Media and Entertainment Group in Vietnam. The app was launched in 2020, featuring over 140 domestic and international television channels, original series, popular entertainment programs known nationwide, top-tier sports events and live streaming of major events. Additionally, VieON provides animated films, television series and television programs from various countries such as South Korea and China. == History == The application was planned for development in 2016, with the cooperation of strategic consulting partner BCG Digital Ventures from the United States. Prior to 2020, VieON was a rebranded version of VTVcab ON, a product managed by Vietnam Cable Television Corporation (VTVCab) and DatVietVAC. On June 15, 2020, after four years of research and testing, the new version of VieON was officially released by DatVietVAC Group, with Vie Channel Joint Stock Company as the business entity and service provider. This is considered the official launch date of the application. On July 21, 2023, VieON transitioned its business operations and service provision to VieON Joint Stock Company. In January 2024, VieON officially launched its global version, VieON Global, targeting Vietnamese users living abroad. == Background == According to Kantar Media Vietnam, up to 84% of Vietnamese people aged 15–54 use social media daily, and in a similar survey by Nielsen, 90% of respondents said they watch live TV weekly. Additionally, according to research organization Muvi, Southeast Asia's OTT market revenue could reach $650 million annually starting next year. Understanding this, DatVietVAC Group has planned to research and develop an OTT application, even though the Vietnamese market already has some major players such as FPT Play and the international giant Netflix. Additionally, DatVietVAC does not hide its ambition to make this application the number one entertainment channel for Vietnamese people.

    Read more →
  • Decision tree pruning

    Decision tree pruning

    Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting. One of the questions that arises in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to tell when a tree algorithm should stop because it is impossible to tell if the addition of a single extra node will dramatically decrease error. This problem is known as the horizon effect. A common strategy is to grow the tree until each node contains a small number of instances then use pruning to remove nodes that do not provide additional information. Pruning should reduce the size of a learning tree without reducing predictive accuracy as measured by a cross-validation set. There are many techniques for tree pruning that differ in the measurement that is used to optimize performance. == Techniques == Pruning processes can be divided into two types (pre- and post-pruning). Pre-pruning procedures prevent a complete induction of the training set by replacing a stop () criterion in the induction algorithm (e.g. max. Tree depth or information gain (Attr)> minGain). Pre-pruning methods are considered to be more efficient because they do not induce an entire set, but rather trees remain small from the start. Prepruning methods share a common problem, the horizon effect. This is to be understood as the undesired premature termination of the induction by the stop () criterion. Post-pruning (or just pruning) is the most common way of simplifying trees. Here, nodes and subtrees are replaced with leaves to reduce complexity. Pruning can not only significantly reduce the size but also improve the classification accuracy of unseen objects. It may be the case that the accuracy of the assignment on the train set deteriorates, but the accuracy of the classification properties of the tree increases overall. The procedures are differentiated on the basis of their approach in the tree (top-down or bottom-up). === Bottom-up pruning === These procedures start at the last node in the tree (the lowest point). Following recursively upwards, they determine the relevance of each individual node. If the relevance for the classification is not given, the node is dropped or replaced by a leaf. The advantage is that no relevant sub-trees can be lost with this method. These methods include Reduced Error Pruning (REP), Minimum Cost Complexity Pruning (MCCP), or Minimum Error Pruning (MEP). === Top-down pruning === In contrast to the bottom-up method, this method starts at the root of the tree. Following the structure below, a relevance check is carried out which decides whether a node is relevant for the classification of all n items or not. By pruning the tree at an inner node, it can happen that an entire sub-tree (regardless of its relevance) is dropped. One of these representatives is pessimistic error pruning (PEP), which brings quite good results with unseen items. == Pruning algorithms == === Reduced error pruning === One of the simplest forms of pruning is reduced error pruning. Starting at the leaves, each node is replaced with its most popular class. If the prediction accuracy is not affected then the change is kept. While somewhat naive, reduced error pruning has the advantage of simplicity and speed. === Cost complexity pruning === Cost complexity pruning generates a series of trees ⁠ T 0 … T m {\displaystyle T_{0}\dots T_{m}} ⁠ where ⁠ T 0 {\displaystyle T_{0}} ⁠ is the initial tree and ⁠ T m {\displaystyle T_{m}} ⁠ is the root alone. At step ⁠ i {\displaystyle i} ⁠, the tree is created by removing a subtree from tree ⁠ i − 1 {\displaystyle i-1} ⁠ and replacing it with a leaf node with value chosen as in the tree building algorithm. The subtree that is removed is chosen as follows: Define the error rate of tree ⁠ T {\displaystyle T} ⁠ over data set ⁠ S {\displaystyle S} ⁠ as ⁠ err ⁡ ( T , S ) {\displaystyle \operatorname {err} (T,S)} ⁠. The subtree t {\displaystyle t} that minimizes err ⁡ ( prune ⁡ ( T , t ) , S ) − err ⁡ ( T , S ) | leaves ⁡ ( T ) | − | leaves ⁡ ( prune ⁡ ( T , t ) ) | {\displaystyle {\frac {\operatorname {err} (\operatorname {prune} (T,t),S)-\operatorname {err} (T,S)}{\left\vert \operatorname {leaves} (T)\right\vert -\left\vert \operatorname {leaves} (\operatorname {prune} (T,t))\right\vert }}} is chosen for removal. The function ⁠ prune ⁡ ( T , t ) {\displaystyle \operatorname {prune} (T,t)} ⁠ defines the tree obtained by pruning the subtrees ⁠ t {\displaystyle t} ⁠ from the tree ⁠ T {\displaystyle T} ⁠. Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation. == Examples == Pruning could be applied in a compression scheme of a learning algorithm to remove the redundant details without compromising the model's performances. In neural networks, pruning removes entire neurons or layers of neurons.

    Read more →
  • Character computing

    Character computing

    Character computing is a trans-disciplinary field of research at the intersection of computer science and psychology. It is any computing that incorporates the human character within its context. Character is defined as all features or characteristics defining an individual and guiding their behavior in a specific situation. It consists of stable trait markers (e.g., personality, background, history, socio-economic embeddings, culture,...) and variable state markers (emotions, health, cognitive state, ...). Character computing aims at providing a holistic psychologically driven model of human behavior. It models and predicts behavior based on the relationships between a situation and character. Three main research modules fall under the umbrella of character computing: character sensing and profiling, character-aware adaptive systems, and artificial characters. == Overview == Character computing can be viewed as an extension of the well-established field of affective computing. Based on the foundations of the different psychology branches, it advocates defining behavior as a compound attribute that is not driven by either personality, emotions, situation or cognition alone. It rather defines behavior as a function of everything that makes up an individual i.e., their character and the situation they are in. Affective computing aims at allowing machines to understand and translate the non-verbal cues of individuals into affect. Accordingly, character computing aims at understanding the character attributes of an individual and the situation to translate it to predicted behavior, and vice versa. ''In practical terms, depending on the application context, character computing is a branch of research that deals with the design of systems and interfaces that can observe, sense, predict, adapt to, affect, understand, or simulate the following: character based on behavior and situation, behavior based on character and situation, or situation based on character and behavior.'' The Character-Behavior-Situation (CBS) triad is at the core of character computing and defines each of the three edges based on the other two. Character computing relies on simultaneous development from a computational and psychological perspective and is intended to be used by researchers in both fields. Its main concept is aligning the computational model of character computing with empirical results from in-lab and in-the-wild psychology experiments. The model is to be continuously built and validated through the emergence of new data. Similar to affective and personality computing, the model is to be used as a base for different applications towards improving user experience. == History == Character computing as such was first coined in its first workshop in 2017. Since then it has had 3 international workshops and numerous publications. Despite its young age, it has already drawn some interest in the research community, leading to the publication of the first book under the same title in early 2020 published by Springer Nature. Research that can be categorized under the field dates much older than 2017. The notion of combining several factors towards the explanation of behavior or traits and states has long been investigated in both Psychology and Computer Science, for example. == Character == The word character originates from the Greek word meaning “stamping tool”, referring to distinctive features and traits. Over the years it has been given many different connotations, like the moral character in philosophy, the temperament in psychology, a person in literature or an avatar in various virtual worlds, including video games. According to character computing character is a unification of all the previous definitions, by referring back to the original meaning of the word. Character is defined as the holistic concept representing all interacting trait and state markers that distinguish an individual. Traits are characteristics that mainly remain stable over time. Traits include personality, affect, socio-demographics, and general health. States are characteristics that vary in short periods of time. They include emotions, well-being, health, cognitive state. Each characteristic has many representation methods and psychological models. The different models can be combined or one model can be preset for each characteristic. This depends on the use-case and the design choices. == Areas == Research into character computing can be divided into three areas, which complement each other but can each be investigated separately. The first area is sensing and predicting character states and traits or ensuing behavior. The second area is adapting applications to certain character states or traits and the behavior they predict. It also deals with trying to change or monitor such behavior. The final area deals with creating artificial agents e.g., chatbots or virtual reality avatars that exhibit certain characteristics. The three areas are investigated separately and build on existing findings in the literature. The results of each of the three areas can also be used as a stepping stone for the next area. Each of the three areas has already been investigated on its own in different research fields with focus on different subsets of character. For example, affective computing and personality computing both cover different areas with a focus on some character components without the others to account for human behavior. == The Character-Behavior-Situation triad == Character computing is based on a holistic psychologically driven model of human behavior. Human behavior is modeled and predicted based on the relationships between a situation and a human's character. To further define character in a more formal or holistic manner, we represent it in light of the Character–Behavior–Situation triad. This highlights that character not only determines who we are but how we are, i.e., how we behave. The triad investigated in Personality Psychology is extended through character computing to the Character–Behavior–Situation triad. Any member of the CBS triad is a function of the two other members, e.g., given the situation and personality, the behavior can be predicted. Each of the components in the triad can be further decomposed into smaller units and features that may best represent the human's behavior or character in a particular situation. Character is thus behind a person's behavior in any given situation. While this is a causality relation, the correlation between the three components is often more easily used to predict the components that are most difficult to measure from those measured more easily. There are infinitely many components to include in the representation of any of C, B, and S. The challenge is always to choose the smallest subset needed for prediction of a person's behavior in a particular situation.

    Read more →
  • Empirical dynamic modeling

    Empirical dynamic modeling

    Empirical dynamic modeling (EDM) is a framework for analysis and prediction of nonlinear dynamical systems. Applications include population dynamics, ecosystem service, medicine, neuroscience, dynamical systems, geophysics, and human-computer interaction. EDM was originally developed by Robert May and George Sugihara. It can be considered a methodology for data modeling, predictive analytics, dynamical system analysis, machine learning and time series analysis. == Description == Mathematical models have tremendous power to describe observations of real-world systems. They are routinely used to test hypothesis, explain mechanisms and predict future outcomes. However, real-world systems are often nonlinear and multidimensional, in some instances rendering explicit equation-based modeling problematic. Empirical models, which infer patterns and associations from the data instead of using hypothesized equations, represent a natural and flexible framework for modeling complex dynamics. Donald DeAngelis and Simeon Yurek illustrated that canonical statistical models are ill-posed when applied to nonlinear dynamical systems. A hallmark of nonlinear dynamics is state-dependence: system states are related to previous states governing transition from one state to another. EDM operates in this space, the multidimensional state-space of system dynamics rather than on one-dimensional observational time series. EDM does not presume relationships among states, for example, a functional dependence, but projects future states from localised, neighboring states. EDM is thus a state-space, nearest-neighbors paradigm where system dynamics are inferred from states derived from observational time series. This provides a model-free representation of the system naturally encompassing nonlinear dynamics. A cornerstone of EDM is recognition that time series observed from a dynamical system can be transformed into higher-dimensional state-spaces by time-delay embedding with Takens's theorem. The state-space models are evaluated based on in-sample fidelity to observations, conventionally with Pearson correlation between predictions and observations. == Methods == Primary EDM algorithms include Simplex projection, Sequential locally weighted global linear maps (S-Map) projection, Multivariate embedding in Simplex or S-Map, Convergent cross mapping (CCM), and Multiview Embeding, described below. Nearest neighbors are found according to: NN ( y , X , k ) = ‖ X N i E − y ‖ ≤ ‖ X N j E − y ‖ if 1 ≤ i ≤ j ≤ k {\displaystyle {\text{NN}}(y,X,k)=\|X_{N_{i}}^{E}-y\|\leq \|X_{N_{j}}^{E}-y\|{\text{ if }}1\leq i\leq j\leq k} === Simplex === Simplex projection is a nearest neighbor projection. It locates the k {\displaystyle k} nearest neighbors to the location in the state-space from which a prediction is desired. To minimize the number of free parameters k {\displaystyle k} is typically set to E + 1 {\displaystyle E+1} defining an E + 1 {\displaystyle E+1} dimensional simplex in the state-space. The prediction is computed as the average of the weighted phase-space simplex projected T p {\displaystyle Tp} points ahead. Each neighbor is weighted proportional to their distance to the projection origin vector in the state-space. Find k {\displaystyle k} nearest neighbor: N k ← NN ( y , X , k ) {\displaystyle N_{k}\gets {\text{NN}}(y,X,k)} Define the distance scale: d ← ‖ X N 1 E − y ‖ {\displaystyle d\gets \|X_{N_{1}}^{E}-y\|} Compute weights: For{ i = 1 , … , k {\displaystyle i=1,\dots ,k} } : w i ← exp ⁡ ( − ‖ X N i E − y ‖ / d ) {\displaystyle w_{i}\gets \exp(-\|X_{N_{i}}^{E}-y\|/d)} Average of state-space simplex: y ^ ← ∑ i = 1 k ( w i X N i + T p ) / ∑ i = 1 k w i {\displaystyle {\hat {y}}\gets \sum _{i=1}^{k}\left(w_{i}X_{N_{i}+T_{p}}\right)/\sum _{i=1}^{k}w_{i}} === S-Map === S-Map extends the state-space prediction in Simplex from an average of the E + 1 {\displaystyle E+1} nearest neighbors to a linear regression fit to all neighbors, but localised with an exponential decay kernel. The exponential localisation function is F ( θ ) = exp ( − θ d / D ) {\displaystyle F(\theta )={\text{exp}}(-\theta d/D)} , where d {\displaystyle d} is the neighbor distance and D {\displaystyle D} the mean distance. In this way, depending on the value of θ {\displaystyle \theta } , neighbors close to the prediction origin point have a higher weight than those further from it, such that a local linear approximation to the nonlinear system is reasonable. This localisation ability allows one to identify an optimal local scale, in-effect quantifying the degree of state dependence, and hence nonlinearity of the system. Another feature of S-Map is that for a properly fit model, the regression coefficients between variables have been shown to approximate the gradient (directional derivative) of variables along the manifold. These Jacobians represent the time-varying interaction strengths between system variables. Find k {\displaystyle k} nearest neighbor: N ← NN ( y , X , k ) {\displaystyle N\gets {\text{NN}}(y,X,k)} Sum of distances: D ← 1 k ∑ i = 1 k ‖ X N i E − y ‖ {\displaystyle D\gets {\frac {1}{k}}\sum _{i=1}^{k}\|X_{N_{i}}^{E}-y\|} Compute weights: For{ i = 1 , … , k {\displaystyle i=1,\dots ,k} } : w i ← exp ⁡ ( − θ ‖ X N i E − y ‖ / D ) {\displaystyle w_{i}\gets \exp(-\theta \|X_{N_{i}}^{E}-y\|/D)} Reweighting matrix: W ← diag ( w i ) {\displaystyle W\gets {\text{diag}}(w_{i})} Design matrix: A ← [ 1 X N 1 X N 1 − 1 … X N 1 − E + 1 1 X N 2 X N 2 − 1 … X N 2 − E + 1 ⋮ ⋮ ⋮ ⋱ ⋮ 1 X N k X N k − 1 … X N k − E + 1 ] {\displaystyle A\gets {\begin{bmatrix}1&X_{N_{1}}&X_{N_{1}-1}&\dots &X_{N_{1}-E+1}\\1&X_{N_{2}}&X_{N_{2}-1}&\dots &X_{N_{2}-E+1}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&X_{N_{k}}&X_{N_{k}-1}&\dots &X_{N_{k}-E+1}\end{bmatrix}}} Weighted design matrix: A ← W A {\displaystyle A\gets WA} Response vector at T p {\displaystyle Tp} : b ← [ X N 1 + T p X N 2 + T p ⋮ X N k + T p ] {\displaystyle b\gets {\begin{bmatrix}X_{N_{1}+T_{p}}\\X_{N_{2}+T_{p}}\\\vdots \\X_{N_{k}+T_{p}}\end{bmatrix}}} Weighted response vector: b ← W b {\displaystyle b\gets Wb} Least squares solution (SVD): c ^ ← argmin c ‖ A c − b ‖ 2 2 {\displaystyle {\hat {c}}\gets {\text{argmin}}_{c}\|Ac-b\|_{2}^{2}} Local linear model c ^ {\displaystyle {\hat {c}}} is prediction: y ^ ← c ^ 0 + ∑ i = 1 E c ^ i y i {\displaystyle {\hat {y}}\gets {\hat {c}}_{0}+\sum _{i=1}^{E}{\hat {c}}_{i}y_{i}} === Multivariate Embedding === Multivariate Embedding recognizes that time-delay embeddings are not the only valid state-space construction. In Simplex and S-Map one can generate a state-space from observational vectors, or time-delay embeddings of a single observational time series, or both. === Convergent Cross Mapping === Convergent cross mapping (CCM) leverages a corollary to the Generalized Takens Theorem that it should be possible to cross predict or cross map between variables observed from the same system. Suppose that in some dynamical system involving variables X {\displaystyle X} and Y {\displaystyle Y} , X {\displaystyle X} causes Y {\displaystyle Y} . Since X {\displaystyle X} and Y {\displaystyle Y} belong to the same dynamical system, their reconstructions (via embeddings) M x {\displaystyle M_{x}} , and M y {\displaystyle M_{y}} , also map to the same system. The causal variable X {\displaystyle X} leaves a signature on the affected variable Y {\displaystyle Y} , and consequently, the reconstructed states based on Y {\displaystyle Y} can be used to cross predict values of X {\displaystyle X} . CCM leverages this property to infer causality by predicting X {\displaystyle X} using the M y {\displaystyle M_{y}} library of points (or vice versa for the other direction of causality), while assessing improvements in cross map predictability as larger and larger random samplings of M y {\displaystyle M_{y}} are used. If the prediction skill of X {\displaystyle X} increases and saturates as the entire M y {\displaystyle M_{y}} is used, this provides evidence that X {\displaystyle X} is casually influencing Y {\displaystyle Y} . === Multiview Embedding === Multiview Embedding is a Dimensionality reduction technique where a large number of state-space time series vectors are combitorially assessed towards maximal model predictability. == Extensions == Extensions to EDM techniques include: Generalized Theorems for Nonlinear State Space Reconstruction Extended Convergent Cross Mapping Dynamic stability S-Map regularization Visual analytics with EDM Convergent Cross Sorting Expert system with EDM hybrid Sliding windows based on the extended convergent cross-mapping Empirical Mode Modeling Accounting for missing data and variable step sizes Accounting for observation noise Hierarchical Bayesian EDM via Gaussian processes Intelligent and Adaptive Control Optimal control via Empirical dynamic programming Multiview distance regularised S-map

    Read more →
  • Algorithmic bias

    Algorithmic bias

    Algorithmic bias describes systematic and repeatable harmful tendency in a computerized sociotechnical system to create "unfair" outcomes, such as "privileging" one category over another in ways that may or may not be different from the intended function of the algorithm. Bias can emerge from many factors, including intentionally biased design decisions or the unintended or unanticipated use or decisions relating to the way data is coded, collected, selected or used to train the algorithm. For example, algorithmic bias has been observed in search engine results and social media platforms. This bias can have impacts ranging from privacy violations to reinforcing social biases of race, gender, sexuality, and ethnicity. The study of algorithmic bias is most concerned with algorithms that reflect "systematic and unfair" discrimination. This bias has only recently been addressed in legal frameworks, such as the European Union's General Data Protection Regulation (enforced in 2018) and the Artificial Intelligence Act (proposed in 2021 and adopted in 2024). As algorithms expand their ability to organize society, politics, institutions, and behavior, sociologists have become concerned with the ways in which unanticipated output and manipulation of data can impact the physical world. Because algorithms are often considered to be neutral and unbiased, they can inaccurately project greater authority than human expertise (in part due to the psychological phenomenon of automation bias), and in some cases, reliance on algorithms can displace human responsibility for their outcomes, without last mile thinking. Bias can enter into algorithmic systems as a result of pre-existing cultural, social, or institutional expectations; by how features and labels are chosen; because of technical limitations of their design; or by being used in unanticipated contexts or by audiences who are not considered in the software's initial design. Algorithmic bias has been cited in cases ranging from election outcomes to the spread of online hate speech. It has also arisen in criminal justice, healthcare, and hiring, compounding existing racial, socioeconomic, and gender biases. The relative inability of facial recognition technology to accurately identify darker-skinned faces has been linked to multiple wrongful arrests of black men, an issue stemming from imbalanced datasets. Problems in understanding, researching, and discovering algorithmic bias persist due to the proprietary nature of algorithms, which are typically treated as trade secrets. Even when full transparency is provided, the complexity of certain algorithms poses a barrier to understanding their functioning. Furthermore, algorithms may change, or respond to input or output in ways that cannot be anticipated or easily reproduced for analysis. In many cases, even within a single website or application, there is no single "algorithm" to examine, but a network of many interrelated programs and data inputs, even between users of the same service. A 2021 survey identified multiple forms of algorithmic bias, including historical, representation, and measurement biases, each of which can contribute to unfair outcomes. == Definitions == Algorithms are difficult to define, but may be generally understood as lists of instructions that determine how programs read, collect, process, and analyze data to generate a usable output. For a rigorous technical introduction, see Algorithms. Advances in computer hardware and software have led to an increased capability to process, store and transmit data. This has in turn made the design and adoption of technologies such as machine learning and artificial intelligence technically and commercially feasible. By analyzing and processing data, algorithms are the backbone of search engines, social media websites, recommendation engines, online retail, online advertising, and more. Contemporary social scientists are concerned with algorithmic processes embedded into hardware and software applications because of their political and social impact, and question the underlying assumptions of an algorithm's neutrality. The term algorithmic bias describes systematic and repeatable errors that create unfair outcomes, such as privileging one arbitrary group of users over others. For example, a credit score algorithm may deny a loan without being unfair, if it is consistently weighing relevant financial criteria. If the algorithm recommends loans to one group of users, but denies loans to another set of nearly identical users based on unrelated criteria, and if this behavior can be repeated across multiple occurrences, an algorithm can be described as biased. This bias may be intentional or unintentional (for example, it can come from biased data obtained from a worker that previously did the job the algorithm is going to do from now on). == Methods == Bias can be introduced to an algorithm in several ways. During the assemblage of a dataset, data may be collected, digitized, adapted, and entered into a database according to human-designed cataloging criteria. Next, programmers assign priorities, or hierarchies, for how a program assesses and sorts that data. This requires human decisions about how data is categorized, and which data is included or discarded. Some algorithms collect their own data based on human-selected criteria, which can also reflect the bias of human designers. Other algorithms may reinforce stereotypes and preferences as they process and display "relevant" data for human users, for example, by selecting information based on previous choices of a similar user or group of users. Beyond assembling and processing data, bias can emerge as a result of design. For example, algorithms that determine the allocation of resources or scrutiny (such as determining school placements) may inadvertently discriminate against a category when determining risk based on similar users (as in credit scores). Meanwhile, recommendation engines that work by associating users with similar users, or that make use of inferred marketing traits, might rely on inaccurate associations that reflect broad ethnic, gender, socio-economic, or racial stereotypes. Another example comes from determining criteria for what is included and excluded from results. These criteria could present unanticipated outcomes for search results, such as with flight-recommendation software that omits flights that do not follow the sponsoring airline's flight paths. Algorithms may also display an uncertainty bias, offering more confident assessments when larger data sets are available. This can skew algorithmic processes toward results that more closely correspond with larger samples, which may disregard data from underrepresented populations. == History == === Early critiques === The earliest computer programs were designed to mimic human reasoning and deductions, and were deemed to be functioning when they successfully and consistently reproduced that human logic. In his 1976 book Computer Power and Human Reason, artificial intelligence pioneer Joseph Weizenbaum suggested that bias could arise both from the data used in a program, but also from the way a program is coded. Weizenbaum wrote that programs are a sequence of rules created by humans for a computer to follow. By following those rules consistently, such programs "embody law", that is, enforce a specific way to solve problems. The rules a computer follows are based on the assumptions of a computer programmer for how these problems might be solved. That means the code could incorporate the programmer's imagination of how the world works, including their biases and expectations. While a computer program can incorporate bias in this way, Weizenbaum also noted that any data fed to a machine additionally reflects "human decision making processes" as data is being selected. Finally, he noted that machines might also transfer good information with unintended consequences if users are unclear about how to interpret the results. Weizenbaum warned against trusting decisions made by computer programs that a user doesn't understand, comparing such faith to a tourist who can find his way to a hotel room exclusively by turning left or right on a coin toss. Crucially, the tourist has no basis of understanding how or why he arrived at his destination, and a successful arrival does not mean the process is accurate or reliable. An early example of algorithmic bias resulted in as many as 60 women and ethnic minorities denied entry to St. George's Hospital Medical School per year from 1982 to 1986, based on implementation of a new computer-guidance assessment system that denied entry to women and men with "foreign-sounding names" based on historical trends in admissions. While many schools at the time employed similar biases in their selection process, St. George was most notable for automating said bias through the use of an algorithm, thus gaining the attention of people on a much

    Read more →
  • Moral outsourcing

    Moral outsourcing

    Moral outsourcing is the placing of responsibility for ethical decision-making onto external entities, often algorithms. The term is often used in discussions of computer science and algorithmic fairness, but it can apply to any situation in which one appeals to outside agents in order to absolve themselves of responsibility for their actions. In this context, moral outsourcing specifically refers to the tendency of society to blame technology, rather than its creators or users, for any harm it may cause. == Definition == The term "moral outsourcing" was first coined by Dr. Rumman Chowdhury, a data scientist concerned with the overlap between artificial intelligence and social issues. Chowdhury used the term to describe looming fears of a so-called “Fourth Industrial Revolution” following the rise of artificial intelligence. Moral outsourcing is often applied by technologists to shrink away from their part in building offensive products. In her TED Talk, Chowdhury gives the example of a creator excusing their work by saying they were simply doing their job. This is a case of moral outsourcing and not taking ownership for the consequences of creation. When it comes to AI, moral outsourcing allows for creators to decide when the machine is human and when it is a computer - shifting the blame and responsibility of moral plights off of the technologists and onto the technology. Conversations around AI and bias and its impacts require accountability to bring change. It is difficult to address these biased systems if their creators use moral outsourcing to avoid taking any responsibility for the issue. One example of moral outsourcing is the anger that is directed at machines for “taking jobs away from humans” rather than companies for employing that technology and jeopardizing jobs in the first place. The term "moral outsourcing" refers to the concept of outsourcing, or enlisting an external operation to complete specific work for another organization. In the case of moral outsourcing, the work of resolving moral dilemmas or making choices according to an ethical code is supposed to be conducted by another entity. == Real-world applications == In the medical field, AI is increasingly involved in decision-making processes about which patients to treat, and how to treat them. The responsibility of the doctor to make informed decisions about what is best for their patients is outsourced to an algorithm. Sympathy is also noted to be an important part of medical practice; an aspect that artificial intelligence, glaringly, is missing. This form of moral outsourcing is a major concern in the medical community. Another field of technology in which moral outsourcing is frequently brought up is autonomous vehicles. California Polytechnic State University professor Keith Abney proposed an example scenario: "Suppose we have some [troublemaking] teenagers, and they see an autonomous vehicle, they drive right at it. They know the autonomous vehicle will swerve off the road and go off a cliff, but should it?" The decision of whether to sacrifice the autonomous vehicle (and any passengers inside) or the vehicle coming at it will be written into the algorithms defining the car's behavior. In the case of moral outsourcing, the responsibility of any damage caused by an accident may be attributed to the autonomous vehicle itself, rather than the creators who wrote the protocol the vehicle will use to "decide" what to do. Moral outsourcing is also used to delegate the consequences of predictive policing algorithms to technology, rather than the creators or the police. There are many ethical concerns with predictive policing due to the fact that it results in the over-policing of low income and minority communities. In the context of moral outsourcing, the positive feedback loop of sending disproportionate police forces into minority communities is attributed to the algorithm and the data being fed into this system--rather than the users and creators of the predictive policing technology. == Outside of technology == === Religion === Moral outsourcing is also commonly seen in appeals to religion to justify discrimination or harm. In his book What It Means to be Moral, sociologist Phil Zuckerman contradicts the popular religious notion that morality comes from God. Religion is oftentimes cited as a foundation for a moral stance without any tangible relation between the religious beliefs and personal stance. In these cases, religious individuals will "outsource" their personal beliefs and opinions by claiming that they are a result of their religious identification. This is seen where religion is cited as a factor for political beliefs, medical beliefs, and in extreme cases an excuse for violence. === Manufacturing === Moral outsourcing can also be seen in the business world in terms of manufacturing goods and avoiding environmental responsibility. Some companies in the United States will move their production process to foreign countries with more relaxed environmental policies to avoid the pollution laws that exist in the US. A study by the Harvard Business Review found that "in countries with tight environmental regulation, companies have 29% lower domestic emissions on average. On the other hand, such a tightening in regulation results in 43% higher emissions abroad." The consequences of higher pollution rates are then attributed to the loose regulations in these countries, rather than on the companies themselves who purposefully moved into these areas to avoid strict pollution policy.

    Read more →
  • AI safety

    AI safety

    AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence systems. It encompasses AI alignment (which aims to ensure AI systems behave as intended), monitoring AI systems for risks, and enhancing their robustness. The field is particularly concerned with existential risks posed by advanced AI models. Beyond technical research, AI safety involves developing norms and policies that promote safety, including advocacy for regulations at different levels of government. The field gained significant popularity in 2023, with rapid progress in generative AI and public concerns voiced by researchers and CEOs about potential dangers. During the 2023 AI Safety Summit, the United States and the United Kingdom both established their own AI Safety Institute. However, researchers have expressed concern that AI safety measures are not keeping pace with the rapid development of AI capabilities. == Motivations == Scholars discuss current risks from critical systems failures, bias, and AI-enabled surveillance, as well as emerging risks like technological unemployment, digital manipulation, weaponization, AI-enabled cyberattacks and bioterrorism. They also discuss speculative risks from losing control of future artificial general intelligence (AGI) agents, or from AI enabling perpetually stable dictatorships. === Existential safety === Some have criticized concerns about AGI, such as Andrew Ng who compared them in 2015 to "worrying about overpopulation on Mars when we have not even set foot on the planet yet". Stuart J. Russell on the other side urges caution, arguing that "it is better to anticipate human ingenuity than to underestimate it". AI researchers have widely differing opinions about the severity and primary sources of risk posed by AI technology – though surveys suggest that experts take high consequence risks seriously. In two surveys of AI researchers, the median respondent was optimistic about AI overall, but placed a 5% probability on an "extremely bad (e.g. human extinction)" outcome of advanced AI. In a 2022 survey of the natural language processing community, 37% agreed or weakly agreed that it is plausible that AI decisions could lead to a catastrophe that is "at least as bad as an all-out nuclear war". == History == Risks from AI began to be seriously discussed at the start of the computer age: Moreover, if we move in the direction of making machines which learn and whose behavior is modified by experience, we must face the fact that every degree of independence we give the machine is a degree of possible defiance of our wishes. In 1988 Blay Whitby published a book outlining the need for AI to be developed along ethical and socially responsible lines. From 2008 to 2009, the Association for the Advancement of Artificial Intelligence (AAAI) commissioned a study to explore and address potential long-term societal influences of AI research and development. The panel was generally skeptical of the radical views expressed by science-fiction authors but agreed that "additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes". In 2011, Roman Yampolskiy introduced the term "AI safety engineering" at the Philosophy and Theory of Artificial Intelligence conference, listing prior failures of AI systems and arguing that "the frequency and seriousness of such events will steadily increase as AIs become more capable". In 2014, philosopher Nick Bostrom published the book Superintelligence: Paths, Dangers, Strategies. He has the opinion that the rise of AGI has the potential to create various societal issues, ranging from the displacement of the workforce by AI, manipulation of political and military structures, to even the possibility of human extinction. His argument that future advanced systems may pose a threat to human existence prompted Elon Musk, Bill Gates, and Stephen Hawking to voice similar concerns. In 2015, dozens of artificial intelligence experts signed an open letter on artificial intelligence calling for research on the societal impacts of AI and outlining concrete directions. To date, the letter has been signed by over 8000 people including Yann LeCun, Shane Legg, Yoshua Bengio, and Stuart Russell. In the same year, a group of academics led by professor Stuart J. Russell founded the Center for Human-Compatible AI at the University of California Berkeley and the Future of Life Institute awarded $6.5 million in grants for research aimed at "ensuring artificial intelligence (AI) remains safe, ethical and beneficial". In 2016, the White House Office of Science and Technology Policy and Carnegie Mellon University announced The Public Workshop on Safety and Control for Artificial Intelligence, which was one of a sequence of four White House workshops aimed at investigating "the advantages and drawbacks" of AI. In the same year, Concrete Problems in AI Safety – one of the first and most influential technical AI Safety agendas – was published. In 2017, the Future of Life Institute sponsored the Asilomar Conference on Beneficial AI, where more than 100 thought leaders formulated principles for beneficial AI including "Race Avoidance: Teams developing AI systems should actively cooperate to avoid corner-cutting on safety standards". In 2018, the DeepMind Safety team outlined AI safety problems in specification, robustness, and assurance. The following year, researchers organized a workshop at ICLR that focused on these problem areas. In 2021, Unsolved Problems in ML Safety was published, outlining research directions in robustness, monitoring, alignment, and systemic safety. In 2023, Rishi Sunak said he wants the United Kingdom to be the "geographical home of global AI safety regulation" and to host the first global summit on AI safety. The AI safety summit took place in November 2023, and focused on the risks of misuse and loss of control associated with frontier AI models. During the summit the intention to create the International Scientific Report on the Safety of Advanced AI was announced. In 2024, The US and UK forged a new partnership on the science of AI safety. The MoU was signed on 1 April 2024 by US commerce secretary Gina Raimondo and UK technology secretary Michelle Donelan to jointly develop advanced AI model testing, following commitments announced at an AI Safety Summit in Bletchley Park in November. In 2025, an international team of 96 experts chaired by Yoshua Bengio published the first International AI Safety Report. The report, commissioned by 30 nations and the United Nations, represents the first global scientific review of potential risks associated with advanced artificial intelligence. It details potential threats stemming from misuse, malfunction, and societal disruption, with the objective of informing policy through evidence-based findings, without providing specific recommendations. == Research focus == AI safety research areas include robustness, monitoring, and alignment. === Robustness === ==== Adversarial robustness ==== AI systems are often vulnerable to adversarial examples or "inputs to machine learning (ML) models that an attacker has intentionally designed to cause the model to make a mistake". For example, in 2013, Szegedy et al. discovered that adding specific imperceptible perturbations to an image could cause it to be misclassified with high confidence. This continues to be an issue with neural networks, though in recent work the perturbations are generally large enough to be perceptible. The image on the right is predicted to be an ostrich after the perturbation is applied. (Left) is a correctly predicted sample, (center) perturbation applied magnified by 10x, (right) adversarial example. Adversarial robustness is often associated with security. Researchers demonstrated that an audio signal could be imperceptibly modified so that speech-to-text systems transcribe it to any message the attacker chooses. Network intrusion and malware detection systems also must be adversarially robust since attackers may design their attacks to fool detectors. Models that represent objectives (reward models) must also be adversarially robust. For example, a reward model might estimate how helpful a text response is and a language model might be trained to maximize this score. Researchers have shown that if a language model is trained for long enough, it will leverage the vulnerabilities of the reward model to achieve a better score and perform worse on the intended task. This issue can be addressed by improving the adversarial robustness of the reward model. More generally, any AI system used to evaluate another AI system must be adversarially robust. This could include monitoring tools, since they could also potentially be tampered with to produce a higher reward. Large language models (LLMs) can be vulnerable to prom

    Read more →