AI Coding Book

AI Coding Book — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Poop Map

    Poop Map

    Poop Map is a social app where users can track on a map where and when they defecate. In addition to logging location and time of each bowel movement, users can also add a photo, "like" other users' logs, and rate each account. The social elements of the app allow for groups of users to create a competitive league. Certain behaviors unlock achievements in-app. == Development == The app was created by app developer Nino Uzelac. It was launched in July 2013. == Popularity == The app charted at number one on the Apple App Store charts in 2021 after going viral on TikTok. As of September 2024, the app has a 4.8 rating on the App Store and more than 58,000 ratings. It also has more than one million downloads on the Google Play Store. Poop Map is notably popular among hikers, and has been written about in the outdoors magazine Outside.

    Read more →
  • Neuro-symbolic AI

    Neuro-symbolic AI

    Neuro-symbolic AI is a subfield of artificial intelligence that integrates neural methods (e.g., neural networks and deep learning) with symbolic methods (e.g., formal logic, knowledge representation, and automated reasoning). The goal is to combine the strengths of both approaches, resulting in AI systems that can be trained from raw data and demonstrate robustness against outliers or errors in the base data, while preserving explainability, explicit use of expert knowledge, and explicit cognitive reasoning. As argued by Leslie Valiant and others, the effective construction of rich computational cognitive models demands the combination of symbolic reasoning and efficient machine learning. Gary Marcus argued, "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning." Further, "To build a robust, knowledge-driven approach to AI we must have the machinery of symbol manipulation in our toolkit. Too much of useful knowledge is abstract to make do without tools that represent and manipulate abstraction, and to date, the only known machinery that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation." Angelo Dalli, Henry Kautz, Francesca Rossi, and Bart Selman also argued for such a synthesis. Their arguments attempt to address the two kinds of thinking, as discussed in Daniel Kahneman's book Thinking, Fast and Slow. It describes cognition as encompassing two components: System 1 is fast, reflexive, intuitive, and unconscious. System 2 is slower, step-by-step, and explicit. System 1 is used for pattern recognition. System 2 handles planning, deduction, and deliberative thinking. In this view, deep learning best handles the first kind of cognition, while symbolic reasoning best handles the second kind. Both are necessary for the development of a robust and reliable AI system capable of learning, reasoning, and interacting with humans to accept advice and answer questions. Since the 1990s, dual-process models with explicit references to the two contrasting systems have been the focus of research in both the fields of AI and cognitive science by numerous researchers. In 2025, the adoption of neurosymbolic AI, an approach that integrates neural networks with symbolic reasoning, increased in response to the need to address hallucination issues in large language models. For example, Amazon implemented Neurosymbolic AI in its Vulcan warehouse robots and Rufus shopping assistant to enhance accuracy and decision-making. == Approaches == Approaches for integration are diverse. Henry Kautz's taxonomy of neuro-symbolic architectures follows, along with some examples: Symbolic Neural symbolic is the current approach of many neural models in natural language processing, where words or subword tokens are the ultimate input and output of large language models. Examples include BERT, RoBERTa, and GPT-3. Symbolic[Neural] is exemplified by AlphaGo, where symbolic techniques are used to invoke neural techniques. In this case, the symbolic approach is Monte Carlo tree search and the neural techniques learn how to evaluate game positions. Neural | Symbolic uses a neural architecture to interpret perceptual data as symbols and relationships that are reasoned about symbolically. Neural-Concept Learner is an example. Neural: Symbolic → Neural relies on symbolic reasoning to generate or label training data that is subsequently learned by a deep learning model, e.g., to train a neural model for symbolic computation by using a Macsyma-like symbolic mathematics system to create or label examples. NeuralSymbolic uses a neural net that is generated from symbolic rules. An example is the Neural Theorem Prover, which constructs a neural network from an AND-OR proof tree generated from knowledge base rules and terms. Logic Tensor Networks also fall into this category. Neural[Symbolic] according to Kautz, this approach embeds true symbolic reasoning inside a neural network. These are tightly-coupled neural-symbolic systems, in which the logical inference rules are internal to the neural network. This way, the neural network internally computes the inference from the premises and learns to reason based on logical inference systems. Early work on connectionist modal and temporal logics by Garcez, Lamb, and Gabbay is aligned with this approach. These categories are not exhaustive, as they do not consider multi-agent systems. In 2005, Bader and Hitzler presented a more fine-grained categorization that took into account, e.g., whether the use of symbols included logic and, if so, whether the logic was propositional or first-order logic. The 2005 categorization and Kautz's taxonomy above are compared and contrasted in a 2021 article. Sepp Hochreiter argued that Graph Neural Networks "...are the predominant models of neural-symbolic computing" since "[t]hey describe the properties of molecules, simulate social networks, or predict future states in physical and engineering applications with particle-particle interactions." == Artificial general intelligence == Gary Marcus argues that "...hybrid architectures that combine learning and symbol manipulation are necessary for robust intelligence, but not sufficient", and that there are ...four cognitive prerequisites for building robust artificial intelligence: hybrid architectures that combine large-scale learning with the representational and computational powers of symbol manipulation, large-scale knowledge bases—likely leveraging innate frameworks—that incorporate symbolic knowledge along with other forms of knowledge, reasoning mechanisms capable of leveraging those knowledge bases in tractable ways, and rich cognitive models that work together with those mechanisms and knowledge bases. This echoes earlier calls for hybrid models as early as the 1990s. == History == Garcez and Lamb described research in this area as ongoing, at least since the 1990s. During that period, the terms symbolic and sub-symbolic AI were popular. A series of workshops on neuro-symbolic AI has been held annually since 2005 Neuro-Symbolic Artificial Intelligence. In the early 1990s, an initial set of workshops on this topic were organized. == Research == Key research questions remain, such as: What is the best way to integrate neural and symbolic architectures? How should symbolic structures be represented within neural networks and extracted from them? How should common-sense knowledge be learned and reasoned about? How can abstract knowledge that is hard to encode logically be handled? == Implementations == Implementations of neuro-symbolic approaches include: AllegroGraph: an integrated Knowledge Graph based platform for neuro-symbolic application development. Scallop: a language based on Datalog that supports differentiable logical and relational reasoning. Scallop can be integrated in Python and with a PyTorch learning module. Logic Tensor Networks: encode logical formulas as neural networks and simultaneously learn term encodings, term weights, and formula weights. DeepProbLog: combines neural networks with the probabilistic reasoning of ProbLog. Abductive Learning: integrates machine learning and logical reasoning in a balanced-loop via abductive reasoning, enabling them to work together in a mutually beneficial way. SymbolicAI: a compositional differentiable programming library.

    Read more →
  • Intelligent control

    Intelligent control

    Intelligent control is a class of control techniques that use various artificial intelligence computing approaches like neural networks, Bayesian probability, fuzzy logic, machine learning, reinforcement learning, evolutionary computation and genetic algorithms. == Overview == Intelligent control can be divided into the following major sub-domains: Neural network control Machine learning control Reinforcement learning Bayesian control Fuzzy control Neuro-fuzzy control Expert Systems Genetic control New control techniques are created continuously as new models of intelligent behavior are created and computational methods developed to support them. === Neural network controller === Neural networks have been used to solve problems in almost all spheres of science and technology. Neural network control basically involves two steps: System identification Control It has been shown that a feedforward network with nonlinear, continuous and differentiable activation functions have universal approximation capability. Recurrent networks have also been used for system identification. Given, a set of input-output data pairs, system identification aims to form a mapping among these data pairs. Such a network is supposed to capture the dynamics of a system. For the control part, deep reinforcement learning has shown its ability to control complex systems. === Bayesian controllers === Bayesian probability has produced a number of algorithms that are in common use in many advanced control systems, serving as state space estimators of some variables that are used in the controller. The Kalman filter and the Particle filter are two examples of popular Bayesian control components. The Bayesian approach to controller design often requires an important effort in deriving the so-called system model and measurement model, which are the mathematical relationships linking the state variables to the sensor measurements available in the controlled system. In this respect, it is very closely linked to the system-theoretic approach to control design.

    Read more →
  • Enterprise cognitive system

    Enterprise cognitive system

    Enterprise cognitive systems (ECS) are part of a broader shift in computing, from a programmatic to a probabilistic approach, called cognitive computing. An Enterprise Cognitive System makes a new class of complex decision support problems computable, where the business context is ambiguous, multi-faceted, and fast-evolving, and what to do in such a situation is usually assessed today by the business user. An ECS is designed to synthesize a business context and link it to the desired outcome. It recommends evidence-based actions to help the end-user achieve the desired outcome. It does so by finding past situations similar to the current situation, and extracting the repeated actions that best influence the desired outcome. While general-purpose cognitive systems can be used for different outputs, prescriptive, suggestive, instructive, or simply entertaining, an enterprise cognitive system is focused on action, not insight, to help in assessing what to do in a complex situation. == Key characteristics == ECS have to be: Adaptive: They must learn as information changes, and as goals and requirements evolve. They must resolve ambiguity and tolerate unpredictability. They must be engineered to feed on dynamic data in real time, or near real time. In the Enterprise, near-real time learning from data requires an agile information federation approach to ingest incremental data updates as they occur, and an unsupervised learning approach to ensure that new best practice is leveraged across the organization in a timely manner. Interactive: They must interact easily with users so that those users can define their needs comfortably. They may also interact with other processors, devices, and Cloud services, as well as with people. In the Enterprise, interactions are controlled via existing workflows and UIs. Therefore, embedding best practices directly into these existing interfaces, in the context of a specific step, is critical to ensure maximum end-user adoption. Iterative and stateful: They must aid in defining a problem by asking questions or finding additional source input if a problem statement is ambiguous or incomplete. They must “remember” previous interactions in a process and return information that is suitable for the specific application at that point in time. In the Enterprise, business context is often structured by a business process, and therefore sufficiently data-rich to make relevant recommendations without significant iterations from the end-user. A stateful memory of overall interactions across communication channels is critical for understanding of context, as a static profile will not capture intent and outcome potential the way behavior does. Contextual: They must understand, identify, and extract contextual elements such as meaning, syntax, time, location, appropriate domain, regulations, user's profile, process, task and goal. They may draw on multiple sources of information, including both structured and unstructured digital information, as well as sensory inputs (visual, gestural, auditory, or sensor-provided). In the Enterprise, Context is fragmented and must be aggregated across data types, sources, and locations. In most business environments, such data is captured in existing enterprise information systems, and the effort is linked to quickly source and unify such information. It is rare to have to directly process sensor, audio or visual data in real-time as direct input into the enterprise cognitive system. Instead, these data types are captured by Enterprise Applications and pre-processed into a binary or text format prior to consumption by the System. == Business applications powered by an ECS == Bottlenose – trends and brands monitoring Cybereason – security threat monitoring Dataminr – social media monitoring

    Read more →
  • Automatic scorer

    Automatic scorer

    An automatic scorer is the computerized scoring system to keep track of scoring in ten-pin bowling. It was introduced en masse in bowling alleys in the 1970s and combined with mechanical pinsetters to detect overturned pins. By eliminating the need for manual score-keeping, these systems have introduced new bowlers into the game who otherwise would not participate because they had to count the score themselves, as many do not understand the mathematical formula involved in bowler scoring. At first, people were skeptical about whether a computer could keep an accurate score. In the twenty-first century, automatic scorers are used in most bowling centers around the world. The three manufacturers of these specialty computers have been Brunswick Bowling, AMF Bowling (later QubicaAMF), and RCA. == History == Automatic equipment is considered a cornerstone of the modern bowling center. The traditional bowling center of the early 20th century was advanced in automation when the pinsetter person ("pin boy"), who set back up by hand the bowled down pins, was replaced by a machine that automatically replaced the pins in their proper play positions. This machine came out in the 1950s. A detection system was developed from the pinsetter mechanism in the 1960s that could tell which pins had been knocked down, and that information could be transferred to a digital computer. Automatic electronic scoring was first conceived by Robert Reynolds, who was described by a newspaper story at the time as "a West Coast electronics calculator expert." He worked with the technical staff of Brunswick Bowling to develop it. The goal was realized in the late 1960s when a specialized computer was designed for the purpose of automatic scorekeeping for bowling. The field test for the automatic scorer took place at Village Lanes bowling center, Chicago in 1967. The scoring machine received approval for official use by the American Bowling Congress in August of that year. They were first used in national official league gaming on October 10, 1967. In November, Brunswick announced that they were accepting orders for the new digital computer, which cost around $3,000 per bowling lane. Bowling centers that installed these new automatic scoring devices in the 1970s charged a ten cents extra per line of scoring for the convenience. == Description == Each Automatic Scorer computer unit kept score for four lanes. It had two bowler identification panels serving two lanes each. The bowler pushed it into his named position when his turn came up so the computer knew who was bowling and score accordingly. After the bowler rolled the bowling ball down the lane and knocked down pins, the pinsetter detected which pins were down and relayed this information back to the computer for scoring. The result was then printed on a scoresheet and projected overhead onto a large screen for all to see. The Automatic Scorer digital computer was mathematically accurate, however the detection system at the pinsetter mechanism sometimes reported the wrong number of pins knocked down. The computer could be corrected manually for any errors in the system; similarly, human errors, such as neglecting to move the bowler identification mechanism, could be corrected for by manual action. The scorer could take into account bowlers' handicaps and could adjust for late-arriving bowlers. The automatic scorer is directly connected to the foul detection unit. As a result, foul line violations are automatically scored. Brunswick had put ten years of research and development into the Automatic Scorer, and by 1972 there were over 500 of these computers installed in bowling centers around the world. AMF Bowling, competitor to Brunswick, entered into the automatic scorer computer field during the 1970s and their systems were installed into their brand of bowling centers. By 1974, RCA was also making these computers for automatic scoring. == Reception and further developments == The purposes of the computerized scoring were to avoid errors by human scorers and to prevent cheating. It had the side benefit of speeding up the progress of the game and introducing new bowlers to the game. Score-keeping for bowling is based on a formula that many new to bowling were not familiar with and thought difficult to learn. These casual bowlers unfamiliar with the formula thought the scores given by the computers were confusing. Some bowlers were not comfortable with automatic scorers when they were introduced in the 1970s, so kept score using the traditional method on paper score sheets. The introduction of this device increased the popularity of the sport. Automatic scorers came to be considered a normal part of modern bowling installations worldwide, with owners and managers saying that bowlers expect such equipment to be present in bowling establishments and that business increased following their introduction. Brunswick introduced a color television style automatic scorer in 1983. Bowling center owners could use these style automatic scorers for advertising, management, videos, and live television. By the 2010s, these types of electronic visual displays could show bowler avatars and social media connections to publicize the bowlers' scores. Some are capable of being extended entertainment systems of games for children and adults. Some scoring systems support variations on traditional bowling, such as different kinds of bingo games where certain pins have to be knocked down at certain times or practice regimes where certain spares have to be accomplished. By this point, QubicaAMF Worldwide, an outgrowth of AMF, was one of the leading providers of bowling scoring equipment.

    Read more →
  • Automated decision-making

    Automated decision-making

    Automated decision-making (ADM) is the use of data, machines and algorithms to make decisions in a range of contexts, including public administration, business, health, education, law, employment, transport, media and entertainment, with varying degrees of human oversight or intervention. ADM may involve large-scale data from a range of sources, such as databases, text, social media, sensors, images or speech, that is processed using various technologies including computer software, algorithms, machine learning, natural language processing, artificial intelligence, augmented intelligence and robotics. The increasing use of automated decision-making systems (ADMS) across a range of contexts presents many benefits and challenges to human society requiring consideration of the technical, legal, ethical, societal, educational, economic and health consequences. == Overview == There are different definitions of ADM based on the level of automation involved. Some definitions suggests ADM involves decisions made through purely technological means without human input, such as the EU's General Data Protection Regulation (Article 22). However, ADM technologies and applications can take many forms ranging from decision-support systems that make recommendations for human decision-makers to act on, sometimes known as augmented intelligence or 'shared decision-making', to fully automated decision-making processes that make decisions on behalf of individuals or organizations without human involvement. Models used in automated decision-making systems can be as simple as checklists and decision trees through to artificial intelligence and deep neural networks (DNN). Since the 1950s computers have gone from being able to do basic processing to having the capacity to undertake complex, ambiguous and highly skilled tasks such as image and speech recognition, gameplay, scientific and medical analysis and inferencing across multiple data sources. ADM is now being increasingly deployed across all sectors of society and many diverse domains from entertainment to transport. An ADM system (ADMS) may involve multiple decision points, data sets, and technologies (ADMT) and may sit within a larger administrative or technical system such as a criminal justice system or business process. == Data == Automated decision-making involves using data as input to be analyzed within a process, model, or algorithm or for learning and generating new models. ADM systems may use and connect a wide range of data types and sources depending on the goals and contexts of the system, for example, sensor data for self-driving cars and robotics, identity data for security systems, demographic and financial data for public administration, medical records in health, criminal records in law. This can sometimes involve vast amounts of data and computing power. === Data quality === The quality of the available data and its ability to be used in ADM systems is fundamental to the outcomes. It is often highly problematic for many reasons. Datasets are often highly variable; corporations or governments may control large-scale data, restricted for privacy or security reasons, incomplete, biased, limited in terms of time or coverage, measuring and describing terms in different ways, and many other issues. For machines to learn from data, large corpora are often required, which can be challenging to obtain or compute; however, where available, they have provided significant breakthroughs, for example, in diagnosing chest X-rays. == ADM technologies == Automated decision-making technologies (ADMT) are software-coded digital tools that automate the translation of input data to output data, contributing to the function of automated decision-making systems. There are a wide range of technologies in use across ADM applications and systems. ADMTs involving basic computational operations Search (includes 1-2-1, 1-2-many, data matching/merge) Matching (two different things) Mathematical Calculation (formula) ADMTs for assessment and grouping: User profiling Recommender systems Clustering Classification Feature learning Predictive analytics (includes forecasting) ADMTs relating to space and flows: Social network analysis (includes link prediction) Mapping Routing ADMTs for processing of complex data formats Image processing Audio processing Natural Language Processing (NLP) Other ADMT Business rules management systems Time series analysis Anomaly detection Modelling/Simulation === Machine learning === Machine learning (ML) involves training computer programs through exposure to large data sets and examples to learn from experience and solve problems. Machine learning can be used to generate and analyse data as well as make algorithmic calculations and has been applied to image and speech recognition, translations, text, data and simulations. While machine learning has been around for some time, it is becoming increasingly powerful due to recent breakthroughs in training deep neural networks (DNNs), and dramatic increases in data storage capacity and computational power with GPU coprocessors and cloud computing. Machine learning systems based on foundation models run on deep neural networks and use pattern matching to train a single huge system on large amounts of general data such as text and images. Early models tended to start from scratch for each new problem however since the early 2020s many are able to be adapted to new problems. Examples of these technologies include Open AI's DALL-E (an image creation program) and their various GPT language models, and Google's PaLM language model program. == Applications == ADM is being used to replace or augment human decision-making by both public and private-sector organisations for a range of reasons including to help increase consistency, improve efficiency, reduce costs and enable new solutions to complex problems. === Debate === Research and development are underway into uses of technology to assess argument quality, assess argumentative essays and judge debates. Potential applications of these argument technologies span education and society. Scenarios to consider, in these regards, include those involving the assessment and evaluation of conversational, mathematical, scientific, interpretive, legal, and political argumentation and debate. === Law === In legal systems around the world, algorithmic tools such as risk assessment instruments (RAI), are being used to supplement or replace the human judgment of judges, civil servants and police officers in many contexts. In the United States RAI are being used to generate scores to predict the risk of recidivism in pre-trial detention and sentencing decisions, evaluate parole for prisoners and to predict "hot spots" for future crime. These scores may result in automatic effects or may be used to inform decisions made by officials within the justice system. In Canada ADM has been used since 2014 to automate certain activities conducted by immigration officials and to support the evaluation of some immigrant and visitor applications. === Economics === Automated decision-making systems are used in certain computer programs to create buy and sell orders related to specific financial transactions and automatically submit the orders in the international markets. Computer programs can automatically generate orders based on predefined set of rules using trading strategies which are based on technical analyses, advanced statistical and mathematical computations, or inputs from other electronic sources. === Business === ==== Continuous auditing ==== Continuous auditing uses advanced analytical tools to automate auditing processes. It can be utilized in the private sector by business enterprises and in the public sector by governmental organizations and municipalities. As artificial intelligence and machine learning continue to advance, accountants and auditors may make use of increasingly sophisticated algorithms which make decisions such as those involving determining what is anomalous, whether to notify personnel, and how to prioritize those tasks assigned to personnel. === Media and entertainment === Digital media, entertainment platforms, and information services increasingly provide content to audiences via automated recommender systems based on demographic information, previous selections, collaborative filtering or content-based filtering. This includes music and video platforms, publishing, health information, product databases and search engines. Many recommender systems also provide some agency to users in accepting recommendations and incorporate data-driven algorithmic feedback loops based on the actions of the system user. Large-scale machine learning language models and image creation programs being developed by companies such as OpenAI and Google in the 2020s have restricted access however they are likely to have widespread application in fields such as advertising, copywriting, stock imagery and gra

    Read more →
  • Kernel density estimation

    Kernel density estimation

    In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights. KDE answers a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. One of the famous applications of kernel density estimation is in estimating the class-conditional marginal densities of data when using a naive Bayes classifier, which can improve its prediction accuracy. == Definition == Let x = ( x 1 , x 2 , x 3 , . . . ) {\displaystyle \mathbf {x} =\left(x_{1},x_{2},x_{3},...\right)} be independent and identically distributed samples drawn from some univariate distribution with an unknown density f at any given point x. We are interested in estimating the shape of this function f. Its kernel density estimator is f ^ h ( x ) = 1 n ∑ i = 1 n K h ( x − x i ) = 1 n h ∑ i = 1 n K ( x − x i h ) , {\displaystyle {\hat {f}}_{h}(x)={\frac {1}{n}}\sum _{i=1}^{n}K_{h}(x-x_{i})={\frac {1}{nh}}\sum _{i=1}^{n}K{\left({\frac {x-x_{i}}{h}}\right)},} where K is the kernel — a non-negative function — and h > 0 is a smoothing parameter called the bandwidth or simply width. A kernel with subscript h is called the scaled kernel and defined as Kh(x) = ⁠1/h⁠ K(⁠x/h⁠). Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below. A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov (parabolic), normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, though the loss of efficiency is small for the kernels listed previously. Due to its convenient mathematical properties, the normal kernel is often used, which means K(x) = ϕ(x), where ϕ is the standard normal density function. The kernel density estimator then becomes f ^ h ( x ) = 1 n ∑ i = 1 n 1 h 2 π exp ⁡ ( − ( x − x i ) 2 2 h 2 ) , {\displaystyle {\hat {f}}_{h}(x)={\frac {1}{n}}\sum _{i=1}^{n}{\frac {1}{h{\sqrt {2\pi }}}}\exp \left({\frac {-(x-x_{i})^{2}}{2h^{2}}}\right),} where h {\displaystyle h} is the standard deviation of the sample x {\displaystyle \mathbf {x} } . The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point locations xi. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g. diffusion map). == Example == Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The diagram below based on these 6 data points illustrates this relationship: For the histogram, first, the horizontal axis is divided into sub-intervals or bins which cover the range of the data: In this case, six bins each of width 2. Whenever a data point falls inside this interval, a box of height 1/12 is placed there. If more than one data point falls inside the same bin, the boxes are stacked on top of each other. For the kernel density estimate, normal kernels with a standard deviation of 1.5 (indicated by the red dashed lines) are placed on each of the data points xi. The kernels are summed to make the kernel density estimate (solid blue curve). The smoothness of the kernel density estimate (compared to the discreteness of the histogram) illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables. == Bandwidth selection == The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated random sample from the standard normal distribution (plotted at the blue spikes in the rug plot on the horizontal axis). The grey curve is the true density (a normal density with mean 0 and variance 1). In comparison, the red curve is undersmoothed since it contains too many spurious data artifacts arising from using a bandwidth h = 0.05, which is too small. The green curve is oversmoothed since using the bandwidth h = 2 obscures much of the underlying structure. The black curve with a bandwidth of h = 0.337 is considered to be optimally smoothed since its density estimate is close to the true density. An extreme situation is encountered in the limit h → 0 {\displaystyle h\to 0} (no smoothing), where the estimate is a sum of n delta functions centered at the coordinates of analyzed samples. In the other extreme limit h → ∞ {\displaystyle h\to \infty } the estimate retains the shape of the used kernel, centered on the mean of the samples (completely smooth). The most common optimality criterion used to select this parameter is the expected L2 risk function, also termed the mean integrated squared error: MISE ⁡ ( h ) = E [ ∫ ( f ^ h ( x ) − f ( x ) ) 2 d x ] {\displaystyle \operatorname {MISE} (h)=\operatorname {E} \!\left[\int \!{\left({\hat {f}}\!_{h}(x)-f(x)\right)}^{2}dx\right]} Under weak assumptions on f and K, (f is the, generally unknown, real density function), MISE ⁡ ( h ) = AMISE ⁡ ( h ) + o ( ( n h ) − 1 + h 4 ) {\displaystyle \operatorname {MISE} (h)=\operatorname {AMISE} (h)+{\mathcal {o}}{\left((nh)^{-1}+h^{4}\right)}} where o is the little o notation, and n the sample size (as above). The AMISE is the asymptotic MISE, i. e. the two leading terms, AMISE ⁡ ( h ) = R ( K ) n h + 1 4 m 2 ( K ) 2 h 4 R ( f ″ ) {\displaystyle \operatorname {AMISE} (h)={\frac {R(K)}{nh}}+{\frac {1}{4}}m_{2}(K)^{2}h^{4}R(f'')} where R ( g ) = ∫ g ( x ) 2 d x {\textstyle R(g)=\int g(x)^{2}\,dx} for a function g, m 2 ( K ) = ∫ x 2 K ( x ) d x {\textstyle m_{2}(K)=\int x^{2}K(x)\,dx} and f ″ {\displaystyle f''} is the second derivative of f {\displaystyle f} and K {\displaystyle K} is the kernel. The minimum of this AMISE is the solution to this differential equation ∂ ∂ h AMISE ⁡ ( h ) = − R ( K ) n h 2 + m 2 ( K ) 2 h 3 R ( f ″ ) = 0 {\displaystyle {\frac {\partial }{\partial h}}\operatorname {AMISE} (h)=-{\frac {R(K)}{nh^{2}}}+m_{2}(K)^{2}h^{3}R(f'')=0} or h AMISE = R ( K ) 1 / 5 m 2 ( K ) 2 / 5 R ( f ″ ) 1 / 5 n − 1 / 5 = C n − 1 / 5 {\displaystyle h_{\operatorname {AMISE} }={\frac {R(K)^{1/5}}{m_{2}(K)^{2/5}R(f'')^{1/5}}}n^{-1/5}=Cn^{-1/5}} Neither the AMISE nor the hAMISE formulas can be used directly since they involve the unknown density function f {\displaystyle f} or its second derivative f ″ {\displaystyle f''} . To overcome that difficulty, a variety of automatic, data-based methods have been developed to select the bandwidth. Several review studies have been undertaken to compare their efficacies, with the general consensus that the plug-in selectors and cross validation selectors are the most useful over a wide range of data sets. Substituting any bandwidth h which has the same asymptotic order n−1/5 as hAMISE into the AMISE gives that AMISE(h) = O(n−4/5), where O is the big O notation. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator. Note that the n−4/5 rate is slower than the typical n−1 convergence rate of parametric methods. If the bandwidth is not held fixed, but is varied depending upon the location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method termed adaptive or variable bandwidth kernel density estimation. Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult. === A rule-of-thumb bandwidth estimator === If Gaussian basis functions are used to approximate univariate data, and the underlying density being estimated is Gaussian, the optimal choice for h (that is, the bandwidth that minimises the mean integrated squared error) is: h = ( 4 σ ^ 5 3 n ) 1 / 5 ≈ 1.06 σ ^ n − 1 / 5 , {\displaystyle h={\left({\frac {4{\hat {\sigma }}^{5}}{3n}}\right)}^{1/5}\approx 1.06\,{\hat {\sigma }}\,n^{-1/5},} An h {\displaystyle h} value is considered more robust when it improves the fit for long-tailed and skewed distributions or for bimodal mixture distributions. This is often done empirically by replacing the standard deviation σ ^ {\displaystyle {\hat {\sigma }}} by the parameter A {\displaystyle A} below: A = min ( σ ^ , I Q R 1.34 ) {\displaystyle A=\min \left({\hat {\sigma }},{\frac {\mathrm {IQR} }{1.34}}\right)} where IQR is the

    Read more →
  • Instance selection

    Instance selection

    Instance selection (or dataset reduction, or dataset condensation) is an important data pre-processing step that can be applied in many machine learning (or data mining) tasks. Approaches for instance selection can be applied for reducing the original dataset to a manageable volume, leading to a reduction of the computational resources that are necessary for performing the learning process. Algorithms of instance selection can also be applied for removing noisy instances, before applying learning algorithms. This step can improve the accuracy in classification problems. Algorithm for instance selection should identify a subset of the total available data to achieve the original purpose of the data mining (or machine learning) application as if the whole data had been used. Considering this, the optimal outcome of IS would be the minimum data subset that can accomplish the same task with no performance loss, in comparison with the performance achieved when the task is performed using the whole available data. Therefore, every instance selection strategy should deal with a trade-off between the reduction rate of the dataset and the classification quality. == Instance selection algorithms == The literature provides several different algorithms for instance selection. They can be distinguished from each other according to several different criteria. Considering this, instance selection algorithms can be grouped in two main classes, according to what instances they select: algorithms that preserve the instances at the boundaries of classes and algorithms that preserve the internal instances of the classes. Within the category of algorithms that select instances at the boundaries it is possible to cite DROP3, ICF and LSBo. On the other hand, within the category of algorithms that select internal instances, it is possible to mention ENN and LSSm. In general, algorithm such as ENN and LSSm are used for removing harmful (noisy) instances from the dataset. They do not reduce the data as the algorithms that select border instances, but they remove instances at the boundaries that have a negative impact on the data mining task. They can be used by other instance selection algorithms, as a filtering step. For example, the ENN algorithm is used by DROP3 as the first step, and the LSSm algorithm is used by LSBo. There is also another group of algorithms that adopt different selection criteria. For example, the algorithms LDIS, CDIS and XLDIS select the densest instances in a given arbitrary neighborhood. The selected instances can include both, border and internal instances. The LDIS and CDIS algorithms are very simple and select subsets that are very representative of the original dataset. Besides that, since they search by the representative instances in each class separately, they are faster (in terms of time complexity and effective running time) than other algorithms, such as DROP3 and ICF. Besides that, there is a third category of algorithms that, instead of selecting actual instances of the dataset, select prototypes (that can be synthetic instances). In this category it is possible to include PSSA, PSDSP and PSSP. The three algorithms adopt the notion of spatial partition (a hyperrectangle) for identifying similar instances and extract prototypes for each set of similar instances. In general, these approaches can also be modified for selecting actual instances of the datasets. The algorithm ISDSP adopts a similar approach for selecting actual instances (instead of prototypes).

    Read more →
  • Spleak

    Spleak

    Spleak was an IM platform where users could publish and rate content. It existed in the form of six bots covering as many subject areas: CelebSpleak, SportSpleak, VoteSpleak, TVSpleak, GameSpleak, and StyleSpleak. == Overview == Users can add a "multi-Spleak" (which contains all of the different Spleak bots in one) or add the separate bots to their IM buddy lists on MSN and AIM. Users are also allowed access to Spleak online by using a CelebSpleak, SportSpleak, or VoteSpleak widget, or through the CelebSpleak and SportSpleak applications with Facebook. Spleak was an alternate reality game and is moving to its own company, Spleak Media Network. "Celebrate Spleak" was introduced throughout 2007, launched in 2008, and was forced to retire in 2009. == Key people == Spleak was co-founded by Morten Lund and Nicolaj Reffstrup. The company's chief executive officer is Morrie Eisenburg; Josh Scott is Vice President in Product and Tyler Wells is Vice President in Engineering.

    Read more →
  • Ideonomy

    Ideonomy

    Ideonomy is a combinatorial "science of ideas" developed by American independent scholar Patrick M. Gunkel (1947–2017). Specifically, Ideonomy is concerned with the systematic organization of ideas and the discovery of the rules behind how ideas combine, diverge, and transform. Gunkel defined ideonomy as "the science of the laws of ideas and of the application of such laws to the generation of all possible ideas in connection with any subject, idea, or thing." In his 1992 book A History of Knowledge, Charles Van Doren compared ideonomy to a "mining operation" that excavates meanings and thought to discover treasures hidden deep within language. Sources from the 1980s and 1990s demonstrate that ideonomy was useful to academic researchers in fields including biology, toxicology, and nursing/patient care. Beginning in the 2010s, academics in a wide range of fields including machine learning, marketing, computational modeling, and cybersecurity have relied on materials generated for ideonomy to provide methodological support for their research. == Etymology and definition == The word "ideonomy" combines the Greek roots ideo- (from idea, meaning pattern or form) and -nomy (from nomos, meaning law or custom). The suffix -nomy suggests the laws concerning or the totality of knowledge about a given subject, as in astronomy or taxonomy. In a note posted on the MIT ideonomy website, Gunkel states that the word was supposedly first coined by the French Encyclopedists to refer to a science of ideas. No evidence is provided for this statement, however. The concept bears some relationship to Antoine Destutt de Tracy's "ideology" (1796), which originally meant a systematic science of ideas before acquiring its modern political connotations. Gunkel provided several metaphorical descriptions of ideonomy: An "idea bank": a computer network enabling systematic exploration of infinite possible ideas A "kaleidoscope" that can exhibit all possible combinations and transformations of ideas A "prism" capable of diffracting any idea into its cognitive components A "gigantic microscope for magnifying the ideocosm" == History and development == In 1984, Gunkel received a five-year unsolicited grant from the Richard Lounsbery Foundation of New York to develop ideonomy. A June 1, 1987 article on the front page of The Wall Street Journal brought Gunkel and ideonomy to wider public attention. Some academics were interested in using ideonomy's techniques, including biologist Betsey Dyer, who published several contemporaneous peer-reviewed studies citing ideonomy. Academic researchers in the field of toxicology and nursing/patient care also used ideonomy. However, ideonomy's broadest contribution to date came beginning in the 2010s, as a list of personality traits generated for combinatorial matching was used by researchers in artificial intelligence to code human emotions for machine-learning tasks, develop computational models related to personality, develop a measurement framework for influencer-brand recommender systems, and aid information awareness/cybersecurity assessment. == Methodology == The foundational empirical method of ideonomy involves the systematic creation of extensive lists. Gunkel's apartment reportedly contained thousands of lists on every conceivable topic. Gunkel termed each list an "organon," which he described as expanding through "combination, permutation, transformation, generalization, specialization, intersection, interaction, reapplication, recursive use, etc. of existing organons." The ideonomic process follows a progressive structure. The ideonomist begins with a simple list of examples of a particular idea, concept, or thing. The list need not be exhaustive. By studying this list, the ideonomist isolates and identifies types. This categorical analysis then reveals missing items, allowing the primary list to be improved and refined. Gunkel emphasized that list items must not only cover genuine categories of nature but also be formulated in ways that yield the largest possible number of syntactically coherent possibilities when combined. The core technique of ideonomy is "ideocombinatorics"—the systematic intersection and combination of items from different lists to generate novel composite concepts. Gunkel developed computer programs to automate this process. For example, combining a list of 230 Universal Elementary Shapes (pits, pyramids, trenches, hemispheres, needles) with a list of 74 Types of Order (recurrence, identity, likeness of parts) yields 17,020 possible "shapes of order." These combinations, when phrased as questions ("Can there be pits of recurrence?"), could suggest new categories of phenomena worthy of investigation. The computer-generated output is typically repetitive and often meaningless. However, with sufficient frequency, the combinations yield results that are unexpectedly interesting and fruitful. In one documented case, Gunkel's programs generated 45,540 questions about toxins for microbiologist David Bermudes. One question—"Can hierarchies of cell process be used as a basis for classifying toxic action?"—prompted Bermudes to develop a novel approach to classifying biological toxins by the type of molecule they attack, rather than by chemical structure or physiological system affected. According to one contemporaneous account of ideonomy, "Gunkel takes for his field all fields and all ideas about anything. He uses a computer to generate lists of words and phrases and by juxtaposition reviews the resultant patterns for novel ideas. The computer is ideal for this task because the mind would rebel at the formidable processing task ideonomy involves. What we have here is computer generated originality." == Applications == Gunkel and his supporters identified several practical applications for ideonomic methods: Scientific research: Biologist Betsey Dyer of Wheaton College published research crediting ideonomy for helping to generate ideas. Medical science: When Austin pathologist Michael T. O'Brien was presented with the ideonomically-generated question "Can arteries have rashes?", he initially dismissed it as nonsense. Upon reflection, he realized that large arteries are supplied with blood by tiny vessels that might become inflamed and dilated, analogous to skin vessels in a rash—a phenomenon potentially worth researching. Analogical thinking: Harvard law professor Robert Clark used ideonomic analogies to write a research paper comparing plant structure with human hierarchies. Artificial intelligence: Douglas Lenat, a researcher at Microelectronics and Computer Technology Corporation (MCC) in Austin, suggested that Gunkel's lists enumerating types of human mistakes could help design AI systems capable of recognizing and correcting their own errors. == Reception and criticism == Ideonomy received mixed reactions from the academic and scientific communities. Prominent supporters included: Edward Fredkin, former director of MIT's computer science laboratory, who praised Gunkel's "provocative ideas on artificial intelligence." Marvin Minsky, AI scientist and MIT professor, who described ideonomy as "perhaps the most extensive study of ways to generate ideas." Frederick Seitz, president emeritus of Rockefeller University, who noted Gunkel's "encyclopedic scope" Robert C. Clark, Harvard law professor, who called Gunkel "the most intelligent person I ever met" However, skeptics questioned whether ideonomy constituted a genuine science. Fredkin himself noted that Gunkel "pours out about 60 ideas a minute, and 59 of them are bad," though he added that "even with one good idea out of 60, it's still an amazing accomplishment." Douglas Lenat observed that brainstorming with Gunkel was "a bit like being hit over the head by the muse with a sledgehammer" and that "he puts people off." Gunkel himself acknowledged that ideonomy was in its infancy and might seem "absurdly utopian." His planned magnum opus on ideonomy remained incomplete, and was posted on an MIT website thanks to faculty advisor Whitman Richards. Gunkel wrote: "Pioneering in a completely new field, yes in a new science, is almost unreal. It is heartbreaking, it is pitiable, it is almost inhuman. Honestly, it is a hell. There is nothing heroic about it." == Related concepts == Gunkel identified several historical precedents for ideonomic thinking: Gottfried Wilhelm Leibniz (1646–1716): The philosopher's work on a universal characteristic (characteristica universalis) and calculus of reasoning Peter Mark Roget (1779–1869): Creator of Roget's Thesaurus, which organized concepts into a systematic taxonomy Dmitri Mendeleev (1834–1907): Developer of the periodic table, demonstrating how combining lists of element families could reveal previously unseen connections Fritz Zwicky (1898–1974): The Caltech astrophysicist whom Gunkel called the "grandfather of ideonomy" for his development of "morphological research"—systematic exploration of all possible solutions t

    Read more →
  • Knowledge integration

    Knowledge integration

    Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation). Compared to information integration, which involves merging information having different schemas and representation models, knowledge integration focuses more on synthesizing the understanding of a given subject from different perspectives. For example, multiple interpretations are possible of a set of student grades, typically each from a certain perspective. An overall, integrated view and understanding of this information can be achieved if these interpretations can be put under a common model, say, a student performance index. The Web-based Inquiry Science Environment (WISE), from the University of California at Berkeley has been developed along the lines of knowledge integration theory. Knowledge integration has also been studied as the process of incorporating new information into a body of existing knowledge with an interdisciplinary approach. This process involves determining how the new information and the existing knowledge interact, how existing knowledge should be modified to accommodate the new information, and how the new information should be modified in light of the existing knowledge. A learning agent that actively investigates the consequences of new information can detect and exploit a variety of learning opportunities; e.g., to resolve knowledge conflicts and to fill knowledge gaps. By exploiting these learning opportunities the learning agent is able to learn beyond the explicit content of the new information. The machine learning program KI, developed by Murray and Porter at the University of Texas at Austin, was created to study the use of automated and semi-automated knowledge integration to assist knowledge engineers constructing a large knowledge base. A possible technique which can be used is semantic matching. More recently, a technique useful to minimize the effort in mapping validation and visualization has been presented which is based on Minimal Mappings. Minimal mappings are high quality mappings such that i) all the other mappings can be computed from them in time linear in the size of the input graphs, and ii) none of them can be dropped without losing property i). The University of Waterloo operates a Bachelor of Knowledge Integration undergraduate degree program as an academic major or minor. The program started in 2008.

    Read more →
  • Confusion matrix

    Confusion matrix

    In machine learning, a confusion matrix, also known as error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. In unsupervised learning it is usually called a matching matrix. The term is used specifically in the problem of statistical classification. Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature. The diagonal of the matrix therefore represents all instances that are correctly predicted. The name stems from the fact that it makes it easy to identify whether the system is confusing two classes (i.e., commonly mislabeling one class as another). The confusion matrix has its origins in human perceptual studies of auditory stimuli. It was adapted for machine learning studies and used by Frank Rosenblatt, among other early researchers, to compare human and machine classifications of visual (and later auditory) stimuli. It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table). == Example == Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows: Assume that we have a classifier that distinguishes between individuals with and without cancer in some way, we can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (sample 1 and 2), and 1 person without cancer that is wrongly predicted to have cancer (sample 9). Notice, that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column: The actual classification is positive and the predicted classification is positive (1,1). This is called a true positive result because the positive sample was correctly identified by the classifier. The actual classification is positive and the predicted classification is negative (1,0). This is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. The actual classification is negative and the predicted classification is positive (0,1). This is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. The actual classification is negative and the predicted classification is negative (0,0). This is called a true negative result because the negative sample gets correctly identified by the classifier. We can then perform the comparison between actual and predicted classifications and add this information to the table, making correct results appear in green so they are more easily identifiable. The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows: The color convention of the three data tables above were picked to match this confusion matrix, in order to easily differentiate the data. Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier: In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located in the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as values outside the diagonal will represent them. By summing up the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = T P + F N {\displaystyle P=TP+FN} and N = F P + T N {\displaystyle N=FP+TN} . == Table of confusion == In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly. For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer). According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC). Other metrics can be included in a confusion matrix, each of them having their significance and use. Some researchers have argued that the confusion matrix, and the metrics derived from it, do not truly reflect a model's knowledge. In particular, the confusion matrix cannot show whether correct predictions were reached through sound reasoning or merely by chance (a problem known in philosophy as epistemic luck). It also does not capture situations where the facts used to make a prediction later change or turn out to be wrong (defeasibility). This means that while the confusion matrix is a useful tool for measuring classification performance, it may give an incomplete picture of a model’s true reliability. == Confusion matrices with more than two categories == Confusion matrix is not limited to binary classification and can be used in multi-class classifiers as well. The confusion matrices discussed above have only two conditions: positive and negative. For example, the table below summarizes communication of a whistled language between two speakers, with zero values omitted for clarity. == Confusion matrices in multi-label and soft-label classification == Confusion matrices are not limited to single-label classification (where only one class is present) or hard-label settings (where classes are either fully present, 1, or absent, 0). They can also be extended to Multi-label classification (where multiple classes can be predicted at once) and soft-label classification (where classes can be partially present). One such extension is the Transport-based Confusion Matrix (TCM), which builds on the theory of optimal transport and the principle of maximum entropy. TCM applies to single-label, multi-label, and soft-label settings. It retains the familiar structure of the standard confusion matrix: a square matrix sized by the number of classes, with diagonal entries indicating correct predictions and off-diagonal entries indicating confusion. In the single-label case, TCM is identical to the standard confusion matrix. TCM follows the same reasoning as the standard confusion matrix: if class A is overestimated (its predicted value is greater than its label value) and class B is underestimated (its predicted value is less than its label value), A is considered confused with B, and the entry (B, A) is increased. If a class is both predicted and present, it is correctly identified, and the diagonal entry (A, A) increases. Optimal transport and maximum entropy are used to determine the extent to which these entries are updated. TCM enables clearer comparison between predictions and labels in complex classification tasks, while maintaining a consistent matrix format across settings.

    Read more →
  • Outline of machine learning

    Outline of machine learning

    The following outline is provided as an overview of, and topical guide to, machine learning: Machine learning (ML) is a subfield of artificial intelligence within computer science that evolved from the study of pattern recognition and computational learning theory. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". ML involves the study and construction of algorithms that can learn from and make predictions on data. These algorithms operate by building a model from a training set of example observations to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions. == How can machine learning be categorized? == An academic discipline A branch of science An applied science A subfield of computer science A branch of artificial intelligence A subfield of soft computing Application of statistics === Paradigms of machine learning === Supervised learning, where the model is trained on labeled data Unsupervised learning, where the model tries to identify patterns in unlabeled data Reinforcement learning, where the model learns to make decisions by receiving rewards or penalties. == Applications of machine learning == Applications of machine learning Bioinformatics Biomedical informatics Computer vision Customer relationship management Data mining Earth sciences Email filtering Inverted pendulum (balance and equilibrium system) Natural language processing Named Entity Recognition Automatic summarization Automatic taxonomy construction Dialog system Grammar checker Language recognition Handwriting recognition Optical character recognition Speech recognition Text to Speech Synthesis Speech Emotion Recognition Machine translation Question answering Speech synthesis Text mining Term frequency–inverse document frequency Text simplification Pattern recognition Facial recognition system Handwriting recognition Image recognition Optical character recognition Speech recognition Recommendation system Collaborative filtering Content-based filtering Hybrid recommender systems Search engine Search engine optimization Social engineering == Machine learning hardware == Graphics processing unit Tensor processing unit Vision processing unit == Machine learning tools == Comparison of machine learning software Comparison of deep learning software === Machine learning frameworks === ==== Proprietary machine learning frameworks ==== Amazon Machine Learning Microsoft Azure Machine Learning Studio DistBelief (replaced by TensorFlow) ==== Open source machine learning frameworks ==== Apache Singa Apache MXNet Caffe PyTorch mlpack TensorFlow Torch CNTK Accord.Net Jax MLJ.jl – A machine learning framework for Julia === Machine learning libraries === Deeplearning4j Theano scikit-learn Keras === Machine learning algorithms === == Machine learning methods == === Instance-based algorithm === K-nearest neighbors algorithm (KNN) Learning vector quantization (LVQ) Self-organizing map (SOM) === Regression analysis === Logistic regression Ordinary least squares regression (OLSR) Linear regression Stepwise regression Multivariate adaptive regression splines (MARS) Regularization algorithm Ridge regression Least Absolute Shrinkage and Selection Operator (LASSO) Elastic net Least-angle regression (LARS) Classifiers Probabilistic classifier Naive Bayes classifier Binary classifier Linear classifier Hierarchical classifier === Dimensionality reduction === Dimensionality reduction Canonical correlation analysis (CCA) Factor analysis Feature extraction Feature selection Independent component analysis (ICA) Linear discriminant analysis (LDA) Multidimensional scaling (MDS) Non-negative matrix factorization (NMF) Partial least squares regression (PLSR) Principal component analysis (PCA) Principal component regression (PCR) Projection pursuit Sammon mapping t-distributed stochastic neighbor embedding (t-SNE) === Ensemble learning === Ensemble learning AdaBoost Boosting Bootstrap aggregating (also "bagging" or "bootstrapping") Ensemble averaging Gradient boosted decision tree (GBDT) Gradient boosting Random Forest Stacked Generalization === Meta-learning === Meta-learning Inductive bias Metadata === Reinforcement learning === Reinforcement learning Q-learning State–action–reward–state–action (SARSA) Temporal difference learning (TD) Learning Automata === Supervised learning === Supervised learning Averaged one-dependence estimators (AODE) Artificial neural network Case-based reasoning Gaussian process regression Gene expression programming Group method of data handling (GMDH) Inductive logic programming Instance-based learning Lazy learning Learning Automata Learning Vector Quantization Logistic Model Tree Minimum message length (decision trees, decision graphs, etc.) Nearest Neighbor Algorithm Analogical modeling Probably approximately correct learning (PAC) learning Ripple down rules, a knowledge acquisition methodology Symbolic machine learning algorithms Support vector machines Random Forests Ensembles of classifiers Bootstrap aggregating (bagging) Boosting (meta-algorithm) Ordinal classification Conditional Random Field ANOVA Quadratic classifiers k-nearest neighbor Boosting SPRINT Bayesian networks Naive Bayes Hidden Markov models Hierarchical hidden Markov model ==== Bayesian ==== Bayesian statistics Bayesian knowledge base Naive Bayes Gaussian Naive Bayes Multinomial Naive Bayes Averaged One-Dependence Estimators (AODE) Bayesian Belief Network (BBN) Bayesian Network (BN) ==== Decision tree algorithms ==== Decision tree algorithm Decision tree Classification and regression tree (CART) Iterative Dichotomiser 3 (ID3) C4.5 algorithm C5.0 algorithm Chi-squared Automatic Interaction Detection (CHAID) Decision stump Conditional decision tree ID3 algorithm Random forest SLIQ ==== Linear classifier ==== Linear classifier Fisher's linear discriminant Linear regression Logistic regression Multinomial logistic regression Naive Bayes classifier Perceptron Support vector machine === Unsupervised learning === Unsupervised learning Expectation-maximization algorithm Vector Quantization Generative topographic map Information bottleneck method Association rule learning algorithms Apriori algorithm Eclat algorithm ==== Artificial neural networks ==== Artificial neural network Feedforward neural network Extreme learning machine Convolutional neural network Recurrent neural network Long short-term memory (LSTM) Logic learning machine Self-organizing map ==== Association rule learning ==== Association rule learning Apriori algorithm Eclat algorithm FP-growth algorithm ==== Hierarchical clustering ==== Hierarchical clustering Single-linkage clustering Conceptual clustering ==== Cluster analysis ==== Cluster analysis BIRCH DBSCAN Expectation–maximization (EM) Fuzzy clustering Hierarchical clustering k-means clustering k-medians Mean-shift OPTICS algorithm ==== Anomaly detection ==== Anomaly detection k-nearest neighbors algorithm (k-NN) Local outlier factor === Semi-supervised learning === Semi-supervised learning Active learning Generative models Low-density separation Graph-based methods Co-training Transduction === Deep learning === Deep learning Deep belief networks Deep Boltzmann machines Deep Convolutional neural networks Deep Recurrent neural networks Hierarchical temporal memory Generative Adversarial Network Style transfer Transformer Stacked Auto-Encoders === Other machine learning methods and problems === Anomaly detection Association rules Bias-variance dilemma Classification Multi-label classification Clustering Data Pre-processing Empirical risk minimization Feature engineering Feature learning Learning to rank Occam learning Online machine learning PAC learning Regression Reinforcement Learning Semi-supervised learning Statistical learning Structured prediction Graphical models Bayesian network Conditional random field (CRF) Hidden Markov model (HMM) Unsupervised learning VC theory == Machine learning research == List of artificial intelligence projects List of datasets for machine learning research == History of machine learning == History of machine learning Timeline of machine learning == Machine learning projects == Machine learning projects: DeepMind Google Brain OpenAI Meta AI Hugging Face == Machine learning organizations == === Machine learning conferences and workshops === Artificial Intelligence and Security (AISec) (co-located workshop with CCS) Conference on Neural Information Processing Systems (NIPS) ECML PKDD International Conference on Machine Learning (ICML) ML4ALL (Machine Learning For All) == Machine learning publications == === Books on machine learning === Mathematics for Machine Learning Hands-On Machine Learning Scikit-Learn, Keras, and TensorFlow The Hundred-Page Machine Learning Book === Machine learning journals === Machine Learning Journal of Machine Learning Research (JMLR) Neural Computation == Pe

    Read more →
  • Operation Serenata de Amor

    Operation Serenata de Amor

    Operation Serenata de Amor is an artificial intelligence project designed to analyze public spending in Brazil. The project has been funded by a recurrent financing campaign since September 7, 2016, and came in the wake of major scandals of misappropriation of public funds in Brazil, such as the Mensalão scandal and what was revealed in the Operation Car Wash investigations. The analysis began with data from the National Congress then expanded to other types of budget and instances of government, such as the Federal Senate. The project is built through collaboration on GitHub and using a public group with more than 600 participants on Telegram. The name "Serenata de Amor," which means "serenade of love," was taken from a popular cashew cream bonbon produced by Chocolates Garoto in Brazil. == Modules == Throughout development of the project, new modules have been newly introduced in addition to the main repository: The main repository, serenata-de-amor, serves as the starting point for investigative work. Rosie is the robot programmed to identify public funds expenses with discrepancies, starting with CEAP (Quota for Exercise of Parliamentary Activity); it analyzes each of the reimbursements requested by the deputies and senators, indicating the reasons that lead it to believe they are suspicious. From Rosie was born whistleblower, which tweets under the name of @RosieDaSerenata, distributing the results found on social media. Jarbas (Github repository) is a data visualization tool which shows a complete list of reimbursements made available by the Chamber of Deputies and mined by Rosie. Toolbox is a Python installable package that supports the development of Serenata de Amor and Rosie. == History == Operation Serenata de Amor is an Artificial intelligence project for analysis of public expenditures. It was conceived in March 2016 by data scientist Irio Musskopf, sociologist Eduardo Cuducos and entrepreneur Felipe Cabral. The project was financed collectively in the Catarse platform, where it reached 131% of the collection goal paying 3 months of project development. Ana Schwendler, also a data scientist, Pedro Vilanova "Tonny", data journalist, Bruno Pazzim, software engineer, Filipe Linhares, a frontend engineer, Leandro Devegili, an entrepreneur and André Pinho took the first steps towards constructing the platform, such as collecting and structuring the first datasets. Jessica Temporal, data scientist and Yasodara Córdova "Yaso", researcher, Tatiana Balachova "Russa", UX designer, joined the project after the financing took place. The members created a recurring financing campaign, expanding the analysis of public spending to the Federal Senate. Donors make monthly payments ranging from 5 BRL to 200 BRL to maintain group activities. The monthly amount collected is around 10,000 BRL. == Results == In January 2017, concluding the period financed by the initial campaign, the group carried out an investigation into the suspicious activities found by the data analysis system. 629 complaints were made to the Ombudsman's Office of the Chamber of Deputies, questioning expenses of 216 federal deputies. In addition, the Facebook project page has more than 25,000 followers, and users frequently cite the operation as a benchmark in transparency in the Brazilian government. One of the examples of results obtained by the operation is the case of the Deputy who had to return about 700 BRL to the House after his expenses were analyzed by the platform. The platform was able to analyze more than 3 million notes, raising about 8,000 suspected cases in public spending. The community that supports the work of the team benefits from open source repositories, with licenses open for the collaboration. So much so that the two main data scientists of the project presented it at the CivicTechFest in Taipei, obtaining several mentions even in the international press. The technical leader presented the project in Poland during DevConf2017 in Kraków. It was also presented in the Google News Lab in 2017. It was presented by Yaso, when she was the Director of the initiative, at the MIT Media Lab/Berkman Klein Center Initiative for Artificial Intelligence ethics, and at the Artificial Intelligence and Inclusion Symposium, an initiative of the Global Network of Internet & Society Centers (NoC). It was also presented both by Irio and Yaso at the Digital Harvard Kennedy School, over a lunch seminar, where the transparency of the platform and the main solutions found were discussed, so that the code and data are always available to verify its suitability. This infographic provides information about the first results of Operation Serenata de Amor, a project that analyzes open data on public spending to find discrepancies. The project was presented by Yaso to the House Audit and Control Committee of the Chamber of Deputies in August 2017, and raised the interest of House officials who work with open data. The operation has been a source of inspiration for other civic projects that aim to work with similar goals, demonstrating the broader impact of artificial intelligence also in industry in Brazil. Participation of several team members in events throughout Brazil and abroad can be found on the Internet, such as presentation at OpenDataDay, held at Calango Hackerspace in the Federal District, Campus Party Bahia, Campus Party Brasilia, Friends of Tomorrow, XIII National Meeting of Internal Control, in the event USP Talks Hackfest against corruption in João Pessoa, the latter being also highlighted in the National Press.

    Read more →
  • Artificial intelligence

    Artificial intelligence

    Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in engineering, mathematics and computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals. High-profile applications of AI include advanced web search engines, chatbots, virtual assistants, autonomous vehicles, and play and analysis in strategy games (e.g., chess and Go). Since the 2020s, generative AI has become widely available to generate images, audio, and videos from text prompts. The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, and perception, as well as support for robotics. To reach these goals, AI researchers have used techniques including state space search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics. AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields. Some companies, such as OpenAI, Google DeepMind and Meta, aim to create artificial general intelligence (AGI) – AI that can complete virtually any cognitive task at least as well as a human. Artificial intelligence was founded as an academic discipline in 1956, and the field went through multiple cycles of optimism throughout its history, followed by periods of disappointment and loss of funding, known as AI winters. Funding and interest increased substantially after 2012, when graphics processing units began being used to accelerate neural networks, and deep learning outperformed previous AI techniques. This growth accelerated further after 2017 with the transformer architecture. In the 2020s, an AI boom has coincided with advances in generative AI, which allowed for the creation and modification of media. In addition to AI safety and unintended consequences and harms from the use of AI, ethical concerns, AI's long-term effects, and potential existential risks have prompted discussions of AI regulation. == Goals == The general problem of simulating (or creating) intelligence has been broken into subproblems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research. === Reasoning and problem-solving === Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions. By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics. Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": They become exponentially slower as the problems grow. Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments. Accurate and efficient reasoning is an unsolved problem. === Knowledge representation === Knowledge representation and knowledge engineering allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval, scene interpretation, clinical decision support, knowledge discovery (mining "interesting" and actionable inferences from large databases), and other areas. A knowledge base is a body of knowledge represented in a form that can be used by a program. An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge. Knowledge bases need to represent things such as objects, properties, categories, and relations between objects; situations, events, states, and time; causes and effects; knowledge about knowledge (what we know about what other people know); default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing); and many other aspects and domains of knowledge. Among the most difficult problems in knowledge representation are the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous); and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally). There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications. === Planning and decision-making === An "agent" is any entity (artificial or not) that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen. In automated planning, the agent has a specific goal. In automated decision-making, the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision-making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility. In classical planning, the agent knows exactly what the effect of any action will be. In most real-world problems, however, the agent may not be certain about the situation they are in (it is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked. Alongside thorough testing and improvement based on previous decisions, having an explanation for why the agent took certain decisions is a way to build trust, especially when the decisions have to be relied upon. In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning), or the agent can seek information to improve its preferences. Information value theory can be used to weigh the value of exploratory or experimental actions. The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain of what the outcome will be. A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned. Game theory describes the rational behavior of multiple interacting agents and is used in AI programs that make decisions that involve other agents. === Learning === Machine learning is the study of programs that can improve their performance on a given task automatically. It has been a part of AI from the beginning. There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance. Supervised learning requires labeling the training data with the expected answers, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input). In reinforcement learning, the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good". Transfer learning is when the knowledge gained from one problem is applied to a new problem. Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning. Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization. === Natural language processing === Natural language processing (NLP) allows programs to read, write and communicate in human languages. Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering. Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation unless

    Read more →