AI Detector And Fixer

AI Detector And Fixer — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Tweak programming environment

    Tweak programming environment

    Tweak is a graphical user interface (GUI) layer written by Andreas Raab for the Squeak development environment, which in turn is an integrated development environment based on the Smalltalk-80 computer programming language. Tweak is an alternative to an earlier graphic user interface layer called Morphic. Development began in 2001. Applications that use the Tweak software include Sophie (version 1), a multimedia and e-book authoring system, and a family of virtual world systems: Open Cobalt, Teleplace, OpenQwaq, 3d ICC's Immersive Terf and the Croquet Project. == Influences == An experimental version of Etoys, a programming environment for children, used Tweak instead of Morphic. Etoys was a major influence on a similar Squeak-based programming environment known as Scratch.

    Read more →
  • Artificial intelligence and elections

    Artificial intelligence and elections

    As artificial intelligence (AI) has become more mainstream, there is growing concern about how this will influence elections. Potential targets of AI include election processes, election offices, election officials and election vendors. There are also global efforts to improve elections using AI. == Tactics == Generative AI capabilities allow creation of misleading content. Examples of this include text-to-video, deepfake videos, text-to-image, AI-altered images, text-to-speech, voice cloning, and text-to-text. In the context of an election, a deepfake video of a candidate may propagate information that the candidate does not endorse. Chatbots could spread misinformation related to election locations, times or voting methods. In contrast to malicious actors in the past, these techniques require little technical skill and can spread rapidly. LLM-generated messages have the capacity to persuade humans on political issues. Researchers have begun to investigate how people rate messages that LLMs generate for how persuasive they are. When it came to policy issues, the LLM-generated messages received a 2.91 compared to a 2.80 when it came to smartness between the AI and humans. The LLM-generated messages were often more technical and analytical than human-generated messages. Generative AI has been used to micro-target people during tight political elections. The generation of targeted large language models has triggered concern that they will be used to leverage readily scale microtargeting. Rephrasing inputs have been used to generate fraudulent emails and phishing websites. Rephrasing inputs in a microtargeting does not violate the terms of OpenAI usage. There are no safeguards to prevent the use of rephrasing and creation of fraudulent emails. Political campaign managers have access to this allowing for them to create targeted content. == Usage by country == === Argentina === ==== 2023 elections ==== During the 2023 Argentine primary elections, Javier Milei's team distributed AI generated images including a fabricated image of his rival Sergio Massa and drew 3 million views. The team also created an unofficial Instagram account entitled "AI for the Homeland." Sergio Massa's team also distributed AI generated images and videos. === Bangladesh === ==== 2024 elections ==== In the run up to the 2024 Bangladeshi general election, deepfake videos of female opposition politicians appeared. Rumin Farhana was pictured in a bikini while Nipun Ray was shown in a swimming pool. === Canada === ==== 2025 elections ==== In the run up to the 2025 Canadian federal election, the use of AI tools is likely to figure prominently. India, Pakistan and Iran are all expected to make efforts to subvert the national vote using disinformation campaigns to deceive voters and sway diaspora communities. In a report by the Canadian Centre for Cyber Security called "Cyber Threats to Canada's Democratic Process: 2025 Update", it states that malicious actors including China and Russia: "are most likely to use generative AI as a means of creating and spreading disinformation, designed to sow division among Canadians and push narratives conducive to the interests of foreign states". === France === ==== 2024 elections ==== In the 2024 French legislative election, deepfake videos appeared claiming: i) That they showed the family of Marine le Pen. In the videos, young women, supposedly Le Pen's nieces, are seen skiing, dancing and at the beach "while making fun of France’s racial minorities": However, the family members don't exist. On social media there were over 2 million views. ii) In a video seen on social media, a deepfake video of a France24 broadcast appeared to report that the Ukrainian leadership had "tried to lure French president Emmanuel Macron to Ukraine to assassinate him and then blame his death on Russia". === Ghana === ==== 2024 elections ==== During the months before the December 2024 Ghanaian general election, a network of at least 171 fake accounts has been used to spam social media. Posts have been used by a group identified as "@TheTPatriots" to promote the New Patriotic Party, although it is not known whether the two are connected. All the networks' posts were "highly likely" to have been generated by ChatGPT and appear to be the "first secretly partisan network using AI to influence elections in Ghana". The opposition National Democratic Congress was also criticized with its leader John Mahama being called a drunkard. === India === ==== 2024 elections ==== In the 2024 Indian general election, politicians used deepfakes in their campaign materials. These deepfakes included politicians who had died prior to the election. Mathuvel Karunanidhi's party posted with his likeness even though he had died 2018. A video The All-India Anna Dravidian Progressive Federation party posted showed an audio clip of Jayaram Jayalalithaa even though she had died in 2016. The Deepfakes Analysis Unit (DAU) is an open source platform created in March 2024 for the public to share misleading content and assess if it had been AI-generated. AI was also used to translate political speeches in real time. This translating ability was widely used to reach more voters. === Indonesia === ==== 2024 elections ==== In the 2024 Indonesian presidential election, Prabowo Subianto made extensive use of AI-generated art in his campaign, which ranged from images of himself as an adorable child to various child portrayals in his advertisements. The Indonesian Children's Protection Commission condemned these ads, labeling them as a form of misuse. Other candidates, Anies Baswedan and Ganjar Pranowo, also incorporated AI art into their campaigns. Throughout the election period, all presidential candidates faced attacks from deepfakes, both in video and audio formats. === Ireland === ==== 2024 elections ==== In the last weeks of the 2024 Irish general election a spoof election poster appeared in Dublin featuring "an AI-generated candidate with three arms". The candidate is called Aidan Irwin, but no-one stood in the election with that name. A slogan on the poster says "put matters into artificial intelligence’s hands". The convincing election poster shows a man that "has six fingers on one hand, three arms, and a distorted thumb". === New Zealand === ==== 2023 elections ==== In May 2023, ahead of the 2023 New Zealand general election in October 2023, the New Zealand National Party published a "series of AI-generated political advertisements" on its Instagram account. After confirming that the images were faked, a party spokesperson said that it was "an innovative way to drive our social media". === Pakistan === ==== 2024 elections ==== AI has been used by the imprisoned ex-Prime Minister Imran Khan and his media team in the 2024 Pakistani general election: i) An AI generated audio of his voice was added to a video clip and was broadcast at a virtual rally. ii) An op-ed in The Economist written by Khan was later claimed by himself to have been written by AI which was later denied by his team. The article was liked and shared on social media by thousands of users. === South Africa === ==== 2024 elections ==== In the 2024 South African general election, there were several uses of AI content: i) A deepfaked video of Joe Biden emerged on social media showing him saying that "The U.S. would place sanctions on SA and declare it an enemy state if the African National Congress (ANC) won". ii) In a deepfake video, Donald Trump was shown endorsing the uMkhonto weSizwe party. It was posted to social media and was viewed more than 158,000 times. iii) Less than 3 months before the elections, a deepfake video showed U.S. rapper Eminem endorsing the Economic Freedom Fighters party while criticizing the ANC. The deepfake was viewed on social media more than 173,000 times. === South Korea === ==== 2022 elections ==== In the 2022 South Korean presidential election, a committee for one presidential candidate Yoon Suk Yeol released an AI avatar 'Al Yoon Seok-yeol' that would campaign in places the candidate could not go. The other presidential candidate Lee Jae-myung introduced a chatbot that provided information about the candidate's pledges. ==== 2024 elections ==== Deepfakes were used to spread misinformation before the 2024 South Korean legislative election with one source reporting 129 deepfake violations of election laws within a two week period. Seoul hosted the 2024 Summit for Democracy, a virtual gathering of world leaders initiated by US President Joe Biden in 2021. The focus of the summit was on digital threats to democracy including artificial intelligence and deepfakes. === Taiwan === ==== 2024 elections ==== AI-generated content was used during the 2024 Taiwanese presidential election. Among the media were: i) A deepfake video of General Secretary of the Chinese Communist Party Xi Jinping which showed him supporting the presidential elections. Created on social media, the video was "widely circulated

    Read more →
  • Automated decision-making

    Automated decision-making

    Automated decision-making (ADM) is the use of data, machines and algorithms to make decisions in a range of contexts, including public administration, business, health, education, law, employment, transport, media and entertainment, with varying degrees of human oversight or intervention. ADM may involve large-scale data from a range of sources, such as databases, text, social media, sensors, images or speech, that is processed using various technologies including computer software, algorithms, machine learning, natural language processing, artificial intelligence, augmented intelligence and robotics. The increasing use of automated decision-making systems (ADMS) across a range of contexts presents many benefits and challenges to human society requiring consideration of the technical, legal, ethical, societal, educational, economic and health consequences. == Overview == There are different definitions of ADM based on the level of automation involved. Some definitions suggests ADM involves decisions made through purely technological means without human input, such as the EU's General Data Protection Regulation (Article 22). However, ADM technologies and applications can take many forms ranging from decision-support systems that make recommendations for human decision-makers to act on, sometimes known as augmented intelligence or 'shared decision-making', to fully automated decision-making processes that make decisions on behalf of individuals or organizations without human involvement. Models used in automated decision-making systems can be as simple as checklists and decision trees through to artificial intelligence and deep neural networks (DNN). Since the 1950s computers have gone from being able to do basic processing to having the capacity to undertake complex, ambiguous and highly skilled tasks such as image and speech recognition, gameplay, scientific and medical analysis and inferencing across multiple data sources. ADM is now being increasingly deployed across all sectors of society and many diverse domains from entertainment to transport. An ADM system (ADMS) may involve multiple decision points, data sets, and technologies (ADMT) and may sit within a larger administrative or technical system such as a criminal justice system or business process. == Data == Automated decision-making involves using data as input to be analyzed within a process, model, or algorithm or for learning and generating new models. ADM systems may use and connect a wide range of data types and sources depending on the goals and contexts of the system, for example, sensor data for self-driving cars and robotics, identity data for security systems, demographic and financial data for public administration, medical records in health, criminal records in law. This can sometimes involve vast amounts of data and computing power. === Data quality === The quality of the available data and its ability to be used in ADM systems is fundamental to the outcomes. It is often highly problematic for many reasons. Datasets are often highly variable; corporations or governments may control large-scale data, restricted for privacy or security reasons, incomplete, biased, limited in terms of time or coverage, measuring and describing terms in different ways, and many other issues. For machines to learn from data, large corpora are often required, which can be challenging to obtain or compute; however, where available, they have provided significant breakthroughs, for example, in diagnosing chest X-rays. == ADM technologies == Automated decision-making technologies (ADMT) are software-coded digital tools that automate the translation of input data to output data, contributing to the function of automated decision-making systems. There are a wide range of technologies in use across ADM applications and systems. ADMTs involving basic computational operations Search (includes 1-2-1, 1-2-many, data matching/merge) Matching (two different things) Mathematical Calculation (formula) ADMTs for assessment and grouping: User profiling Recommender systems Clustering Classification Feature learning Predictive analytics (includes forecasting) ADMTs relating to space and flows: Social network analysis (includes link prediction) Mapping Routing ADMTs for processing of complex data formats Image processing Audio processing Natural Language Processing (NLP) Other ADMT Business rules management systems Time series analysis Anomaly detection Modelling/Simulation === Machine learning === Machine learning (ML) involves training computer programs through exposure to large data sets and examples to learn from experience and solve problems. Machine learning can be used to generate and analyse data as well as make algorithmic calculations and has been applied to image and speech recognition, translations, text, data and simulations. While machine learning has been around for some time, it is becoming increasingly powerful due to recent breakthroughs in training deep neural networks (DNNs), and dramatic increases in data storage capacity and computational power with GPU coprocessors and cloud computing. Machine learning systems based on foundation models run on deep neural networks and use pattern matching to train a single huge system on large amounts of general data such as text and images. Early models tended to start from scratch for each new problem however since the early 2020s many are able to be adapted to new problems. Examples of these technologies include Open AI's DALL-E (an image creation program) and their various GPT language models, and Google's PaLM language model program. == Applications == ADM is being used to replace or augment human decision-making by both public and private-sector organisations for a range of reasons including to help increase consistency, improve efficiency, reduce costs and enable new solutions to complex problems. === Debate === Research and development are underway into uses of technology to assess argument quality, assess argumentative essays and judge debates. Potential applications of these argument technologies span education and society. Scenarios to consider, in these regards, include those involving the assessment and evaluation of conversational, mathematical, scientific, interpretive, legal, and political argumentation and debate. === Law === In legal systems around the world, algorithmic tools such as risk assessment instruments (RAI), are being used to supplement or replace the human judgment of judges, civil servants and police officers in many contexts. In the United States RAI are being used to generate scores to predict the risk of recidivism in pre-trial detention and sentencing decisions, evaluate parole for prisoners and to predict "hot spots" for future crime. These scores may result in automatic effects or may be used to inform decisions made by officials within the justice system. In Canada ADM has been used since 2014 to automate certain activities conducted by immigration officials and to support the evaluation of some immigrant and visitor applications. === Economics === Automated decision-making systems are used in certain computer programs to create buy and sell orders related to specific financial transactions and automatically submit the orders in the international markets. Computer programs can automatically generate orders based on predefined set of rules using trading strategies which are based on technical analyses, advanced statistical and mathematical computations, or inputs from other electronic sources. === Business === ==== Continuous auditing ==== Continuous auditing uses advanced analytical tools to automate auditing processes. It can be utilized in the private sector by business enterprises and in the public sector by governmental organizations and municipalities. As artificial intelligence and machine learning continue to advance, accountants and auditors may make use of increasingly sophisticated algorithms which make decisions such as those involving determining what is anomalous, whether to notify personnel, and how to prioritize those tasks assigned to personnel. === Media and entertainment === Digital media, entertainment platforms, and information services increasingly provide content to audiences via automated recommender systems based on demographic information, previous selections, collaborative filtering or content-based filtering. This includes music and video platforms, publishing, health information, product databases and search engines. Many recommender systems also provide some agency to users in accepting recommendations and incorporate data-driven algorithmic feedback loops based on the actions of the system user. Large-scale machine learning language models and image creation programs being developed by companies such as OpenAI and Google in the 2020s have restricted access however they are likely to have widespread application in fields such as advertising, copywriting, stock imagery and gra

    Read more →
  • Reciprocal human machine learning

    Reciprocal human machine learning

    Reciprocal Human Machine Learning (RHML) is an interdisciplinary approach to designing human-AI interaction systems. RHML aims to enable continual learning between humans and machine learning models by having them learn from each other. This approach keeps the human expert "in the loop" to oversee and enhance machine learning performance and simultaneously support the human expert continue learning. == Background == RHML emerged in the context of the rise of big data analytics and artificial intelligence for intelligent tasks like sense-making and decision-making. As machine learning advanced to take on more roles, researchers realized fully autonomous systems had limitations and needed human guidance. RHML extends the concept of human-in-the-loop systems by promoting reciprocal learning. Humans learn from their interactions with machine learning models, staying up-to-date on evolving technology. The models also learn from human feedback and oversight. This amplification of learning on both sides is a key focus of RHML. The approach draws on theories of learning in dyads from education and psychology. It also builds on human-computer interaction and human-centered design principles. Implementing RHML requires developing specialized tools and interfaces tailored to the application == Applications == RHML has been explored across diverse domains including: Cybersecurity - Software to enable reciprocal learning between experts and AI models for social media threat detection. Organizational decision-making - RHML to structure collaboration between humans and AI systems. Workplace training - Using RHML for workers to learn from AI technologies on the job. Open science - Using human and AI collaboration to promote open science. Production and logistics - turning workers and intelligent machines into teammates. RHML maintains human oversight and control over AI systems, while enabling cutting-edge machine learning performance. This collaborative approach highlights the importance of keeping the human expert involved in the loop. An example of RHML in application is Free Spirit (AFSFCV), an open-source architecture first published in early 2025 as a whitepaper, proposing a visually structured approach to intent-based human–AI interaction.

    Read more →
  • Auralization

    Auralization

    Auralization is a procedure designed to model and simulate the experience of acoustic phenomena rendered as a soundfield in a virtualized space. This is useful in configuring the soundscape of architectural structures, concert venues, and public spaces, as well as in making coherent sound environments within virtual immersion systems. == History == The English term auralization was used for the first time by Kleiner et al. in an article in the journal of the AES en 1991. The increase of computational power allowed the development of the first acoustic simulation software towards the end of the 1960s. == Principles == Auralizations are experienced through systems rendering virtual acoustic models made by convolving or mixing acoustic events recorded 'dry' (or in an anechoic chamber) projected within a virtual model of an acoustic space, the characteristics of which are determined by means of sampling its impulse response (IR). Once this h ( t ) {\displaystyle h(t)} has been determined, the simulation of the resulting soundfield s ( t ) {\displaystyle s(t)} in the target environment is obtained by convolution: r ( t ) = h ( t ) ∗ s ( t ) {\displaystyle r(t)=h(t)s(t)} The resulting sound r ( t ) {\displaystyle r(t)} is heard as it would if emitted in that acoustic space. == Binaurality == For auralizations to be perceived as realistic, it is critical to emulate the human hearing in terms of position and orientation of the listener's head with respect to the sources of sound. For IR data to be convolved convincingly, the acoustic events are captured using a dummy head where two microphones are positioned on each side of the head to record an emulation of sound arriving at the locations of human ears, or using an ambisonics microphone array and mixed down for binaurality. Head-related transfer functions (HRTF) datasets can be used to simplify the process insofar as a monaural IR can be measured or simulated, then audio content is convolved with its target acoustic space. In rendering the experience, the transfer function corresponding to the orientation of the head is applied to simulate the corresponding spatial emanation of sound.

    Read more →
  • AIXI

    AIXI

    AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000 and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence. AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs. == Etymology == According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by ξ {\displaystyle \xi } (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations. == Definition == AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment μ {\displaystyle \mu } . The interaction proceeds in time steps, from t = 1 {\displaystyle t=1} to t = m {\displaystyle t=m} , where m ∈ N {\displaystyle m\in \mathbb {N} } is the lifespan of the AIXI agent. At time step t, the agent chooses an action a t ∈ A {\displaystyle a_{t}\in {\mathcal {A}}} (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" e t ∈ E = O × R {\displaystyle e_{t}\in {\mathcal {E}}={\mathcal {O}}\times \mathbb {R} } , which consists of an "observation" o t ∈ O {\displaystyle o_{t}\in {\mathcal {O}}} (e.g., a camera image) and a reward r t ∈ R {\displaystyle r_{t}\in \mathbb {R} } , distributed according to the conditional probability μ ( o t r t | a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t ) {\displaystyle \mu (o_{t}r_{t}|a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t})} , where a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t {\displaystyle a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t}} is the "history" of actions, observations and rewards. The environment μ {\displaystyle \mu } is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that μ {\displaystyle \mu } is computable, that is, the observations and rewards received by the agent from the environment μ {\displaystyle \mu } can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent. The only goal of the AIXI agent is to maximize ∑ t = 1 m r t {\displaystyle \sum _{t=1}^{m}r_{t}} , that is, the sum of rewards from time step 1 to m. The AIXI agent is associated with a stochastic policy π : ( A × E ) ∗ → A {\displaystyle \pi :({\mathcal {A}}\times {\mathcal {E}})^{}\rightarrow {\mathcal {A}}} , which is the function it uses to choose actions at every time step, where A {\displaystyle {\mathcal {A}}} is the space of all possible actions that AIXI can take and E {\displaystyle {\mathcal {E}}} is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) μ {\displaystyle \mu } can also be thought of as a stochastic policy (which is a function): μ : ( A × E ) ∗ × A → E {\displaystyle \mu :({\mathcal {A}}\times {\mathcal {E}})^{}\times {\mathcal {A}}\rightarrow {\mathcal {E}}} , where the ∗ {\displaystyle } is the Kleene star operation. In general, at time step t {\displaystyle t} (which ranges from 1 to m), AIXI, having previously executed actions a 1 … a t − 1 {\displaystyle a_{1}\dots a_{t-1}} (which is often abbreviated in the literature as a < t {\displaystyle a_{ Read more →

  • Concept drift

    Concept drift

    In predictive analytics, data science, machine learning and related fields, concept drift or drift is an evolution of data that invalidates the data model. It happens when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. Drift detection and drift adaptation are of paramount importance in the fields that involve dynamically changing data and data models. == Predictive model decay == In machine learning and predictive analytics this drift phenomenon is called concept drift. In machine learning, a common element of a data model are the statistical properties, such as probability distribution of the actual data. If they deviate from the statistical properties of the training data set, then the learned predictions may become invalid, if the drift is not addressed. == Data configuration decay == Another important area is software engineering, where three types of data drift affecting data fidelity may be recognized. Changes in the software environment ("infrastructure drift") may invalidate software infrastructure configuration. "Structural drift" happens when the data schema changes, which may invalidate databases. "Semantic drift" is changes in the meaning of data while the structure does not change. In many cases this may happen in complicated applications when many independent developers introduce changes without proper awareness of the effects of their changes in other areas of the software system. For many application systems, the nature of data on which they operate are subject to changes for various reasons, e.g., due to changes in business model, system updates, or switching the platform on which the system operates. In the case of cloud computing, infrastructure drift that may affect the applications running on cloud may be caused by the updates of cloud software. There are several types of detrimental effects of data drift on data fidelity. Data corrosion is passing the drifted data into the system undetected. Data loss happens when valid data are ignored due to non-conformance with the applied schema. Squandering is the phenomenon when new data fields are introduced upstream in the data processing pipeline, but somewhere downstream these data fields are absent. == Inconsistent data == "Data drift" may refer to the phenomenon when database records fail to match the real-world data due to the changes in the latter over time. This is a common problem with databases involving people, such as customers, employees, citizens, residents, etc. Human data drift may be caused by unrecorded changes in personal data, such as place of residence or name, as well as due to errors during data input. "Data drift" may also refer to inconsistency of data elements between several replicas of a database. The reasons can be difficult to identify. A simple drift detection is to run checksum regularly. However the remedy may be not so easy. == Examples == The behavior of the customers in an online shop may change over time. For example, if weekly merchandise sales are to be predicted, and a predictive model has been developed that works satisfactorily. The model may use inputs such as the amount of money spent on advertising, promotions being run, and other metrics that may affect sales. The model is likely to become less and less accurate over time – this is concept drift. In the merchandise sales application, one reason for concept drift may be seasonality, which means that shopping behavior changes seasonally. Perhaps there will be higher sales in the winter holiday season than during the summer, for example. Concept drift generally occurs when the covariates that comprise the data set begin to explain the variation of your target set less accurately — there may be some confounding variables that have emerged, and that one simply cannot account for, which renders the model accuracy to progressively decrease with time. Generally, it is advised to perform health checks as part of the post-production analysis and to re-train the model with new assumptions upon signs of concept drift. == Possible remedies == To prevent deterioration in prediction accuracy because of concept drift, reactive and tracking solutions can be adopted. Reactive solutions retrain the model in reaction to a triggering mechanism, such as a change-detection test or control charts from statistical process control, to explicitly detect concept drift as a change in the statistics of the data-generating process. When concept drift is detected, the current model is no longer up-to-date and must be replaced by a new one to restore prediction accuracy. A shortcoming of reactive approaches is that performance may decay until the change is detected. Tracking solutions seek to track the changes in the concept by continually updating the model. Methods for achieving this include online machine learning, frequent retraining on the most recently observed samples, and maintaining an ensemble of classifiers where one new classifier is trained on the most recent batch of examples and replaces the oldest classifier in the ensemble. Contextual information, when available, can be used to better explain the causes of the concept drift: for instance, in the sales prediction application, concept drift might be compensated by adding information about the season to the model. By providing information about the time of the year, the rate of deterioration of your model is likely to decrease, but concept drift is unlikely to be eliminated altogether. This is because actual shopping behavior does not follow any static, finite model. New factors may arise at any time that influence shopping behavior, the influence of the known factors or their interactions may change. Concept drift cannot be avoided for complex phenomena that are not governed by fixed laws of nature. All processes that arise from human activity, such as socioeconomic processes, and biological processes are likely to experience concept drift. Therefore, periodic retraining, also known as refreshing, of any model is necessary. === Remedy methods === DDM (Drift Detection Method): detects drift by monitoring the model's error rate over time. When the error rate passes a set threshold, it enters a warning phase, and if it passes another threshold, it enters a drift phase. EDDM (Early Drift Detection Method): improves DDM's detection rate by tracking the average distance between two errors instead of only the error rate. ADWIN (Adaptive Windowing): dynamically stores a window of recent data and warns the user if it detects a significant change between the statistics of the window's earlier data compared to more recent data. KSWIN (Kolmogorov–Smirnov Windowing): detects drift based on the Kolmogorov-Smirnov statistical test. DDM and EDDM: Concept Drift Detection online supervised methods that rely on sequential error monitoring to estimate the evolving error rate. ADWIN and KSWIN: Windowing maintain a "window", a subset of the most recent data, of the data stream, which it checks for statistical differences across the window. == Applications in security == Concept drift is a recurring issue in security analytics, especially in malware and intrusion detection. In these systems, models are often trained on past logs, binaries or network traces, but the behaviour of attackers changes over time as new malware families, obfuscation techniques and campaigns appear. When the data no longer resemble the training set, the decision boundaries learned by classifiers or anomaly detectors can become misaligned with the current threat landscape and detection performance can drop unless the models are updated or replaced. Several studies on Windows malware model detection as an evolving data stream and track how performance changes as time passes. They show that classifiers trained on a fixed time window can perform well on nearby data but deteriorate quickly when evaluated on samples collected months or years later, even when large amounts of training data are available. In order to keep up with this, security systems often use sliding or adaptive windows, which restrict training to the most recent portion of the data so that older, less relevant examples are gradually discarded. They also employ drift detectors such as ADWIN and KSWIN that monitor error rates or changes in the distribution of recent observations and signal when the statistics of the incoming stream differ significantly from the past, prompting retraining or model replacement. Related problems appear in spam filtering, fraud detection and intrusion detection, where adversaries change content, patterns of activity or network behavior to evade models trained on historical data. In these settings drift can be gradual, as new types of spam or fraud emerge, or abrupt, after a sudden shift in attack techniques. Common strategies to remain eff

    Read more →
  • Semantic folding

    Semantic folding

    Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex. == Theory == Semantic folding theory draws inspiration from Douglas R. Hofstadter's Analogy as the Core of Cognition which suggests that the brain makes sense of the world by identifying and applying analogies. The theory hypothesises that semantic data must therefore be introduced to the neocortex in such a form as to allow the application of a similarity measure and offers, as a solution, the sparse binary vector employing a two-dimensional topographic semantic space as a distributional reference frame. The theory builds on the computational theory of the human cortex known as hierarchical temporal memory (HTM), and positions itself as a complementary theory for the representation of language semantics. A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level. == Two-dimensional semantic space == Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectors in such a way as to place similar context-vectors closer to each other, for instance, by using competitive learning principles. This vector space model is presented in the theory as an equivalence to the well known word space model described in the information retrieval literature. Given a semantic space (implemented as described above) a word-vector can be obtained for any given word Y by employing the following algorithm: For each position X in the semantic map (where X represents cartesian coordinates) if the word Y is contained in the context-vector at position X then add 1 to the corresponding position in the word-vector for Y else add 0 to the corresponding position in the word-vector for Y The result of this process will be a word-vector containing all the contexts in which the word Y appears and will therefore be representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format [Schütze, 1993] & [Sahlgreen, 2006]. Some properties of word-SDRs that are of particular interest with respect to computational semantics are: high noise resistance: As a result of similar contexts being placed closer together in the underlying map, word-SDRs are highly tolerant of false or shifted "bits". boolean logic: It is possible to manipulate word-SDRs in a meaningful way using boolean (OR, AND, exclusive-OR) and/or arithmetical (SUBtract) functions . sub-sampling: Word-SDRs can be sub-sampled to a high degree without any appreciable loss of semantic information. topological two-dimensional representation: The SDR representation maintains the topological distribution of the underlying map therefore words with similar meanings will have similar word-vectors. This suggests that a variety of measures can be applied to the calculation of semantic similarity, from a simple overlap of vector elements, to a range of distance measures such as: Euclidean distance, Hamming distance, Jaccard distance, cosine similarity, Levenshtein distance, Sørensen-Dice index, etc. == Semantic spaces == Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis from Microsoft and Hyperspace Analogue to Language from the University of California. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural networking techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google and GloVe from Stanford University. Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representation are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns. == Visualization == The topological distribution over a two-dimensional grid (outlined above) lends itself to a bitmap type visualization of the semantics of any word or text, where each active semantic feature can be displayed as e.g. a pixel. As can be seen in the images shown here, this representation allows for a direct visual comparison of the semantics of two (or more) linguistic items. Image 1 clearly demonstrates that the two disparate terms "dog" and "car" have, as expected, very obviously different semantics. Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of Porsche (indicating partial similarity). Other meaning contexts of "jaguar" e.g. "jaguar" the animal clearly have different non-overlapping contexts. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the fMRI images produced in a research study conducted by A.G. Huth et al., where it is claimed that words are grouped in the brain by meaning. voxels, little volume segments of the brain, were found to follow a pattern were semantic information is represented along the boundary of the visual cortex with visual and linguistic categories represented on posterior and anterior side respectively.

    Read more →
  • Reparameterization trick

    Reparameterization trick

    The reparameterization trick (aka "reparameterization gradient estimator") is a technique used in statistical machine learning, particularly in variational inference, variational autoencoders, and stochastic optimization. It allows for the efficient computation of gradients through random variables, enabling the optimization of parametric probability models using stochastic gradient descent, and the variance reduction of estimators. It was developed in the 1980s in operations research, under the name of "pathwise gradients", or "stochastic gradients". Its use in variational inference was proposed in 2013. == Mathematics == Let z {\displaystyle z} be a random variable with distribution q ϕ ( z ) {\displaystyle q_{\phi }(z)} , where ϕ {\displaystyle \phi } is a vector containing the parameters of the distribution. === REINFORCE estimator === Consider an objective function of the form: L ( ϕ ) = E z ∼ q ϕ ( z ) [ f ( z ) ] {\displaystyle L(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[f(z)]} Without the reparameterization trick, estimating the gradient ∇ ϕ L ( ϕ ) {\displaystyle \nabla _{\phi }L(\phi )} can be challenging, because the parameter appears in the random variable itself. In more detail, we have to statistically estimate: ∇ ϕ L ( ϕ ) = ∇ ϕ ∫ d z q ϕ ( z ) f ( z ) {\displaystyle \nabla _{\phi }L(\phi )=\nabla _{\phi }\int dz\;q_{\phi }(z)f(z)} The REINFORCE estimator, widely used in reinforcement learning and especially policy gradient, uses the following equality: ∇ ϕ L ( ϕ ) = ∫ d z q ϕ ( z ) ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) = E z ∼ q ϕ ( z ) [ ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) ] {\displaystyle \nabla _{\phi }L(\phi )=\int dz\;q_{\phi }(z)\nabla _{\phi }(\ln q_{\phi }(z))f(z)=\mathbb {E} _{z\sim q_{\phi }(z)}[\nabla _{\phi }(\ln q_{\phi }(z))f(z)]} This allows the gradient to be estimated: ∇ ϕ L ( ϕ ) ≈ 1 N ∑ i = 1 N ∇ ϕ ( ln ⁡ q ϕ ( z i ) ) f ( z i ) {\displaystyle \nabla _{\phi }L(\phi )\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }(\ln q_{\phi }(z_{i}))f(z_{i})} The REINFORCE estimator has high variance, and many methods were developed to reduce its variance. === Reparameterization estimator === The reparameterization trick expresses z {\displaystyle z} as: z = g ϕ ( ϵ ) , ϵ ∼ p ( ϵ ) {\displaystyle z=g_{\phi }(\epsilon ),\quad \epsilon \sim p(\epsilon )} Here, g ϕ {\displaystyle g_{\phi }} is a deterministic function parameterized by ϕ {\displaystyle \phi } , and ϵ {\displaystyle \epsilon } is a noise variable drawn from a fixed distribution p ( ϵ ) {\displaystyle p(\epsilon )} . This gives: L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ f ( g ϕ ( ϵ ) ) ] {\displaystyle L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[f(g_{\phi }(\epsilon ))]} Now, the gradient can be estimated as: ∇ ϕ L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ ∇ ϕ f ( g ϕ ( ϵ ) ) ] ≈ 1 N ∑ i = 1 N ∇ ϕ f ( g ϕ ( ϵ i ) ) {\displaystyle \nabla _{\phi }L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[\nabla _{\phi }f(g_{\phi }(\epsilon ))]\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }f(g_{\phi }(\epsilon _{i}))} == Examples == For some common distributions, the reparameterization trick takes specific forms: Normal distribution: For z ∼ N ( μ , σ 2 ) {\displaystyle z\sim {\mathcal {N}}(\mu ,\sigma ^{2})} , we can use: z = μ + σ ϵ , ϵ ∼ N ( 0 , 1 ) {\displaystyle z=\mu +\sigma \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,1)} Exponential distribution: For z ∼ Exp ( λ ) {\displaystyle z\sim {\text{Exp}}(\lambda )} , we can use: z = − 1 λ log ⁡ ( ϵ ) , ϵ ∼ Uniform ( 0 , 1 ) {\displaystyle z=-{\frac {1}{\lambda }}\log(\epsilon ),\quad \epsilon \sim {\text{Uniform}}(0,1)} Discrete distribution can be reparameterized by the Gumbel distribution (Gumbel-softmax trick or "concrete distribution") and diffusion models. In general, any distribution that is differentiable with respect to its parameters can be reparameterized by inverting the multivariable CDF function, then apply the implicit method. See for an exposition and application to the Gamma, Beta, Dirichlet, and von Mises distributions. == Applications == === Variational autoencoder === In Variational Autoencoders (VAEs), the VAE objective function, known as the Evidence Lower Bound (ELBO), is given by: ELBO ( ϕ , θ ) = E z ∼ q ϕ ( z | x ) [ log ⁡ p θ ( x | z ) ] − D KL ( q ϕ ( z | x ) | | p ( z ) ) {\displaystyle {\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{z\sim q_{\phi }(z|x)}[\log p_{\theta }(x|z)]-D_{\text{KL}}(q_{\phi }(z|x)||p(z))} where q ϕ ( z | x ) {\displaystyle q_{\phi }(z|x)} is the encoder (recognition model), p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} is the decoder (generative model), and p ( z ) {\displaystyle p(z)} is the prior distribution over latent variables. The gradient of ELBO with respect to θ {\displaystyle \theta } is simply E z ∼ q ϕ ( z | x ) [ ∇ θ log ⁡ p θ ( x | z ) ] ≈ 1 L ∑ l = 1 L ∇ θ log ⁡ p θ ( x | z l ) {\displaystyle \mathbb {E} _{z\sim q_{\phi }(z|x)}[\nabla _{\theta }\log p_{\theta }(x|z)]\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\theta }\log p_{\theta }(x|z_{l})} but the gradient with respect to ϕ {\displaystyle \phi } requires the trick. Express the sampling operation z ∼ q ϕ ( z | x ) {\displaystyle z\sim q_{\phi }(z|x)} as: z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ , ϵ ∼ N ( 0 , I ) {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,I)} where μ ϕ ( x ) {\displaystyle \mu _{\phi }(x)} and σ ϕ ( x ) {\displaystyle \sigma _{\phi }(x)} are the outputs of the encoder network, and ⊙ {\displaystyle \odot } denotes element-wise multiplication. Then we have ∇ ϕ ELBO ( ϕ , θ ) = E ϵ ∼ N ( 0 , I ) [ ∇ ϕ log ⁡ p θ ( x | z ) + ∇ ϕ log ⁡ q ϕ ( z | x ) − ∇ ϕ log ⁡ p ( z ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{\epsilon \sim {\mathcal {N}}(0,I)}[\nabla _{\phi }\log p_{\theta }(x|z)+\nabla _{\phi }\log q_{\phi }(z|x)-\nabla _{\phi }\log p(z)]} where z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon } . This allows us to estimate the gradient using Monte Carlo sampling: ∇ ϕ ELBO ( ϕ , θ ) ≈ 1 L ∑ l = 1 L [ ∇ ϕ log ⁡ p θ ( x | z l ) + ∇ ϕ log ⁡ q ϕ ( z l | x ) − ∇ ϕ log ⁡ p ( z l ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )\approx {\frac {1}{L}}\sum _{l=1}^{L}[\nabla _{\phi }\log p_{\theta }(x|z_{l})+\nabla _{\phi }\log q_{\phi }(z_{l}|x)-\nabla _{\phi }\log p(z_{l})]} where z l = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ l {\displaystyle z_{l}=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon _{l}} and ϵ l ∼ N ( 0 , I ) {\displaystyle \epsilon _{l}\sim {\mathcal {N}}(0,I)} for l = 1 , … , L {\displaystyle l=1,\ldots ,L} . This formulation enables backpropagation through the sampling process, allowing for end-to-end training of the VAE model using stochastic gradient descent or its variants. === Variational inference === More generally, the trick allows using stochastic gradient descent for variational inference. Let the variational objective (ELBO) be of the form: ELBO ( ϕ ) = E z ∼ q ϕ ( z ) [ log ⁡ p ( x , z ) − log ⁡ q ϕ ( z ) ] {\displaystyle {\text{ELBO}}(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[\log p(x,z)-\log q_{\phi }(z)]} Using the reparameterization trick, we can estimate the gradient of this objective with respect to ϕ {\displaystyle \phi } : ∇ ϕ ELBO ( ϕ ) ≈ 1 L ∑ l = 1 L ∇ ϕ [ log ⁡ p ( x , g ϕ ( ϵ l ) ) − log ⁡ q ϕ ( g ϕ ( ϵ l ) ) ] , ϵ l ∼ p ( ϵ ) {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi )\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\phi }[\log p(x,g_{\phi }(\epsilon _{l}))-\log q_{\phi }(g_{\phi }(\epsilon _{l}))],\quad \epsilon _{l}\sim p(\epsilon )} === Dropout === The reparameterization trick has been applied to reduce the variance in dropout, a regularization technique in neural networks. The original dropout can be reparameterized with Bernoulli distributions: y = ( W ⊙ ϵ ) x , ϵ i j ∼ Bernoulli ( α i j ) {\displaystyle y=(W\odot \epsilon )x,\quad \epsilon _{ij}\sim {\text{Bernoulli}}(\alpha _{ij})} where W {\displaystyle W} is the weight matrix, x {\displaystyle x} is the input, and α i j {\displaystyle \alpha _{ij}} are the (fixed) dropout rates. More generally, other distributions can be used than the Bernoulli distribution, such as the gaussian noise: y i = μ i + σ i ⊙ ϵ i , ϵ i ∼ N ( 0 , I ) {\displaystyle y_{i}=\mu _{i}+\sigma _{i}\odot \epsilon _{i},\quad \epsilon _{i}\sim {\mathcal {N}}(0,I)} where μ i = m i ⊤ x {\displaystyle \mu _{i}=\mathbf {m} _{i}^{\top }x} and σ i 2 = v i ⊤ x 2 {\displaystyle \sigma _{i}^{2}=\mathbf {v} _{i}^{\top }x^{2}} , with m i {\displaystyle \mathbf {m} _{i}} and v i {\displaystyle \mathbf {v} _{i}} being the mean and variance of the i {\displaystyle i} -th output neuron. The reparameterization trick can be applied to all such cases, resulting in the variational dropout method.

    Read more →
  • Google Research

    Google Research

    Google Research (also known as Research at Google) is the research division of Google, a subsidiary of Alphabet Inc.. According to its official website, Google Research publishes findings, releases open-source software, and applies research results within Google products and services as well as within the wider scientific community. == Notable contributions == The 2017 landmark paper Attention Is All You Need, which introduced the Transformer architecture, which has subsequently been used to build modern large language models. Advances in neural machine translation powering Google Translate. Time series forecasting. Development of scalable learning systems and infrastructure for large-model training. Flood forecasting. Research into computational discovery via Google Accelerated Science including demonstrating the first below-threshold quantum calculations.

    Read more →
  • Tensor (machine learning)

    Tensor (machine learning)

    In machine learning, the term tensor informally refers to two different concepts: (i) a way of organizing data and (ii) a multilinear (tensor) transformation. Data may be organized in a multidimensional array (M-way array), informally referred to as a "data tensor"; however, in the strict mathematical sense, a tensor is a multilinear mapping over a set of domain vector spaces to a range vector space. Observations, such as images, movies, volumes, sounds, and relationships among words and concepts, stored in an M-way array ("data tensor"), may be analyzed either by artificial neural networks or tensor methods. Tensor decomposition factors data tensors into smaller tensors. Operations on data tensors can be expressed in terms of matrix multiplication and the Kronecker product. The computation of gradients, a crucial aspect of backpropagation, can be performed using software libraries such as PyTorch and TensorFlow. Computations are often performed on graphics processing units (GPUs) using CUDA, and on dedicated hardware such as Google's Tensor Processing Unit or Nvidia's Tensor core. These developments have greatly accelerated neural network architectures, and increased the size and complexity of models that can be trained. == History == A tensor is by definition a multilinear map. In mathematics, this may express a multilinear relationship between sets of algebraic objects. In physics, tensor fields, considered as tensors at each point in space, are useful in expressing mechanics such as stress or elasticity. In machine learning, the exact use of tensors depends on the statistical approach being used. In 2001, the field of signal processing and statistics were making use of tensor methods. Pierre Comon surveys the early adoption of tensor methods in the fields of telecommunications, radio surveillance, chemometrics and sensor processing. Linear tensor rank methods (such as, Parafac/CANDECOMP) analyzed M-way arrays ("data tensors") composed of higher order statistics that were employed in blind source separation problems to compute a linear model of the data. He noted several early limitations in determining the tensor rank and efficient tensor rank decomposition. In the early 2000s, multilinear tensor methods crossed over into computer vision, computer graphics and machine learning with papers by Vasilescu or in collaboration with Terzopoulos, such as Human Motion Signatures, TensorFaces TensorTextures and Multilinear Projection. Multilinear algebra, the algebra of higher-order tensors, is a suitable and transparent framework for analyzing the multifactor structure of an ensemble of observations and for addressing the difficult problem of disentangling the causal factors based on second order or higher order statistics associated with each causal factor. Tensor (multilinear) factor analysis disentangles and reduces the influence of different causal factors with multilinear subspace learning. When treating an image or a video as a 2- or 3-way array, i.e., "data matrix/tensor", tensor methods reduce spatial or time redundancies as demonstrated by Wang and Ahuja. Yoshua Bengio, Geoff Hinton and their collaborators briefly discuss the relationship between deep neural networks and tensor factor analysis beyond the use of M-way arrays ("data tensors") as inputs. One of the early uses of tensors for neural networks appeared in natural language processing. A single word can be expressed as a vector via Word2vec. Thus a relationship between two words can be encoded in a matrix. However, for more complex relationships such as subject-object-verb, it is necessary to build higher-dimensional networks. In 2009, the work of Sutskever introduced Bayesian Clustered Tensor Factorization to model relational concepts while reducing the parameter space. From 2014 to 2015, tensor methods become more common in convolutional neural networks (CNNs). Tensor methods organize neural network weights in a "data tensor", analyze and reduce the number of neural network weights. Lebedev et al. accelerated CNN networks for character classification (the recognition of letters and digits in images) by using 4D kernel tensors. == Definition == Let F {\displaystyle \mathbb {F} } be a field (such as the real numbers R {\displaystyle \mathbb {R} } or the complex numbers C {\displaystyle \mathbb {C} } ). A tensor T ∈ F I 1 × I 2 × … × I C {\displaystyle {\mathcal {T}}\in {\mathbb {F} }^{I_{1}\times I_{2}\times \ldots \times I_{C}}} is a multilinear transformation from a set of domain vector spaces to a range vector space: T : { F I 1 × F I 2 × … F I C } ↦ F I 0 {\displaystyle {\mathcal {T}}:\{{\mathbb {F} }^{I_{1}}\times {\mathbb {F} }^{I_{2}}\times \ldots {\mathbb {F} }^{I_{C}}\}\mapsto {\mathbb {F} }^{I_{0}}} Here, C {\displaystyle C} and I 0 , I 1 , … , I C {\displaystyle I_{0},I_{1},\ldots ,I_{C}} are positive integers, and ( C + 1 ) {\displaystyle (C+1)} is the number of modes of a tensor (also known as the number of ways of a multi-way array). The dimensionality of mode c {\displaystyle c} is I c {\displaystyle I_{c}} , for 0 ≤ c ≤ C {\displaystyle 0\leq c\leq C} . In statistics and machine learning, an image is vectorized when viewed as a single observation, and a collection of vectorized images is organized as a "data tensor". For example, a set of facial images { d i p , i e , i l , i v ∈ R I X } {\displaystyle \{{\mathbb {d} }_{i_{p},i_{e},i_{l},i_{v}}\in {\mathbb {R} }^{I_{X}}\}} with I X {\displaystyle I_{X}} pixels that are the consequences of multiple causal factors, such as a facial geometry i p ( 1 ≤ i p ≤ I P ) {\displaystyle i_{p}(1\leq i_{p}\leq I_{P})} , an expression i e ( 1 ≤ i e ≤ I E ) {\displaystyle i_{e}(1\leq i_{e}\leq I_{E})} , an illumination condition i l ( 1 ≤ i l ≤ I L ) {\displaystyle i_{l}(1\leq i_{l}\leq I_{L})} , and a viewing condition i v ( 1 ≤ i v ≤ I V ) {\displaystyle i_{v}(1\leq i_{v}\leq I_{V})} may be organized into a data tensor (ie. multiway array) D ∈ R I X × I P × I E × I L × V {\displaystyle {\mathcal {D}}\in {\mathbb {R} }^{I_{X}\times I_{P}\times I_{E}\times I_{L}\times V}} where I P {\displaystyle I_{P}} are the total number of facial geometries, I E {\displaystyle I_{E}} are the total number of expressions, I L {\displaystyle I_{L}} are the total number of illumination conditions, and I V {\displaystyle I_{V}} are the total number of viewing conditions. Tensor factorizations methods such as TensorFaces and multilinear (tensor) independent component analysis factorizes the data tensor into a set of vector spaces that span the causal factor representations, where an image is the result of tensor transformation T {\displaystyle {\mathcal {T}}} that maps a set of causal factor representations to the pixel space. Another approach to using tensors in machine learning is to embed various data types directly. For example, a grayscale image, commonly represented as a discrete 2-way array D ∈ R I R X × I C X {\displaystyle {\mathbf {D} }\in {\mathbb {R} }^{I_{RX}\times I_{CX}}} with dimensionality I R X × I C X {\displaystyle I_{RX}\times I_{CX}} where I R X {\displaystyle I_{RX}} are the number of rows and I C X {\displaystyle I_{CX}} are the number of columns. When an image is treated as 2-way array or 2nd order tensor (i.e. as a collection of column/row observations), tensor factorization methods compute the image column space, the image row space and the normalized PCA coefficients or the ICA coefficients. Similarly, a color image with RGB channels, D ∈ R N × M × 3 . {\displaystyle {\mathcal {D}}\in \mathbb {R} ^{N\times M\times 3}.} may be viewed as a 3rd order data tensor or 3-way array.-------- In natural language processing, a word might be expressed as a vector v {\displaystyle v} via the Word2vec algorithm. Thus v {\displaystyle v} becomes a mode-1 tensor v ↦ A ∈ R N . {\displaystyle v\mapsto {\mathcal {A}}\in \mathbb {R} ^{N}.} The embedding of subject-object-verb semantics requires embedding relationships among three words. Because a word is itself a vector, subject-object-verb semantics could be expressed using mode-3 tensors v a × v b × v c ↦ A ∈ R N × N × N . {\displaystyle v_{a}\times v_{b}\times v_{c}\mapsto {\mathcal {A}}\in \mathbb {R} ^{N\times N\times N}.} In practice the neural network designer is primarily concerned with the specification of embeddings, the connection of tensor layers, and the operations performed on them in a network. Modern machine learning frameworks manage the optimization, tensor factorization and backpropagation automatically. === As unit values === Tensors may be used as the unit values of neural networks which extend the concept of scalar, vector and matrix values to multiple dimensions. The output value of single layer unit y m {\displaystyle y_{m}} is the sum-product of its input units and the connection weights filtered through the activation function f {\displaystyle f} : y m = f ( ∑ n x n u m , n ) , {\displaystyle y_{m}=f\left(\sum _{n}x_{n}u_{m,n}\right),} where y m ∈ R .

    Read more →
  • Mountain car problem

    Mountain car problem

    Mountain Car, a standard testing domain in Reinforcement learning, is a problem in which an under-powered car must drive up a steep hill. Since gravity is stronger than the car's engine, even at full throttle, the car cannot simply accelerate up the steep slope. The car is situated in a valley and must learn to leverage potential energy by driving up the opposite hill before the car is able to make it to the goal at the top of the rightmost hill. The domain has been used as a test bed in various reinforcement learning papers. == Introduction == The mountain car problem, although fairly simple, is commonly applied because it requires a reinforcement learning agent to learn on two continuous variables: position and velocity. For any given state (position and velocity) of the car, the agent is given the possibility of driving left, driving right, or not using the engine at all. In the standard version of the problem, the agent receives a negative reward at every time step when the goal is not reached; the agent has no information about the goal until an initial success. == History == The mountain car problem appeared first in Andrew Moore's PhD thesis (1990). It was later more strictly defined in Singh and Sutton's reinforcement learning paper with eligibility traces. The problem became more widely studied when Sutton and Barto added it to their book Reinforcement Learning: An Introduction (1998). Throughout the years many versions of the problem have been used, such as those which modify the reward function, termination condition, and the start state. == Techniques used to solve mountain car == Q-learning and similar techniques for mapping discrete states to discrete actions need to be extended to be able to deal with the continuous state space of the problem. Approaches often fall into one of two categories, state space discretization or function approximation. === Discretization === In this approach, two continuous state variables are pushed into discrete states by bucketing each continuous variable into multiple discrete states. This approach works with properly tuned parameters but a disadvantage is information gathered from one state is not used to evaluate another state. Tile coding can be used to improve discretization and involves continuous variables mapping into sets of buckets offset from one another. Each step of training has a wider impact on the value function approximation because when the offset grids are summed, the information is diffused. === Function approximation === Function approximation is another way to solve the mountain car. By choosing a set of basis functions beforehand, or by generating them as the car drives, the agent can approximate the value function at each state. Unlike the step-wise version of the value function created with discretization, function approximation can more cleanly estimate the true smooth function of the mountain car domain. === Eligibility traces === One aspect of the problem involves the delay of actual reward. The agent is not able to learn about the goal until a successful completion. Given a naive approach for each trial the car can only backup the reward of the goal slightly. This is a problem for naive discretization because each discrete state will only be backed up once, taking a larger number of episodes to learn the problem. This problem can be alleviated via the mechanism of eligibility traces, which will automatically backup the reward given to states before, dramatically increasing the speed of learning. Eligibility traces can be viewed as a bridge from temporal difference learning methods to Monte Carlo methods. == Technical details == The mountain car problem has undergone many iterations. This section focuses on the standard well-defined version from Sutton (2008). === State variables === Two-dimensional continuous state space. V e l o c i t y = ( − 0.07 , 0.07 ) {\displaystyle Velocity=(-0.07,0.07)} P o s i t i o n = ( − 1.2 , 0.6 ) {\displaystyle Position=(-1.2,0.6)} === Actions === One-dimensional discrete action space. m o t o r = ( l e f t , n e u t r a l , r i g h t ) {\displaystyle motor=(left,neutral,right)} === Reward === For every time step: r e w a r d = − 1 {\displaystyle reward=-1} === Update function === For every time step: A c t i o n = [ − 1 , 0 , 1 ] {\displaystyle Action=[-1,0,1]} V e l o c i t y = V e l o c i t y + ( A c t i o n ) ∗ 0.001 + cos ⁡ ( 3 ∗ P o s i t i o n ) ∗ ( − 0.0025 ) {\displaystyle Velocity=Velocity+(Action)0.001+\cos(3Position)(-0.0025)} P o s i t i o n = P o s i t i o n + V e l o c i t y {\displaystyle Position=Position+Velocity} === Starting condition === Optionally, many implementations include randomness in both parameters to show better generalized learning. P o s i t i o n = − 0.5 {\displaystyle Position=-0.5} V e l o c i t y = 0.0 {\displaystyle Velocity=0.0} === Termination condition === End the simulation when: P o s i t i o n ≥ 0.6 {\displaystyle Position\geq 0.6} == Variations == There are many versions of the mountain car which deviate in different ways from the standard model. Variables that vary include but are not limited to changing the constants (gravity and steepness) of the problem so specific tuning for specific policies become irrelevant and altering the reward function to affect the agent's ability to learn in a different manner. An example is changing the reward to be equal to the distance from the goal, or changing the reward to zero everywhere and one at the goal. Additionally, a 3D mountain car can be used, with a 4D continuous state space.

    Read more →
  • CloudPassage

    CloudPassage

    CloudPassage is a company that provides an automation platform, delivered via software as a service, that improves security for private, public, and hybrid cloud computing environments. CloudPassage is headquartered in San Francisco. == History == CloudPassage was founded by Carson Sweet, Talli Somekh, and Vitaliy Geraymovych in 2010. The company used cloud computing and big data analytics to implement security monitoring and control in a platform called Halo. CloudPassage spent a year in stealth developing the Halo technology, coming out of stealth mode to a closed beta in January 2011. In June 2012, the company launched the commercial product that included configuration security monitoring, network microsegmentation, and two-factor authentication for privileged access management. By 2013, CloudPassage expanded Halo to support large enterprises with advanced security and compliance requirements with a product called Halo Enterprise. The first round of venture funding for the company raised $6.5 million. In April 2012, CloudPassage raised $14 million. The financing round was led by Tenaya Capital. In February 2014, CloudPassage announced that it had raised $25.5 million in funding led by Shasta Ventures. In total, the company has invested over $30 million in its technology and raised approximately $88 million in capital. == Product == The CloudPassage platform provides cloud workload security and compliance for systems hosted in public or private cloud infrastructure environments, including hybrid cloud and multi-cloud workload hosting models. The flagship product the company offers is called Halo. Halo secures virtual servers in public, private, and hybrid cloud infrastructures and provides file integrity monitoring (FIM) while also administering firewall automation, vulnerability monitoring, network access control, security event alerting, and assessment. The Halo platform also provides security applications such as privileged access management, software vulnerability scanning, multifactor authentication, and log-based IDS. In December 2013, CloudPassage set up six servers with Microsoft Windows and Linux operating systems and combinations of popular programs and invited hackers to attempt to hack into the servers. The top prize was $5,000 and the winning hacker was a novice that completed the task in four hours. CloudPassage programmed the servers to use basic default security settings to show how vulnerable cloud computing programs can be to security threats. == Awards and recognition == In May 2011, Gigaom named CloudPassage in its list of the Top 50 Cloud Innovators. That same month, eWeek recognized CloudPassage as one of 16 Hot Startup Companies Flying Under the Radar. SC Magazine named CloudPassage an Industry Innovator in the Virtualization and Cloud Security category in 2012. Also in 2012, The Wall Street Journal named CloudPassage a runner-up in the Information Security category of its Technology Innovation Awards. The CloudPassage large-scale security program, Halo, won Best Security Solution in 2014 at the SIIA Codie awards.

    Read more →
  • Automated decision-making

    Automated decision-making

    Automated decision-making (ADM) is the use of data, machines and algorithms to make decisions in a range of contexts, including public administration, business, health, education, law, employment, transport, media and entertainment, with varying degrees of human oversight or intervention. ADM may involve large-scale data from a range of sources, such as databases, text, social media, sensors, images or speech, that is processed using various technologies including computer software, algorithms, machine learning, natural language processing, artificial intelligence, augmented intelligence and robotics. The increasing use of automated decision-making systems (ADMS) across a range of contexts presents many benefits and challenges to human society requiring consideration of the technical, legal, ethical, societal, educational, economic and health consequences. == Overview == There are different definitions of ADM based on the level of automation involved. Some definitions suggests ADM involves decisions made through purely technological means without human input, such as the EU's General Data Protection Regulation (Article 22). However, ADM technologies and applications can take many forms ranging from decision-support systems that make recommendations for human decision-makers to act on, sometimes known as augmented intelligence or 'shared decision-making', to fully automated decision-making processes that make decisions on behalf of individuals or organizations without human involvement. Models used in automated decision-making systems can be as simple as checklists and decision trees through to artificial intelligence and deep neural networks (DNN). Since the 1950s computers have gone from being able to do basic processing to having the capacity to undertake complex, ambiguous and highly skilled tasks such as image and speech recognition, gameplay, scientific and medical analysis and inferencing across multiple data sources. ADM is now being increasingly deployed across all sectors of society and many diverse domains from entertainment to transport. An ADM system (ADMS) may involve multiple decision points, data sets, and technologies (ADMT) and may sit within a larger administrative or technical system such as a criminal justice system or business process. == Data == Automated decision-making involves using data as input to be analyzed within a process, model, or algorithm or for learning and generating new models. ADM systems may use and connect a wide range of data types and sources depending on the goals and contexts of the system, for example, sensor data for self-driving cars and robotics, identity data for security systems, demographic and financial data for public administration, medical records in health, criminal records in law. This can sometimes involve vast amounts of data and computing power. === Data quality === The quality of the available data and its ability to be used in ADM systems is fundamental to the outcomes. It is often highly problematic for many reasons. Datasets are often highly variable; corporations or governments may control large-scale data, restricted for privacy or security reasons, incomplete, biased, limited in terms of time or coverage, measuring and describing terms in different ways, and many other issues. For machines to learn from data, large corpora are often required, which can be challenging to obtain or compute; however, where available, they have provided significant breakthroughs, for example, in diagnosing chest X-rays. == ADM technologies == Automated decision-making technologies (ADMT) are software-coded digital tools that automate the translation of input data to output data, contributing to the function of automated decision-making systems. There are a wide range of technologies in use across ADM applications and systems. ADMTs involving basic computational operations Search (includes 1-2-1, 1-2-many, data matching/merge) Matching (two different things) Mathematical Calculation (formula) ADMTs for assessment and grouping: User profiling Recommender systems Clustering Classification Feature learning Predictive analytics (includes forecasting) ADMTs relating to space and flows: Social network analysis (includes link prediction) Mapping Routing ADMTs for processing of complex data formats Image processing Audio processing Natural Language Processing (NLP) Other ADMT Business rules management systems Time series analysis Anomaly detection Modelling/Simulation === Machine learning === Machine learning (ML) involves training computer programs through exposure to large data sets and examples to learn from experience and solve problems. Machine learning can be used to generate and analyse data as well as make algorithmic calculations and has been applied to image and speech recognition, translations, text, data and simulations. While machine learning has been around for some time, it is becoming increasingly powerful due to recent breakthroughs in training deep neural networks (DNNs), and dramatic increases in data storage capacity and computational power with GPU coprocessors and cloud computing. Machine learning systems based on foundation models run on deep neural networks and use pattern matching to train a single huge system on large amounts of general data such as text and images. Early models tended to start from scratch for each new problem however since the early 2020s many are able to be adapted to new problems. Examples of these technologies include Open AI's DALL-E (an image creation program) and their various GPT language models, and Google's PaLM language model program. == Applications == ADM is being used to replace or augment human decision-making by both public and private-sector organisations for a range of reasons including to help increase consistency, improve efficiency, reduce costs and enable new solutions to complex problems. === Debate === Research and development are underway into uses of technology to assess argument quality, assess argumentative essays and judge debates. Potential applications of these argument technologies span education and society. Scenarios to consider, in these regards, include those involving the assessment and evaluation of conversational, mathematical, scientific, interpretive, legal, and political argumentation and debate. === Law === In legal systems around the world, algorithmic tools such as risk assessment instruments (RAI), are being used to supplement or replace the human judgment of judges, civil servants and police officers in many contexts. In the United States RAI are being used to generate scores to predict the risk of recidivism in pre-trial detention and sentencing decisions, evaluate parole for prisoners and to predict "hot spots" for future crime. These scores may result in automatic effects or may be used to inform decisions made by officials within the justice system. In Canada ADM has been used since 2014 to automate certain activities conducted by immigration officials and to support the evaluation of some immigrant and visitor applications. === Economics === Automated decision-making systems are used in certain computer programs to create buy and sell orders related to specific financial transactions and automatically submit the orders in the international markets. Computer programs can automatically generate orders based on predefined set of rules using trading strategies which are based on technical analyses, advanced statistical and mathematical computations, or inputs from other electronic sources. === Business === ==== Continuous auditing ==== Continuous auditing uses advanced analytical tools to automate auditing processes. It can be utilized in the private sector by business enterprises and in the public sector by governmental organizations and municipalities. As artificial intelligence and machine learning continue to advance, accountants and auditors may make use of increasingly sophisticated algorithms which make decisions such as those involving determining what is anomalous, whether to notify personnel, and how to prioritize those tasks assigned to personnel. === Media and entertainment === Digital media, entertainment platforms, and information services increasingly provide content to audiences via automated recommender systems based on demographic information, previous selections, collaborative filtering or content-based filtering. This includes music and video platforms, publishing, health information, product databases and search engines. Many recommender systems also provide some agency to users in accepting recommendations and incorporate data-driven algorithmic feedback loops based on the actions of the system user. Large-scale machine learning language models and image creation programs being developed by companies such as OpenAI and Google in the 2020s have restricted access however they are likely to have widespread application in fields such as advertising, copywriting, stock imagery and gra

    Read more →
  • Percept (artificial intelligence)

    Percept (artificial intelligence)

    A percept is the input that an intelligent agent is perceiving at any given moment. It is essentially the same concept as a percept in psychology, except that it is being perceived not by the brain but by the agent. A percept is detected by a sensor, often a camera, processed accordingly, and acted upon by an actuator. Each percept is added to a "percept sequence", which is a complete history of each percept ever detected. The agent's action at any instant point may depend on the entire percept sequence up to that particular instant point. An intelligent agent chooses how to act not only based on the current percept, but the percept sequence. The next action is chosen by the agent function, which maps every percept to an action. For example, if a camera were to record a gesture, the agent would process the percepts, calculate the corresponding spatial vectors, examine its percept history, and use the agent program (the application of the agent function) to act accordingly. == Examples == Examples of percepts include inputs from touch sensors, cameras, infrared sensors, sonar, microphones, mice, and keyboards. A percept can also be a higher-level feature of the data, such as lines, depth, objects, faces, or gestures.

    Read more →