Robot learning

Robot learning

Robot learning is a research field at the intersection of machine learning and robotics. It studies techniques allowing a robot to acquire novel skills or adapt to its environment through learning algorithms. The embodiment of the robot, situated in a physical embedding, provides at the same time specific difficulties (e.g. high-dimensionality, real time constraints for collecting data and learning) and opportunities for guiding the learning process (e.g. sensorimotor synergies, motor primitives). Example of skills that are targeted by learning algorithms include sensorimotor skills such as locomotion, grasping, active object categorization, as well as interactive skills such as joint manipulation of an object with a human peer, and linguistic skills such as the grounded and situated meaning of human language. Learning can happen either through autonomous self-exploration or through guidance from a human teacher, like for example in robot learning by imitation. Robot learning can be closely related to adaptive control, reinforcement learning as well as developmental robotics which considers the problem of autonomous lifelong acquisition of repertoires of skills. While machine learning is frequently used by computer vision algorithms employed in the context of robotics, these applications are usually not referred to as "robot learning". == Imitation learning == Many research groups are developing techniques where robots learn by imitating. This includes various techniques for learning from demonstration (sometimes also referred to as "programming by demonstration") and observational learning. == Sharing learned skills and knowledge == In Tellex's "Million Object Challenge", the goal is robots that learn how to spot and handle simple items and upload their data to the cloud to allow other robots to analyze and use the information. RoboBrain is a knowledge engine for robots which can be freely accessed by any device wishing to carry out a task. The database gathers new information about tasks as robots perform them, by searching the Internet, interpreting natural language text, images, and videos, object recognition as well as interaction. The project is led by Ashutosh Saxena at Stanford University. RoboEarth is a project that has been described as a "World Wide Web for robots" − it is a network and database repository where robots can share information and learn from each other and a cloud for outsourcing heavy computation tasks. The project brings together researchers from five major universities in Germany, the Netherlands and Spain and is backed by the European Union. Google Research, DeepMind, and Google X have decided to allow their robots share their experiences. == Vision-language-action model == Research groups and companies are developing vision-language-action models, foundation models that allow robotic control through the combination of vision and language. Google DeepMind, Figure AI and Hugging Face are actively working on that.

Semantic interpretation

Semantic interpretation is an important component in dialog systems. It is related to natural language understanding, but mostly it refers to the last stage of understanding. The goal of interpretation is binding the user utterance to concept, or something the system can understand. Typically it is creating a database query based on user utterance.

Artificial intelligence and elections

As artificial intelligence (AI) has become more mainstream, there is growing concern about how this will influence elections. Potential targets of AI include election processes, election offices, election officials and election vendors. There are also global efforts to improve elections using AI. == Tactics == Generative AI capabilities allow creation of misleading content. Examples of this include text-to-video, deepfake videos, text-to-image, AI-altered images, text-to-speech, voice cloning, and text-to-text. In the context of an election, a deepfake video of a candidate may propagate information that the candidate does not endorse. Chatbots could spread misinformation related to election locations, times or voting methods. In contrast to malicious actors in the past, these techniques require little technical skill and can spread rapidly. LLM-generated messages have the capacity to persuade humans on political issues. Researchers have begun to investigate how people rate messages that LLMs generate for how persuasive they are. When it came to policy issues, the LLM-generated messages received a 2.91 compared to a 2.80 when it came to smartness between the AI and humans. The LLM-generated messages were often more technical and analytical than human-generated messages. Generative AI has been used to micro-target people during tight political elections. The generation of targeted large language models has triggered concern that they will be used to leverage readily scale microtargeting. Rephrasing inputs have been used to generate fraudulent emails and phishing websites. Rephrasing inputs in a microtargeting does not violate the terms of OpenAI usage. There are no safeguards to prevent the use of rephrasing and creation of fraudulent emails. Political campaign managers have access to this allowing for them to create targeted content. == Usage by country == === Argentina === ==== 2023 elections ==== During the 2023 Argentine primary elections, Javier Milei's team distributed AI generated images including a fabricated image of his rival Sergio Massa and drew 3 million views. The team also created an unofficial Instagram account entitled "AI for the Homeland." Sergio Massa's team also distributed AI generated images and videos. === Bangladesh === ==== 2024 elections ==== In the run up to the 2024 Bangladeshi general election, deepfake videos of female opposition politicians appeared. Rumin Farhana was pictured in a bikini while Nipun Ray was shown in a swimming pool. === Canada === ==== 2025 elections ==== In the run up to the 2025 Canadian federal election, the use of AI tools is likely to figure prominently. India, Pakistan and Iran are all expected to make efforts to subvert the national vote using disinformation campaigns to deceive voters and sway diaspora communities. In a report by the Canadian Centre for Cyber Security called "Cyber Threats to Canada's Democratic Process: 2025 Update", it states that malicious actors including China and Russia: "are most likely to use generative AI as a means of creating and spreading disinformation, designed to sow division among Canadians and push narratives conducive to the interests of foreign states". === France === ==== 2024 elections ==== In the 2024 French legislative election, deepfake videos appeared claiming: i) That they showed the family of Marine le Pen. In the videos, young women, supposedly Le Pen's nieces, are seen skiing, dancing and at the beach "while making fun of France’s racial minorities": However, the family members don't exist. On social media there were over 2 million views. ii) In a video seen on social media, a deepfake video of a France24 broadcast appeared to report that the Ukrainian leadership had "tried to lure French president Emmanuel Macron to Ukraine to assassinate him and then blame his death on Russia". === Ghana === ==== 2024 elections ==== During the months before the December 2024 Ghanaian general election, a network of at least 171 fake accounts has been used to spam social media. Posts have been used by a group identified as "@TheTPatriots" to promote the New Patriotic Party, although it is not known whether the two are connected. All the networks' posts were "highly likely" to have been generated by ChatGPT and appear to be the "first secretly partisan network using AI to influence elections in Ghana". The opposition National Democratic Congress was also criticized with its leader John Mahama being called a drunkard. === India === ==== 2024 elections ==== In the 2024 Indian general election, politicians used deepfakes in their campaign materials. These deepfakes included politicians who had died prior to the election. Mathuvel Karunanidhi's party posted with his likeness even though he had died 2018. A video The All-India Anna Dravidian Progressive Federation party posted showed an audio clip of Jayaram Jayalalithaa even though she had died in 2016. The Deepfakes Analysis Unit (DAU) is an open source platform created in March 2024 for the public to share misleading content and assess if it had been AI-generated. AI was also used to translate political speeches in real time. This translating ability was widely used to reach more voters. === Indonesia === ==== 2024 elections ==== In the 2024 Indonesian presidential election, Prabowo Subianto made extensive use of AI-generated art in his campaign, which ranged from images of himself as an adorable child to various child portrayals in his advertisements. The Indonesian Children's Protection Commission condemned these ads, labeling them as a form of misuse. Other candidates, Anies Baswedan and Ganjar Pranowo, also incorporated AI art into their campaigns. Throughout the election period, all presidential candidates faced attacks from deepfakes, both in video and audio formats. === Ireland === ==== 2024 elections ==== In the last weeks of the 2024 Irish general election a spoof election poster appeared in Dublin featuring "an AI-generated candidate with three arms". The candidate is called Aidan Irwin, but no-one stood in the election with that name. A slogan on the poster says "put matters into artificial intelligence’s hands". The convincing election poster shows a man that "has six fingers on one hand, three arms, and a distorted thumb". === New Zealand === ==== 2023 elections ==== In May 2023, ahead of the 2023 New Zealand general election in October 2023, the New Zealand National Party published a "series of AI-generated political advertisements" on its Instagram account. After confirming that the images were faked, a party spokesperson said that it was "an innovative way to drive our social media". === Pakistan === ==== 2024 elections ==== AI has been used by the imprisoned ex-Prime Minister Imran Khan and his media team in the 2024 Pakistani general election: i) An AI generated audio of his voice was added to a video clip and was broadcast at a virtual rally. ii) An op-ed in The Economist written by Khan was later claimed by himself to have been written by AI which was later denied by his team. The article was liked and shared on social media by thousands of users. === South Africa === ==== 2024 elections ==== In the 2024 South African general election, there were several uses of AI content: i) A deepfaked video of Joe Biden emerged on social media showing him saying that "The U.S. would place sanctions on SA and declare it an enemy state if the African National Congress (ANC) won". ii) In a deepfake video, Donald Trump was shown endorsing the uMkhonto weSizwe party. It was posted to social media and was viewed more than 158,000 times. iii) Less than 3 months before the elections, a deepfake video showed U.S. rapper Eminem endorsing the Economic Freedom Fighters party while criticizing the ANC. The deepfake was viewed on social media more than 173,000 times. === South Korea === ==== 2022 elections ==== In the 2022 South Korean presidential election, a committee for one presidential candidate Yoon Suk Yeol released an AI avatar 'Al Yoon Seok-yeol' that would campaign in places the candidate could not go. The other presidential candidate Lee Jae-myung introduced a chatbot that provided information about the candidate's pledges. ==== 2024 elections ==== Deepfakes were used to spread misinformation before the 2024 South Korean legislative election with one source reporting 129 deepfake violations of election laws within a two week period. Seoul hosted the 2024 Summit for Democracy, a virtual gathering of world leaders initiated by US President Joe Biden in 2021. The focus of the summit was on digital threats to democracy including artificial intelligence and deepfakes. === Taiwan === ==== 2024 elections ==== AI-generated content was used during the 2024 Taiwanese presidential election. Among the media were: i) A deepfake video of General Secretary of the Chinese Communist Party Xi Jinping which showed him supporting the presidential elections. Created on social media, the video was "widely circulated

PagedAttention

PagedAttention is an attention algorithm for efficient serving of large language models (LLMs). It was introduced in 2023 by Woosuk Kwon and colleagues in the paper Efficient Memory Management for Large Language Model Serving with PagedAttention, alongside the vLLM serving engine. The method stores the key–value cache used during autoregressive decoding in fixed-size blocks that can be mapped to non-contiguous physical memory, borrowing ideas from virtual memory, paging, and operating system design. == Background == In transformer inference, the key–value cache grows with sequence length and the number of concurrent requests. Kwon et al. argued that earlier serving systems typically reserved contiguous cache regions in advance, which caused reserved space, internal fragmentation, and external fragmentation. In their experiments, the paper reported that the effective memory utilization of previous systems could fall as low as 20.4%. == Description == PagedAttention partitions the cache of each sequence into fixed-size KV blocks. A request's cache is represented as a sequence of logical blocks, while a block table maps those logical blocks to physical GPU-memory blocks. As a result, neighboring logical blocks do not need to be contiguous in physical memory, and new blocks can be allocated on demand as generation proceeds. The design also makes it easier to share cache state across related decoding paths. In vLLM, physical blocks can be reference-counted and shared among requests or branches, with block-granularity copy-on-write used when a shared block must be modified. The original paper applied this design to parallel sampling, beam search, and prompts with shared prefixes. == Mathematical formulation == For a query token i {\displaystyle i} in causal self-attention, the standard attention output can be written as a i j = exp ⁡ ( q i ⊤ k j / d ) ∑ t = 1 i exp ⁡ ( q i ⊤ k t / d ) , o i = ∑ j = 1 i a i j v j {\displaystyle a_{ij}={\frac {\exp(\mathbf {q} _{i}^{\top }\mathbf {k} _{j}/{\sqrt {d}})}{\sum _{t=1}^{i}\exp(\mathbf {q} _{i}^{\top }\mathbf {k} _{t}/{\sqrt {d}})}},\;\mathbf {o} _{i}=\sum _{j=1}^{i}a_{ij}\mathbf {v} _{j}} where q i {\displaystyle \mathbf {q} _{i}} , k j {\displaystyle \mathbf {k} _{j}} , and v j {\displaystyle \mathbf {v} _{j}} are the query, key, and value vectors, and d {\displaystyle d} is the attention dimension. If the cache is partitioned into blocks of size B {\displaystyle B} , the key and value blocks may be written as K j = ( k ( j − 1 ) B + 1 , … , k j B ) , V j = ( v ( j − 1 ) B + 1 , … , v j B ) {\displaystyle \mathbf {K} _{j}=(\mathbf {k} _{(j-1)B+1},\ldots ,\mathbf {k} _{jB}),\;\mathbf {V} _{j}=(\mathbf {v} _{(j-1)B+1},\ldots ,\mathbf {v} _{jB})} PagedAttention then performs the computation blockwise: A i j = exp ⁡ ( q i ⊤ K j / d ) ∑ t = 1 ⌈ i / B ⌉ exp ⁡ ( q i ⊤ K t / d ) , o i = ∑ j = 1 ⌈ i / B ⌉ V j A i j ⊤ {\displaystyle \mathbf {A} _{ij}={\frac {\exp(\mathbf {q} _{i}^{\top }\mathbf {K} _{j}/{\sqrt {d}})}{\sum _{t=1}^{\lceil i/B\rceil }\exp(\mathbf {q} _{i}^{\top }\mathbf {K} _{t}/{\sqrt {d}})}},\;\mathbf {o} _{i}=\sum _{j=1}^{\lceil i/B\rceil }\mathbf {V} _{j}\mathbf {A} _{ij}^{\top }} where A i j {\displaystyle \mathbf {A} _{ij}} is the vector of attention scores for the j {\displaystyle j} -th KV block. In the formulation given by Kwon et al., this preserves the causal attention calculation while allowing the key and value blocks to reside in non-contiguous physical memory. == Performance and use == The vLLM paper reported that, on its evaluated workloads, the use of PagedAttention and the associated memory-management design improved serving throughput by 2–4× over the compared baselines, including FasterTransformer and Orca, while preserving model outputs. In experiments on OPT-13B with the Alpaca trace, the paper also reported memory savings of 6.1–9.8% for parallel sampling and 37.6–55.2% for beam search through KV-block sharing. A 2024 survey of LLM serving systems described PagedAttention as having become an industry norm in LLM serving frameworks, citing support in TGI, vLLM, and TensorRT-LLM. == Limitations and alternatives == Subsequent work has described trade-offs in the approach. The 2025 vAttention paper argued that PagedAttention requires attention kernels to be rewritten to support paging and increases software complexity, portability issues, redundancy, and execution overhead, proposing instead a memory manager that keeps the cache contiguous in virtual memory while relying on demand paging for physical allocation. === vAttention === Unlike PagedAttention, vAttention does not introduce a different attention rule; it retains the standard attention computation Attention ⁡ ( q i , K , V ) = softmax ⁡ ( q i K ⊤ s c a l e ) V . {\displaystyle \operatorname {Attention} (q_{i},K,V)=\operatorname {softmax} \left({\frac {q_{i}K^{\top }}{\mathrm {scale} }}\right)V.} In the notation of Prabhu et al., the key and value tensors for a request seen so far are K , V ∈ R L ′ × ( H × D ) {\displaystyle K,V\in \mathbb {R} ^{L'\times (H\times D)}} , where L ′ {\displaystyle L'} is the context length seen so far, H {\displaystyle H} is the number of KV heads on a worker, and D {\displaystyle D} is the dimension of each KV head. In systems prior to PagedAttention, the K cache (or V cache) at each layer of a worker is typically allocated as a 4D tensor of shape [ B , L , H , D ] , {\displaystyle [B,L,H,D],} where B {\displaystyle B} is batch size and L {\displaystyle L} is the maximum context length supported by the model. vAttention preserves this contiguous virtual-memory view while deferring physical-memory allocation to runtime. A serving framework maintains separate K and V tensors for each layer, so vAttention reserves 2 N {\displaystyle 2N} virtual-memory buffers on a worker, where N {\displaystyle N} is the number of layers managed by that worker. The maximum size of one virtual-memory buffer is B S = B × S , {\displaystyle BS=B\times S,} where S {\displaystyle S} is the maximum size of a single request's per-layer K cache (or V cache) on a worker. The paper defines S = L × H × D × P , {\displaystyle S=L\times H\times D\times P,} where P {\displaystyle P} is the number of bytes needed to store one element. In this formulation, vAttention keeps the KV cache contiguous in virtual memory and relies on demand paging for physical allocation, rather than modifying the attention kernel to operate over non-contiguous KV-cache blocks.

Inductive programming

Inductive programming (IP) is a special area of automatic programming, covering research from artificial intelligence and programming, which addresses learning of typically declarative (logic or functional) and often recursive programs from incomplete specifications, such as input/output examples or constraints. Depending on the programming language used, there are several kinds of inductive programming. Inductive functional programming, which uses functional programming languages such as Lisp or Haskell, and most especially inductive logic programming, which uses logic programming languages such as Prolog and other logical representations such as description logics, have been more prominent, but other (programming) language paradigms have also been used, such as constraint programming or probabilistic programming. == Definition == Inductive programming incorporates all approaches which are concerned with learning programs or algorithms from incomplete (formal) specifications. Possible inputs in an IP system are a set of training inputs and corresponding outputs or an output evaluation function, describing the desired behavior of the intended program, traces or action sequences which describe the process of calculating specific outputs, constraints for the program to be induced concerning its time efficiency or its complexity, various kinds of background knowledge such as standard data types, predefined functions to be used, program schemes or templates describing the data flow of the intended program, heuristics for guiding the search for a solution or other biases. Output of an IP system is a program in some arbitrary programming language containing conditionals and loop or recursive control structures, or any other kind of Turing-complete representation language. In many applications the output program must be correct with respect to the examples and partial specification, and this leads to the consideration of inductive programming as a special area inside automatic programming or program synthesis, usually opposed to 'deductive' program synthesis, where the specification is usually complete. In other cases, inductive programming is seen as a more general area where any declarative programming or representation language can be used and we may even have some degree of error in the examples, as in general machine learning, the more specific area of structure mining or the area of symbolic artificial intelligence. A distinctive feature is the number of examples or partial specification needed. Typically, inductive programming techniques can learn from just a few examples. The diversity of inductive programming usually comes from the applications and the languages that are used: apart from logic programming and functional programming, other programming paradigms and representation languages have been used or suggested in inductive programming, such as functional logic programming, constraint programming, probabilistic programming, abductive logic programming, modal logic, action languages, agent languages and many types of imperative languages. == History == The early works of Plotkin, and his "relative least general generalization (rlgg)", had an enormous impact in inductive logic programming. There were some encouraging results on learning recursive Prolog programs such as quicksort from examples together with suitable background knowledge, for example with GOLEM. However, after initial success, the community got disappointed by limited progress about the induction of recursive programs with ILP less and less focusing on recursive programs and leaning more and more towards a machine learning setting with applications in relational data mining and knowledge discovery. In parallel to work in ILP, Koza proposed genetic programming in the early 1990s as a generate-and-test based approach to learning programs. The idea of genetic programming was further developed into the inductive programming system ADATE and the systematic-search-based system MagicHaskeller. Here again, functional programs are learned from sets of positive examples together with an output evaluation (fitness) function which specifies the desired input/output behavior of the program to be learned. The early work in grammar induction (also known as grammatical inference) is related to inductive programming, as rewriting systems or logic programs can be used to represent production rules. In fact, early works in inductive inference considered grammar induction and Lisp program inference as basically the same problem. The results in terms of learnability were related to classical concepts, such as identification-in-the-limit, as introduced in the seminal work of Gold. More recently, the language learning problem was addressed by the inductive programming community. In the recent years, the classical approaches have been resumed and advanced with great success. Therefore, the synthesis problem has been reformulated on the background of constructor-based term rewriting systems taking into account modern techniques of functional programming, as well as moderate use of search-based strategies and usage of background knowledge as well as automatic invention of subprograms. Many new and successful applications have recently appeared beyond program synthesis, most especially in the area of data manipulation, programming by example and cognitive modelling (see below). Other ideas have also been explored with the common characteristic of using declarative languages for the representation of hypotheses. For instance, the use of higher-order features, schemes or structured distances have been advocated for a better handling of recursive data types and structures; abstraction has also been explored as a more powerful approach to cumulative learning and function invention. One powerful paradigm that has been recently used for the representation of hypotheses in inductive programming (generally in the form of generative models) is probabilistic programming (and related paradigms, such as stochastic logic programs and Bayesian logic programming). == Application areas == The first workshop on Approaches and Applications of Inductive Programming (AAIP) Archived 2016-03-03 at the Wayback Machine held in conjunction with ICML 2005 identified all applications where "learning of programs or recursive rules are called for, [...] first in the domain of software engineering where structural learning, software assistants and software agents can help to relieve programmers from routine tasks, give programming support for end users, or support of novice programmers and programming tutor systems. Further areas of application are language learning, learning recursive control rules for AI-planning, learning recursive concepts in web-mining or for data-format transformations". Since then, these and many other areas have shown to be successful application niches for inductive programming, such as end-user programming, the related areas of programming by example and programming by demonstration, and intelligent tutoring systems. Other areas where inductive inference has been recently applied are knowledge acquisition, artificial general intelligence, reinforcement learning and theory evaluation, and cognitive science in general. There may also be prospective applications in intelligent agents, games, robotics, personalisation, ambient intelligence and human interfaces.

Easyrec

easyrec is an open-source program that provides personalized recommendations using RESTful Web services to be integrated into Web enabled applications. It is distributed under the GNU General Public License by the Studio Smart Agent Technologies and hosted at SourceForge. It is written in Java, uses a MySQL database and comes with an administration tool. == History == The development of easyrec, an implementation of the Adaptive Personalization approach, started in the course of several research and development projects conducted by the Studio Smart Agent Technologies in close cooperation with international companies. During the year of 2008 the core functionality of easyrec was developed forming the basis of research prototypes focusing on the music domain (e.g. MusicExplorer). In June 2009 a beta version of easyrec, containing basic administration features, was integrated into a movie streaming portal for evaluation purposes. Furthermore, in September 2009 easyrec was awarded a special recognition in the category “Award for Innovations – IT Innovations for an economic upswing” by the jury of the Austrian state prize for multimedia and e-business. After a comprehensive refactoring phase and the integration of the evaluation results easyrec was published on SourceForge on 18 February 2010. In course of the CeBIT tradeshow 2011 in Hanover easyrec has been awarded the German “INNOVATIONSPREIS-IT 2011”. == Principles == The following five primary goals guided the development of easyrec. It should be a ready-to-use application, not another algorithmic framework It should be easy to use, concerning installation, integration and administration It should be robust and scalable for serving real world applications It should be free of charge, so that anyone can profit from personalization features It should rely on a community-driven development == Uses == Although easyrec is a domain-agnostic, general purpose personalization system, the current Web service API is customized for providing online shops with item recommendations. Especially for small and medium enterprises, easyrec provides a low barrier entrance to personalization. == Features == A major feature of easyrec is a set of usage statistics and other business relevant information presented via an administration and management interface. Furthermore, the easyrec administrator is supported by a variety of administration and configuration functions including the manual import or adaptation of business rules. Integrators or developers benefit from the lightweight Web service APIs (REST and SOAP) as well as from the guided installation wizard. Concerning personalization functionality easyrec is providing the following services unpersonalized recommendations of the form "other users also bought/viewed/...", etc. personalized recommendation depending on individual preferences rankings such as "most bought items", "most viewed...", etc. Additionally, as an integration showcase, a MediaWiki extension was developed and is bundled with the application. Currently additional features like further recommender algorithms and a plugin-system are evaluated and prepared for integration into the easyrec system. == Architecture == The underlying architecture of easyrec is designed to be robust and scalable—separating time-consuming computations from the task of online assembling of recommendations. easyrec is designed as a multi-layer system consisting of a database layer as storage of user actions and pre-calculated business rules an application layer for hosting online and offline recommendation services and an API layer for various Web service interfaces. Moreover, the generator server contains different item association generators which create business rules that define a relation between two items.

Dataset shift

Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e., P t r a i n ( X , Y ) ≠ P t e s t ( X , Y ) {\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)} ). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.