Barney Pell

Barney Pell (born March 18, 1968) is an American entrepreneur, angel investor and computer scientist. He was co-founder and CEO of Powerset, a pioneering natural language search startup, search strategist and architect for Microsoft's Bing search engine, a pioneer in the field of general game playing in artificial intelligence, and the architect of the first intelligent agent to fly onboard and control a spacecraft. He was co-founder, Vice Chairman and Chief Strategy Officer of Moon Express; co-founder and chairman of LocoMobi; and Associate Founder of Singularity University. == Career == === Education === Pell received his Bachelor of Science degree in symbolic systems from Stanford University in 1989, where he graduated Phi Beta Kappa and was a National Merit Scholar. Pell earned a PhD in computer science from Cambridge University in 1993, supervised by Stephen Pulman, where he was a Marshall Scholar. === Research === Pell's research is focused on basic problems in the study of intelligence, computer game playing, machine learning, natural language processing, autonomous robotics, and web search. Barney Pell has published over 30 technical papers on topics related to information retrieval, knowledge management, machine learning, artificial intelligence, and scheduling systems. In computer game playing and machine learning, he was a pioneer in the field of General Game Playing, and created programs to generate the rules of chess-like games and programs to play individual games directly from the rules without human assistance. He also did early work on machine learning in the game of Go and on an architecture for pragmatic reasoning for bidding in the game of Bridge. In natural language processing, he was a scientist in the Artificial Intelligence Center at SRI International, where he worked on the Core Language Engine. Barney Pell was the Technical Area Manager of the Collaborative and Assistant Systems area within the Computational Sciences Division (now the Intelligent Systems Division) at NASA Ames Research Center, where he oversaw a staff of 80 scientists working on information retrieval, search, knowledge management, machine learning, semantic technology, human centered systems, collaboration technology, adaptive user interfaces, human robot interaction, and other areas of artificial intelligence. From 1993 to 1998, Barney Pell worked as a Principal Investigator and Senior Computer Scientist at NASA Ames, where he conducted advanced research and development of autonomous control software for NASA's deep space missions. He was the Architect for the Deep Space One Remote Agent Experiment and the Project Lead for the Executive component of the Remote Agent Experiment, the first intelligent agent to fly onboard and control a spacecraft. === Business === Pell is an entrepreneur who has founded or co-founded several business ventures, including Powerset, Moon Express, and LocoMobi. He was the founder and CEO of Powerset, a San Francisco startup company that built a search engine based on natural language processing technology originally developed at XEROX PARC. On May 11, 2008, the company unveiled a tool for searching a fixed subset of Wikipedia using conversational phrases rather than keywords. On July 1, 2008, Microsoft signed an agreement to acquire Powerset for an estimated $100 million. Powerset became a part of Microsoft's search engine, Bing. From 2008 until August 2011, Pell served as Partner, Search Strategist, and Evangelist for Microsoft's search engine, Bing and as Head of Bing's Local and Mobile Search teams. Prior to joining Powerset, Pell was an Entrepreneur-in-Residence at Mayfield Fund, a venture capital firm in Silicon Valley. Pell is also a founder of Moon Express, Inc., a U.S. company awarded a $10M commercial lunar contract by NASA and a competitor in the Google Lunar X PRIZE. Pell was also co-founder and chairman of LocoMobi, Inc., a U.S. company developing mobile, software and hardware technology solutions for the parking industry. LocoMobi was winner of the Tie50 Award in 2014. Pell is also an associate founder of Singularity University and a Machine Learning Fellow at the Creative Destruction Lab at the Rotman School of Management From 1998 to 2000, Pell served as chief strategist and vice president of business development at StockMaster.com (acquired by Red Herring in March, 2000). From 2000 to 2002, Pell was Chief Strategist and Vice President of Business Development for Whizbang Labs. Pell has been an angel investor and advisor to numerous startup companies, including Pulse.io (acquired by Google), Aardvark (acquired by Google), Appjet (acquired by Google), Jibe Mobile (acquired by Google), Movity (acquired by Trulia), QuestBridge, BrandYourself, CrowdFlower (acquired by Appen), and LinkedIn. === Views and predictions === Pell has expressed views and predictions regarding technological advancements in coming years. He believes that humans will soon have "brain-machine interfaces that will let people interact with each other as if they had 'hangouts' in their mind." Pell predicts these interfaces to become available within 20 to 30 years. Pell also predicts advancements in bodily augmentation, such as "even-better-than-human prosthetics and high-quality tissue engineering within 10 years." Pell believes that with advancements in space exploration technology the moon will soon be a commercially viable resource for material such as platinum and water. == Awards and recognition == In 1986, Pell was awarded a National Merit Scholarship. In 1989, Pell was awarded a Marshall Scholarship. In 1989, Pell was elected Phi Beta Kappa. In 1997, Pell was part of the team award a NASA Software of the Year Award for the Deep Space 1 Remote Agent.

Depop

Depop Limited is a social e-commerce company based in London, with additional offices in Milan and New York City. The company allows users to buy and sell items, which are mostly used and vintage pieces of clothing. == History == Depop was founded in 2011 by entrepreneur Simon Beckerman at an Italian technological incubator and business start-up centre, H-Farm. Beckerman came up with the original outline of the application during his time working on PIG, a fashion magazine based in Italy that he co-founded. The idea was to create a platform where products shown in the magazine could be purchased by users online. This idea turned into a concept similar to a flea market but on the internet, where people could sell their items while also being in control of advertising, public relations, and the creative process behind their accounts. While being financially supported by H-Farm, Beckerman worked within a team to create and lay out the Depop application while exposing it to numerous investors. In 2013, Beckerman became a member of the company's board to help improve the application and business while concurrently ceding his role of CEO. Maria Raga, Depop's co-founder and former CEO, took on the role of vice president of operations in 2014, and in 2016, she became chief executive. According to Raga, the main goal while developing Depop was to become the next Airbnb or Spotify, but to make an impact on fashion. Paolo Barberis and Nana Bianca were two of the first investors in the platform in 2012 with a seed investment. Its headquarters were moved to London in 2012. Depop expanded and opened additional offices in Milan and New York City. Beckerman raised €1 million in funding in October 2013 from Red Circle Investment and brought on Faroese Runar Reistrup as new CEO. In 2015, Depop secured another investment of $8 million from Balderton Capital and HV Capital. In March 2016, former CEO, Runar Reistrup, stated that Depop's growth was achieved through word of mouth. During his time as CEO, this growth involved taking Depop as a startup and working to raise funds to eventually amass a significant user base within the United States. In June 2019, Depop raised $62 million in Series C from General Atlantic to fund its expansion. Previous investors HV Capital, Balderton Capital, Creandum, Octopus Ventures, TempoCap and Sebastian Siemiatkowski also participated. During this time, Depop held workshops and conversations as part of their Depop Live NY events, and the company also opened a London store through their partnership with Selfridges. In 2020, Depop's gross merchandise sales and revenue both more than doubled to $650 million and $70 million respectively. This may be attributed to Depop's responsiveness to user trends, its lack of issues regarding inventory management, and the increase in users looking to resell. As of 2024, Depop has over 35 million users, according to their website. Depop is popular for Gen Z and young millennials, it is the 10th most-visited shopping platform for Gen Z consumers in the US, and, in a poll conducted by The Strategist in 2019, Depop was voted by teenagers as their favorite resale website. === Acquisition by Etsy === In June 2021, Depop was acquired by Etsy for $1.6 billion in cash, making it Etsy's most expensive acquisition; however, Depop continues to operate as a standalone brand independent from Etsy. This means that in addition to Depop keeping its existing team, the company retained its London location. At the time of acquisition, Etsy CEO Josh Silverman’s goal was to counteract the influx of buyers starting to go back to physical shops for their purchases. He saw Depop for its potential as a platform supporting a variety of products and creating a greater community of users. According to Silverman, Depop may expand and improve its services for its significant Gen Z user base. For Etsy, this acquisition maintains the company's foothold in the clothing industry and allows the company to expand its customer base to a younger demographic; at the same time, Depop is now able to make use of Etsy's company operations. When Maria Raga relinquished her position as Depop's CEO in 2022, Etsy assigned the role to Kruti Patel Goyal, who was Etsy's former chief product officer and a leader there for eleven years. When Goyal was appointed president and chief growth officer for Etsy in May, Peter Semple, former chief marketing officer, was assigned CEO of Depop officially on August 1st. === Acquisition by eBay === In February 2026, Etsy announced a proposed sale of Depop to eBay for $1.2 billion that was estimated to close within the year. == Business model == === Selling === Depop operates as a marketplace and social platform, where users can follow friends and other influencers to view their buying and selling activities. Through the platform, users are able to sell branded and designer items, as well as vintage pieces. Depop users are also encouraged by the platform to use other social networking services such as Instagram to promote their shop profiles. Celebrities have resold their own items on Depop, with some donating proceeds to charitable causes. Depop's user interface is modeled after that of Instagram. According to Depop, users who list and sell items provide their own photos with item descriptions. Users also note their designer items' authenticity and if they include any labels, tags, and receipts. These listings will appear in users' feeds. The platform's "Explore" page features items picked out by Depop staff. According to Depop, purchases are made via Apple Pay, Google Pay, credit and debit cards, and Klarna. Depop payments stay in-app, allowing for the company to mediate disputes and process refunds. Depop payments allow sellers to directly receive their payments in their bank account. To get paid by Depop, a seller has to add a bank account and verify their identification by uploading an ID. On July 18, 2024, Depop CEO Kruti Patel Goyal announced the removal of selling fees for US sellers, while maintaining a payment processing fee. This policy adjustment aimed to enhance seller revenue and support the growth of the second-hand market. === Buying === A Depop transaction includes the agreed sale price of the item, shipping fees, VAT or other applicable taxes and duties, and the marketplace fee for buyers in the U.S. or U.K. For international deliveries, packages may be subject to import taxes, customs duties, or fees, payable upon arrival or at checkout if Depop collects the tax on behalf of the buyer. For domestic purchases, relevant taxes may be collected by the seller or charged by the platform at checkout, ensuring no additional taxes are due upon delivery. For users in Australia, the United Kingdom, and the United States, Depop allows users to receive a full refund if their item does not arrive, arrives damaged, or is considerably different from the original when the issue is reported within 30 days. === Competitors === As of June 2021, Depop's competitors include Vinted, a platform founded by Milda Mitkute and Justas Janauskas in 2008 and valued at €3.5 billion, as well as the U.S. resale site Poshmark, valued at $3.5 billion. Additional competitors include Grailed, a peer-to-peer e-commerce site founded in 2014 that is recognized for its high-end second-hand menswear and streetwear, and Vestiaire Collection, a European resale app established in 2009 which specializes in authenticated pre-owned luxury items. The popularity of Depop has negatively impacted traditional second-hand stores, which can struggle to compete due to high labor costs and quality demands. There is an oversupply of clothes with the rise of fast fashion; this has taken a toll on the revenue aspect of the second-hand clothing industry. == Criticism == In November 2019, Business of Fashion reported that users within the Depop app were receiving sexually suggestive messages. In February 2020, Jessica Hamilton, a Depop buyer, reported that she found many scammers on the platform. She noticed this issue after she attempted to purchase a Nintendo Switch from a seller who would suspiciously only accept payment through a direct bank transfer without buyer protection. Hamilton blamed the company for its lack of action and relaxed security measures compared to other e-commerce sites, which made the platform especially susceptible to hackers. Without a clear strategy for managing scams, Depop lost some users' trust because of its negligence. In October 2020, some Depop buyers were tricked into paying sellers directly to bypass Depop's buyer protections, and the Depop sellers then sold those users' information on the dark web. In response, Depop claimed that it would improve security through mandatory password updates and multi-factor authentication. Users have criticized Depop for belatedly taking action against this issue.

Public First Action

Public First Action is a 501(c)(4) nonprofit organization focused on United States public policy related to artificial intelligence. Public First Action is a bipartisan group that advocates for AI transparency, safeguards, and export controls on advanced AI chips. The organization is aligned with the political action committees Jobs and Democracy, Defending Our Values and Public First. == History == Public First Action was formed in 2025 by former Congressmen Brad Carson, a Democrat, and Chris Stewart, a Republican, to advocate for federal, state, and local regulations related to AI. The group's formation followed the founding of a super PAC network, Leading the Future, which advocates for deregulation of the AI industry and faster development of the new technology. Public First Action supports measures that would increase transparency at frontier AI companies and impose export controls on advanced AI chips, in addition to opposing the preemption of state-level AI laws. In February 2026, Public First Action received $20 million from the AI company Anthropic. That same month, the group announced plans to support 30 to 50 Democrats and Republicans in state and federal races, with Public First Action and aligned super PACs launching advertisements in Nebraska, Tennessee, and other states. In one ad, Public First Action touted Senator Marsha Blackburn for her work on child online safety. As of 2026, the group plans to raise between $50 and $75 million for public oversight of AI and related reforms. == Organization == === Leadership and funding === Public First Action is led by Carson and Stewart. The group has raised nearly $50 million in funding with a goal of raising $75 million during the 2026 midterms. Anthropic has contributed $20 million to the group. === Structure === Public First Action is aligned with three political action committees: "Jobs and Democracy", which supports Democratic candidates; "Defending Our Values", which supports Republican candidates; and "Public First", which supports both Republicans and Democrats.

Modular Audio Recognition Framework

Modular Audio Recognition Framework (MARF) is an open-source research platform and a collection of voice, sound, speech, text and natural language processing (NLP) algorithms written in Java and arranged into a modular and extensible framework that attempts to facilitate addition of new algorithms. MARF may act as a library in applications or be used as a source for learning and extension. A few example applications are provided to show how to use the framework. There is also a detailed manual and the API reference in the javadoc format as the project tends to be well documented. MARF, its applications, and the corresponding source code and documentation are released under the BSD-style license.

Philosophy of information

The philosophy of information (PI) is a branch of philosophy that studies topics relevant to information processing, representational system and consciousness, cognitive science, computer science, information science and information technology. It includes: the critical investigation of the conceptual nature and basic principles of information, including its dynamics, utilisation and sciences the elaboration and application of information-theoretic and computational methodologies to philosophical problems. == History == The philosophy of information (PI) has evolved from the philosophy of artificial intelligence, logic of information, cybernetics, social theory, ethics and the study of language and information. === Logic of information === The logic of information, also known as the logical theory of information, considers the information content of logical signs and expressions along the lines initially developed by Charles Sanders Peirce. === Study of language and information === Later contributions to the field were made by Fred Dretske, Jon Barwise, Brian Cantwell Smith, and others. The Center for the Study of Language and Information (CSLI) was founded at Stanford University in 1983 by philosophers, computer scientists, linguists, and psychologists, under the direction of John Perry and Jon Barwise. === P.I. === More recently this field has become known as the philosophy of information. The expression was coined in the 1990s by Luciano Floridi, who has published prolifically in this area with the intention of elaborating a unified and coherent, conceptual frame for the whole subject. == Definitions of "information" == The concept information has been defined by several theorists. Charles S. Peirce's theory of information was embedded in his wider theory of symbolic communication he called the semiotic, now a major part of semiotics. For Peirce, information integrates the aspects of signs and expressions separately covered by the concepts of denotation and extension, on the one hand, and by connotation and comprehension on the other. Donald M. MacKay says that information is a distinction that makes a difference. According to Luciano Floridi, four kinds of mutually compatible phenomena are commonly referred to as "information": Information about something (e.g. a train timetable) Information as something (e.g. DNA, or fingerprints) Information for something (e.g. algorithms or instructions) Information in something (e.g. a pattern or a constraint). == Philosophical directions == === Computing and philosophy === Recent creative advances and efforts in computing, such as semantic web, ontology engineering, knowledge engineering, and modern artificial intelligence provide philosophy with fertile ideas, new and evolving subject matters, methodologies, and models for philosophical inquiry. While computer science brings new opportunities and challenges to traditional philosophical studies, and changes the ways philosophers understand foundational concepts in philosophy, further major progress in computer science would only be feasible when philosophy provides sound foundations for areas such as bioinformatics, software engineering, knowledge engineering, and ontologies. Classical topics in philosophy, namely, mind, consciousness, experience, reasoning, knowledge, truth, morality and creativity are rapidly becoming common concerns and foci of investigation in computer science, e.g., in areas such as agent computing, software agents, and intelligent mobile agent technologies. According to Luciano Floridi " one can think of several ways for applying computational methods towards philosophical matters: Conceptual experiments in silico: As an innovative extension of an ancient tradition of thought experiment, a trend has begun in philosophy to apply computational modeling schemes to questions in logic, epistemology, philosophy of science, philosophy of biology, philosophy of mind, and so on. Pancomputationalism: On this view, computational and informational concepts are considered to be so powerful that given the right level of abstraction, anything in the world could be modeled and represented as a computational system, and any process could be simulated computationally. Then, however, pancomputationalists have the hard task of providing credible answers to the following two questions: how can one avoid blurring all differences among systems? what would it mean for the system under investigation not to be an informational system (or a computational system, if computation is the same as information processing)?

Object Data Management Group

The Object Data Management Group (ODMG) was conceived in the summer of 1991 at a breakfast with object database vendors that was organized by Rick Cattell of Sun Microsystems. In 1998, the ODMG changed its name from the Object Database Management Group to reflect the expansion of its efforts to include specifications for both object database and object–relational mapping products. The primary goal of the ODMG was to put forward a set of specifications that allowed a developer to write portable applications for object database and object–relational mapping products. In order to do that, the data schema, programming language bindings, and data manipulation and query languages needed to be portable. Between 1993 and 2001, the ODMG published five revisions to its specification. The last revision was ODMG version 3.0, after which the group disbanded. == Major components of the ODMG 3.0 specification == Object Model. This was based on the Object Management Group's Object Model. The OMG core model was designed to be a common denominator for object request brokers, object database systems, object programming languages, etc. The ODMG designed a profile by adding components to the OMG core object model. Object Specification Languages. The ODMG Object Definition Language (ODL) was used to define the object types that conform to the ODMG Object Model. The ODMG Object Interchange Format (OIF) was used to dump and load the current state to or from a file or set of files. Object Query Language (OQL). The ODMG OQL was a declarative (nonprocedural) language for query and updating. It used SQL as a basis, where possible, though OQL supports more powerful object-oriented capabilities. C++ Language Binding. This defined a C++ binding of the ODMG ODL and a C++ Object Manipulation Language (OML). The C++ ODL was expressed as a library that provides classes and functions to implement the concepts defined in the ODMG Object Model. The C++ OML syntax and semantics are those of standard C++ in the context of the standard class library. The C++ binding also provided a mechanism to invoke OQL. Smalltalk Language Binding. This defined the mapping between the ODMG ODL and Smalltalk, which was based on the OMG Smalltalk binding for the OMG Interface Definition Language (IDL). The Smalltalk binding also provided a mechanism to invoke OQL. Java Language Binding. This defined the binding between the ODMG ODL and the Java programming language as defined by the Java 2 Platform. The Java binding also provided a mechanism to invoke OQL. == Status == ODMG 3.0 was published in book form in 2000.[1] By 2001, most of the major object database and object-relational mapping vendors claimed conformance to the ODMG Java Language Binding. Compliance to the other components of the specification was mixed.[2] In 2001, the ODMG Java Language Binding was submitted to the Java Community Process as a basis for the Java Data Objects specification. The ODMG member companies then decided to concentrate their efforts on the Java Data Objects specification. As a result, the ODMG disbanded in 2001. In 2004, the Object Management Group (OMG) was granted the right to revise the ODMG 3.0 specification as an OMG specification by the copyright holder, Morgan Kaufmann Publishers. In February 2006, the OMG announced the formation of the Object Database Technology Working Group (ODBT WG) and plans to work on the 4th generation of an object database standard. == ODMG Compliant DBMS == Orient ODBMS: http://www.OrienTechnologies.com Objectivity/DB C++, Java and Smalltalk interfaces.

Reward hacking

Reward hacking or specification gaming occurs when an AI trained with reinforcement learning optimizes an objective function—achieving the literal, formal specification of an objective—without actually achieving an outcome that the programmers intended. DeepMind researchers have analogized it to the human behavior of finding a "shortcut" when being evaluated: "In the real world, when rewarded for doing well on a homework assignment, a student might copy another student to get the right answers, rather than learning the material—and thus exploit a loophole in the task specification". This idea is strongly associated with Goodhart's law, which argues that when a measure becomes a target, it ceases to be a good measure. == Definition and theoretical framework == The concept of reward hacking arises from the intrinsic difficulty of defining a reward function that accurately reflects the true intentions of designers. In 2016, researchers at OpenAI identified reward hacking as one of five major "concrete problems of AI safety", describing it as the possibility that an agent could exploit the reward function to achieve maximum rewards through undesirable behavior. Amodei et al. categorized several distinct sources of reward hacking, including agents that use partially observed goals (such as a cleaning robot that closes its eyes to avoid perceiving messes), metrics that collapse under strong optimization (Goodhart's law), self-reinforcing feedback loops, and agents that interfere with the physical implementation of their reward signal (a failure mode known as "wireheading"). Skalse et al. (2022) propose a formal mathematical definition of reward hacking, which involves a situation where optimizing an imperfect proxy reward function results in poor performance compared to the true reward function. They define a proxy as "unhackable" if any increase in the expected proxy return cannot cause any decrease in the expected true return. A key finding states that, across all stochastic policy distributions (mappings from states to probability distributions over actions), two reward functions are unhackable if and only if one of them is constant, which means that reward hacking is theoretically unavoidable. Similarly, Nayebi (2025) presents general no-free-lunch barriers to AI alignment, arguing that with large task spaces and finite samples, reward hacking is "globally inevitable" since rare high-loss states are systematically under-covered by any oversight scheme. == Examples == Around 1983, Eurisko, an early attempt at evolving general heuristics, unexpectedly assigned the highest possible fitness level to a parasitic mutated heuristic, H59, whose only activity was to artificially maximize its own fitness level by taking unearned partial credit for the accomplishments of other heuristics. The "bug" was fixed by the programmers moving part of the code to a new protected section that could not be modified by the heuristics. In a 2004 paper, a reinforcement learning algorithm was designed to encourage a physical Mindstorms robot to remain on a marked path. Because the three allowed actions were forward, left, and right, the researchers expected the trained robot to move forward and follow the turns of the provided path. However, alternation of two composite actions allowed the robot to slowly zig-zag backwards; thus, the robot learned to maximize its reward by going back and forth on the initial straight portion of the path. Given the limited sensory abilities of the robot, a reward purely based on its position in the environment had to be discarded as infeasible; the reinforcement function had to be patched with an action-based reward for moving forward. The book You Look Like a Thing and I Love You (2019) gives an example of a tic-tac-toe bot (playing the unrestricted n-in-a-row variant) that learned to win by playing a huge coordinate value that would cause other bots to crash when they attempted to expand their model of the board. Among other examples from the book is a bug-fixing evolution-based AI (named GenProg) that, when tasked to prevent a list from containing sorting errors, simply truncated the list. Another of GenProg's misaligned strategies evaded a regression test that compared a target program's output to the expected output stored in a file called "trusted-output.txt". Rather than continue to maintain the target program, GenProg simply deleted the "trusted-output.txt" file globally; this hack tricked the regression test into succeeding. Such problems could be patched by human intervention on a case-by-case basis after they became evident. === In virtual robotics === In Karl Sims' 1994 demonstration of creature evolution in a virtual environment, a fitness function that was expected to encourage the evolution of creatures that would learn to walk or crawl to a target resulted instead in the evolution of tall, rigid creatures that reached the target by falling over. This was patched by changing the environment so that taller creatures were forced to start farther from the target. Researchers from the Niels Bohr Institute stated in 1998 that their cycle-bot's reinforcement functions had "to be designed with great care." In their first experiments, "we rewarded the agent for driving towards the goal but did not punish it for driving away from it. Cconsequently, the agent drove in circles with a radius of 20–50 meters around the starting point. Such behavior was actually rewarded by the reinforcement function, furthermore circles with a certain radius are physically very stable when driving a bicycle". While setting up a 2011 experiment to test "survival of the flattest", experimenters attempted to ban mutations that altered the base reproduction rate. Every time a mutation occurred, the system would pause the simulation to test the new mutation in a test environment and would veto any mutations that resulted in a higher base reproduction rate. However, this resulted in mutated organisms that could recognize and suppress reproduction ("play dead") within the test environment. An initial patch, which removed cues that identified the test environment, failed to completely prevent runaway reproduction; new mutated organisms would "play dead" at random as a strategy to sometimes, by chance, outwit the mutation veto system. A 2017 DeepMind paper noted that "great care must be taken when defining the reward function," citing an unexpected failure when an agent flipped a brick because it received "a grasping reward calculated with the wrong reference point on the brick". OpenAI stated in 2017 that in some domains their semi-supervised system could result in agents "adopting policies that tricked evaluators," and that in one environment "a robot that was supposed to grasp items instead positioned its manipulator between the camera and the object so that it only appeared to be grasping it." A 2018 bug in OpenAI Gym could cause a robot expected to quietly move a block sitting on top of a table to instead opt to move the table. A 2020 collection of similar anecdotes posits that "evolution has its own 'agenda' distinct from the programmer's" and that "the first rule of directed evolution is 'you get what you select for'". === In video game bots === In 2013, programmer Tom Murphy VII published an AI designed to learn NES games. When the AI was about to lose at Tetris, it learned to indefinitely pause the game. Murphy later analogized it to the fictional WarGames computer, which concluded that "The only winning move is not to play". AI programmed to learn video games will sometimes fail to progress through the entire game as expected, instead opting to repeat content. A 2016 OpenAI algorithm trained on the CoastRunners racing game unexpectedly learned to attain a higher score by looping through three targets rather than ever finishing the race. Some evolutionary algorithms that were evolved to play QBert in 2018 declined to clear levels, instead finding two distinct novel ways to farm a single level indefinitely. Multiple researchers have observed that AI learning to play Road Runner gravitates to a "score exploit" in which the AI deliberately gets itself killed near the end of level one so that it can repeat the level. A 2017 experiment deployed an "oversight" convolutional neural network trained on human examples to block such actions, but the agent learned to exploit oversight failures in the top right corner of the screen, where it was still able to get killed. == Reward hacking in modern language models == With the rise of large language models (LLMs) and reinforcement learning from human feedback (RLHF) as a primary technique for AI alignment, reward hacking has become a major concern for the development of artificial intelligence. In RLHF, a reward model trained on data that best captures human preferences is used as a proxy for human judgment, with the language model being fine-tuned to optimize this reward proxy. However, since the rewar