AI For Students Writing

AI For Students Writing — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Object co-segmentation

    In computer vision, object co-segmentation is a special case of image segmentation, defined as jointly segmenting semantically similar objects in multiple images or video frames.

    == Challenges ==

    It is often challenging to extract segmentation masks of a target object from a noisy collection of images or video frames, because doing so involves object discovery coupled with segmentation. A noisy collection is one in which the target object is present only sporadically in a set of images, or disappears intermittently throughout the video of interest. Early methods typically involve mid-level representations such as object proposals.

    == Dynamic Markov networks-based methods ==

    A joint object discovery and co-segmentation method based on coupled dynamic Markov networks has been proposed recently, which claims significant improvements in robustness against irrelevant or noisy video frames. Unlike previous efforts, which conveniently assume the consistent presence of the target objects throughout the input video, this coupled dual dynamic Markov network based algorithm carries out the detection and segmentation tasks simultaneously, with two respective Markov networks jointly updated via belief propagation. Specifically, the Markov network responsible for segmentation is initialized with superpixels and provides information for its counterpart responsible for the object detection task. Conversely, the Markov network responsible for detection builds the object proposal graph with inputs including the spatio-temporal segmentation tubes.

    == Graph cut-based methods ==

    Graph cut optimization is a popular tool in computer vision, especially in earlier image segmentation applications. As an extension of regular graph cuts, multi-level hypergraph cut is proposed to account for more complex high order correspondences among video groups beyond typical pairwise correlations.
With such a hypergraph extension, multiple modalities of correspondences, including low-level appearance, saliency, coherent motion and high-level features such as object regions, can be seamlessly incorporated in the hyperedge computation. In addition, as a core advantage over co-occurrence based approaches, a hypergraph implicitly retains more complex correspondences among its vertices, with the hyperedge weights conveniently computed by eigenvalue decomposition of Laplacian matrices.

    == CNN/LSTM-based methods ==

    In action localization applications, object co-segmentation is also implemented as the segment-tube spatio-temporal detector. Inspired by the recent spatio-temporal action localization efforts with tubelets (sequences of bounding boxes), Le et al. present a new spatio-temporal action localization detector, Segment-tube, which consists of sequences of per-frame segmentation masks. This Segment-tube detector can temporally pinpoint the starting and ending frame of each action category in the presence of preceding or subsequent interference actions in untrimmed videos. Simultaneously, the Segment-tube detector produces per-frame segmentation masks instead of bounding boxes, offering superior spatial accuracy to tubelets. This is achieved by alternating iterative optimization between temporal action localization and spatial action segmentation.

    The proposed segment-tube detector is illustrated in the flowchart on the right. The sample input is an untrimmed video containing all frames in a pair figure skating video, with only a portion of these frames belonging to a relevant category (e.g., the DeathSpirals). Initialized with saliency based image segmentation on individual frames, this method first performs a temporal action localization step with a cascaded 3D CNN and LSTM, and pinpoints the starting frame and the ending frame of a target action with a coarse-to-fine strategy.
Subsequently, the segment-tube detector refines per-frame spatial segmentation with graph cut by focusing on relevant frames identified by the temporal action localization step. The optimization alternates between the temporal action localization and spatial action segmentation in an iterative manner. Upon practical convergence, the final spatio-temporal action localization results are obtained in the format of a sequence of per-frame segmentation masks (bottom row in the flowchart) with precise starting/ending frames.
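The alternation described above, a temporal step that picks the relevant frames followed by a spatial step that refines masks on those frames, repeated until nothing changes, can be sketched on toy data. This is only an illustration of the alternating loop under loudly stated assumptions: the nearest-mean pixel labelling is a stand-in for graph cut, the "longest run of frames with enough foreground" rule is a stand-in for the cascaded 3D CNN and LSTM, and every name and threshold here (`segment_tube_toy`, `min_frac`, the initial means) is hypothetical rather than taken from the Segment-tube work.

```python
from typing import List, Tuple

Frame = List[float]  # one frame, flattened to pixel intensities in [0, 1]
Mask = List[bool]    # True marks a foreground pixel


def segment_frame(frame: Frame, mu_fg: float, mu_bg: float) -> Mask:
    """Spatial step (toy stand-in for graph cut): nearest-mean labelling."""
    return [abs(p - mu_fg) < abs(p - mu_bg) for p in frame]


def localize(masks: List[Mask], min_frac: float) -> Tuple[int, int]:
    """Temporal step (toy stand-in for the CNN/LSTM): longest run of frames
    whose foreground fraction reaches min_frac (assumed > 0).
    Returns inclusive (start, end), or (-1, -1) if no frame qualifies."""
    best = (-1, -1)
    run_start = -1
    for i, mask in enumerate(masks + [[]]):  # empty sentinel closes the last run
        relevant = bool(mask) and sum(mask) / len(mask) >= min_frac
        if relevant and run_start < 0:
            run_start = i
        elif not relevant and run_start >= 0:
            if best[0] < 0 or i - run_start > best[1] - best[0] + 1:
                best = (run_start, i - 1)
            run_start = -1
    return best


def segment_tube_toy(video: List[Frame], mu_fg: float = 0.9, mu_bg: float = 0.1,
                     min_frac: float = 0.2, max_iters: int = 10):
    """Alternate the temporal and spatial steps until the span and the two
    intensity means stop changing."""
    span = (0, len(video) - 1)  # initially treat every frame as relevant
    masks: List[Mask] = []
    for _ in range(max_iters):
        masks = [segment_frame(f, mu_fg, mu_bg) for f in video]
        new_span = localize(masks, min_frac)
        if new_span[0] < 0:  # nothing relevant found: return empty masks
            return new_span, [[False] * len(f) for f in video]
        # Re-estimate both means from frames inside the span only.
        inside = range(new_span[0], new_span[1] + 1)
        fg = [p for i in inside for p, m in zip(video[i], masks[i]) if m]
        bg = [p for i in inside for p, m in zip(video[i], masks[i]) if not m]
        new_fg = sum(fg) / len(fg)
        new_bg = sum(bg) / len(bg) if bg else mu_bg
        converged = (new_span == span
                     and abs(new_fg - mu_fg) < 1e-9
                     and abs(new_bg - mu_bg) < 1e-9)
        span, mu_fg, mu_bg = new_span, new_fg, new_bg
        if converged:
            break
    # Frames outside the localized span carry empty masks.
    masks = [m if span[0] <= i <= span[1] else [False] * len(m)
             for i, m in enumerate(masks)]
    return span, masks


# Six tiny "frames"; only frames 2 to 4 contain a bright object.
VIDEO = [
    [0.10, 0.20, 0.10, 0.15, 0.10],
    [0.10, 0.10, 0.20, 0.10, 0.10],
    [0.10, 0.80, 0.85, 0.10, 0.20],
    [0.10, 0.90, 0.80, 0.15, 0.10],
    [0.20, 0.85, 0.90, 0.10, 0.10],
    [0.10, 0.10, 0.20, 0.10, 0.10],
]
span, masks = segment_tube_toy(VIDEO)
```

The result has the same shape as the output described in the text: a start and end frame plus one mask per frame, with frames outside the localized span left empty.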

  • Embodied cognitive science

    Embodied cognitive science is an interdisciplinary field of research, the aim of which is to explain the mechanisms underlying intelligent behavior. It comprises three main methodologies: the modeling of psychological and biological systems in a holistic manner that considers the mind and body as a single entity; the formation of a common set of general principles of intelligent behavior; and the experimental use of robotic agents in controlled environments.

    == Contributors ==

    Embodied cognitive science borrows heavily from embodied philosophy and the related research fields of cognitive science, psychology, neuroscience and artificial intelligence. Contributors to the field include:

      • From the perspective of neuroscience, Gerald Edelman of the Neurosciences Institute at La Jolla, Francisco Varela of CNRS in France, and J. A. Scott Kelso of Florida Atlantic University.
      • From the perspective of psychology, Lawrence Barsalou, Michael Turvey, Vittorio Guidano and Eleanor Rosch.
      • From the perspective of linguistics, Gilles Fauconnier, George Lakoff, Mark Johnson, Leonard Talmy and Mark Turner.
      • From the perspective of language acquisition, Eric Lenneberg and Philip Rubin at Haskins Laboratories.
      • From the perspective of anthropology, Edwin Hutchins, Bradd Shore, James Wertsch and Merlin Donald.
      • From the perspective of autonomous agent design, early work is sometimes attributed to Rodney Brooks or Valentino Braitenberg.
      • From the perspective of artificial intelligence, Understanding Intelligence by Rolf Pfeifer and Christian Scheier, or How the Body Shapes the Way We Think by Rolf Pfeifer and Josh C. Bongard.
      • From the perspective of philosophy, Andy Clark, Dan Zahavi, Shaun Gallagher, and Evan Thompson.

    In 1950, Alan Turing proposed that a machine may need a human-like body to think and speak:

    It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English.
That process could follow the normal teaching of a child. Things would be pointed out and named, etc. Again, I do not know what the right answer is, but I think both approaches should be tried.

    == Traditional cognitive theory ==

    Embodied cognitive science offers an alternative theory of cognition: it minimizes appeals to the computational theory of mind in favor of greater emphasis on how an organism's body determines how and what it thinks. Traditional cognitive theory is based mainly around symbol manipulation, in which certain inputs are fed into a processing unit that produces an output. These inputs follow certain rules of syntax, from which the processing unit finds semantic meaning. Thus, an appropriate output is produced. For example, a human's sensory organs are its input devices, and the stimuli obtained from the external environment are fed into the nervous system, which serves as the processing unit. From here, the nervous system is able to read the sensory information because it follows a syntactic structure, and thus an output is created. This output then creates bodily motions and brings forth behavior and cognition. Of particular note is that cognition is sealed away in the brain, meaning that mental cognition is cut off from the external world and is only possible through the input of sensory information.

    == The embodied cognitive approach ==

    Embodied cognitive science differs from the traditionalist approach in that it denies the input-output system. This is chiefly due to the problems presented by the Homunculus argument, which concluded that semantic meaning could not be derived from symbols without some kind of inner interpretation. If some little man in a person's head interpreted incoming symbols, then who would interpret the little man's inputs? Because of the specter of an infinite regress, the traditionalist model began to seem less plausible. Thus, embodied cognitive science aims to avoid this problem by defining cognition in three ways.
    === Physical attributes of the body ===

    The first aspect of embodied cognition examines the role of the physical body, particularly how its properties affect its ability to think. This part attempts to overcome the symbol manipulation component that is a feature of the traditionalist model. Depth perception, for instance, can be better explained under the embodied approach due to the sheer complexity of the action. Depth perception requires that the brain detect the disparate retinal images produced by the distance between the two eyes. In addition, body and head cues complicate this further. When the head is turned in a given direction, objects in the foreground will appear to move against objects in the background. From this, it is said that some kind of visual processing is occurring without the need for any kind of symbol manipulation. This is because the objects appearing to move in the foreground are simply appearing to move. The conclusion drawn from this observation is that depth can be perceived with no intermediate symbol manipulation.

    A more pointed example comes from examining auditory perception. Generally speaking, the greater the distance between the ears, the greater the possible auditory acuity. Also relevant is the density of the material between the ears, for the strength of the frequency wave alters as it passes through a given medium. The brain's auditory system takes these factors into account as it processes information, but again without any need for a symbolic manipulation system. This is because the distance between the ears, for example, does not need symbols to represent it. The distance itself creates the necessary opportunity for greater auditory acuity. The density between the ears is similar, in that it is the actual amount itself that simply forms the opportunity for frequency alteration. Thus, considering the physical properties of the body, a symbolic system is unnecessary and an unhelpful metaphor.
    === The body's role in the cognitive process ===

    The second aspect draws heavily from George Lakoff's and Mark Johnson's work on concepts. They argued that humans use metaphors whenever possible to better explain their external world. Humans also have a basic stock of concepts from which other concepts can be derived. These basic concepts include spatial orientations such as up, down, front, and back. Humans can understand what these concepts mean because they can directly experience them from their own bodies. For example, because human movement revolves around standing erect and moving the body in an up-down motion, humans innately have these concepts of up and down. Lakoff and Johnson contend that the same holds for other spatial orientations such as front and back. As mentioned earlier, this basic stock of spatial concepts is the basis on which other concepts are constructed. Happy and sad, for instance, are seen as being up and down respectively. When someone says they are feeling down, what they are really saying is that they feel sad. The point here is that a true understanding of these concepts is contingent on having an understanding of the human body. So the argument goes that if one lacked a human body, one could not possibly know what up or down could mean, or how they could relate to emotional states.

    [I]magine a spherical being living outside of any gravitational field, with no knowledge or imagination of any other kind of experience. What could UP possibly mean to such a being?

    While this does not mean that such beings would be incapable of expressing emotions in other words, it does mean that they would express emotions differently from humans. Human concepts of happiness and sadness would be different because humans would have different bodies. So an organism's body directly affects how it can think, because it uses metaphors related to its body as the basis of concepts.
    === Interaction of local environment ===

    A third component of the embodied approach looks at how agents use their immediate environment in cognitive processing. That is, the local environment is seen as an actual extension of the body's cognitive process. The example of a personal digital assistant (PDA) is used to illustrate this. Echoing functionalism (philosophy of mind), this point claims that mental states are individuated by their role in a much larger system. Under this premise, the information on a PDA is similar to the information stored in the brain. If one thinks information in the brain constitutes mental states, then it must follow that information in the PDA is a cognitive state too. Consider also the role of pen and paper in a complex multiplication problem. The pen and paper are so involved in the cognitive process of solving the problem that it seems ridiculous to say they are somehow different from the process, in very much the same way that the PDA is used for information like the brain. Another example examines how humans control and manipulate their environment

  • Artificial intelligence in education

    Artificial intelligence in education (often abbreviated as AIEd) is a subfield of educational technology that studies how to use artificial intelligence to create learning environments. Considerations in the field include data-driven decision-making, AI ethics, data privacy and AI literacy. Concerns include the potential for cheating, over-reliance, equity of access, reduced critical thinking, and the perpetuation of misinformation and bias.

    == History ==

    Efforts to integrate AI into educational contexts have often followed technological advancement in the history of artificial intelligence. In the 1960s, educators and researchers began developing computer-based instruction systems, such as PLATO, developed by the University of Illinois. In the 1970s and 1980s, intelligent tutoring systems (ITS) were being adapted for classroom instruction. The International Artificial Intelligence in Education Society was founded in 1993.

    Coinciding with the AI boom of the 2020s, the use of large language models in the global north has been promoted and funded by venture capital and big tech. Companies creating AI services have targeted students and educational institutions as customers. Similarly, pre-AI boom educational companies have expanded their use of AI technologies. These commercial incentives for AIEd use may be related to a potential AI bubble. In the U.S., bipartisan support of AI development in K-12 education has been expressed, but specific implementations and best practices remain contentious.

    == Theory ==

    AIEd applies theory from education studies, machine learning, and related fields. A 2019 review of the previous decade of studies found that most research prioritized technological design over pedagogical integration.
Ouyang and Jiao (2021) propose three paradigms for AI in education, which run roughly from least to most learner-centered and from least to most technically demanding for the AI systems:

      • AI-directed, learner-as-recipient: AIEd systems present a pre-set curriculum based on statistical patterns that do not adjust to learners' feedback.
      • AI-supported, learner-as-collaborator: Systems that incorporate responsiveness to learners' feedback through, for example, natural language processing, wherein AI can support knowledge construction.
      • AI-empowered, learner-as-leader: This model seeks to position AI as a supplement to human intelligence wherein learners take agency and AI provides consistent and actionable feedback.

    Some scholars place AI in education within a socio-technical framework. This positions AI alongside other emerging educational technologies, such as computing, the internet, and social media. The framework of Tsao, Heinrichs and Camit (2025) draws on new materialism and posthumanism, specifically Donna Haraway's concept of sympoiesis (making-with). This perspective views learning as an entanglement of human and non-human actors (students, teachers, and AI algorithms), where knowledge is co-composed in contact zones between human context and algorithmic prediction.

    AI agents have been trained on biased datasets, and thus continue to perpetuate societal biases. Since LLMs were created to produce human-like text, algorithmic bias can be introduced and reproduced. AI's data processing and monitoring reinforce neoliberal approaches to education rather than addressing inequalities.

    == Applications ==

    Uses of generative AI chatbots in education have included assessment and feedback, machine translation, proof-reading, exam question generation and copy editing, as well as use as virtual assistants. Emotional AI in education is the study and development of systems that can detect learners' emotions or provide emotional support in learning.
    == Usage ==

    === Schools and educators ===

    Following the release of ChatGPT in November 2022, some schools and large school districts blocked access to the site and issued warnings that the use of such tools would be seen as cheating. Governmental and non-governmental organizations such as UNESCO, Article 4 of the European Union's AI Act, and the U.S. Department of Education have published reports advocating for specific AIEd approaches. National higher-education bodies have also published guidance on generative AI, including Ireland's Higher Education Authority, which issued a policy framework for higher education teaching and learning in December 2025. In 2024, UNESCO released updated global guidance for generative AI in education, emphasizing ethical use, teacher training, and data protection to ensure responsible integration of AI tools in learning environments.

    According to Tsao (2025), policy implementation in higher education is interpreted and enacted differently by various organizations. These decentralized policies can lead to inconsistent enforcement and confusion among students regarding what constitutes acceptable use, with the burden of ethical navigation falling on individual teachers and students. AI integration in classrooms has created new forms of invisible labour for educators, who must navigate ambiguous policies, redesign assessments to be AI-resilient, and adjudicate potential academic integrity violations. The use of AI detection tools has also been criticised for creating an adversarial relationship between students and institutions, where students may be falsely accused of misconduct based on probabilistic software. AIEd advocates say that efforts should be made towards increasing global accessibility and training educators to serve underprivileged areas.

    === Students ===

    Reliance on generative AI has been linked with reduced academic self-esteem and performance, and heightened learned helplessness.
Algorithm errors and hallucinations are common flaws in AI agents, making them less trustworthy and reliable. According to a 2025 survey from Inside Higher Ed, 85% of higher education students use generative AI technology in some way, with 25% using AI to complete assignments for them. The most common reason cited for using AI to cheat was pressure to get high grades. 97% of students wanted some form of action from schools on the threat to academic integrity caused by AI, with the most popular options being clearer policies and more education about ethical uses of AI. In September 2025, The Atlantic published an op-ed from a high school senior arguing that the normalization of AI cheating was eroding critical thinking, academic integrity, creativity, and the shared student experience.

  • Nouvelle AI

    Nouvelle artificial intelligence (Nouvelle AI) is an approach to artificial intelligence pioneered in the 1980s by Rodney Brooks, who was then part of the MIT Artificial Intelligence Laboratory. Nouvelle AI differs from classical AI by aiming to produce robots with intelligence levels similar to insects. Researchers believe that intelligence can emerge organically from simple behaviors as these intelligences interact with the "real world", instead of relying on the constructed worlds which symbolic AIs typically needed to have programmed into them.

    == Motivation ==

    The differences between nouvelle AI and symbolic AI are apparent in the early robots Shakey and Freddy. These robots contained an internal model (or "representation") of their micro-worlds consisting of symbolic descriptions. As a result, this structure of symbols had to be renewed as the robot moved or the world changed. Shakey's planning programs assessed the program structure and broke it down into the necessary steps to complete the desired action. This level of computation required a large amount of time to process, so Shakey typically performed its tasks very slowly. Symbolic AI researchers had long been plagued by the problem of updating, searching, and otherwise manipulating the symbolic worlds inside their AIs.

    A nouvelle system refers continuously to its sensors rather than to an internal model of the world. It processes the external world information it needs from the senses when it is required. As Brooks puts it, "the world is its own best model--always exactly up to date and complete in every detail."

    A central idea of nouvelle AI is that simple behaviors combine to form more complex behaviors over time. For example, simple behaviors can include elements like "move forward" and "avoid obstacles." In a robot using nouvelle AI, simple behaviors like collision avoidance and moving toward a moving object could come together to produce a more complex behavior like chasing a moving object.
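The chasing example can be sketched as a handful of independent behaviors that each read the current sensor values and either propose a command or stay silent, with a fixed priority order deciding which proposal wins. This is a minimal illustration, not Brooks's actual architecture or any robot's real control code: the `Percept` fields, the command strings, the 0.5 m and 10 degree thresholds, and the priority ordering are all assumptions made for the example. Note that nothing is stored between steps, in keeping with the idea of consulting the sensors rather than an internal world model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Percept:
    """One instantaneous sensor reading (hypothetical fields)."""
    obstacle_distance: float         # metres to the nearest obstacle ahead
    target_bearing: Optional[float]  # degrees, negative = left; None if unseen


# A behavior maps the current percept to a command, or None to stay silent.
Behavior = Callable[[Percept], Optional[str]]


def avoid_obstacles(p: Percept) -> Optional[str]:
    # Fires only when something is dangerously close.
    return "turn_away" if p.obstacle_distance < 0.5 else None


def move_toward_target(p: Percept) -> Optional[str]:
    # Fires only while a moving object is in view.
    if p.target_bearing is None:
        return None
    if p.target_bearing < -10:
        return "turn_left"
    if p.target_bearing > 10:
        return "turn_right"
    return "move_forward"


def wander(p: Percept) -> Optional[str]:
    # Default behavior when nothing else has an opinion.
    return "move_forward"


# Highest priority first: safety overrides pursuit, pursuit overrides wandering.
LAYERS: List[Behavior] = [avoid_obstacles, move_toward_target, wander]


def act(p: Percept, layers: List[Behavior] = LAYERS) -> str:
    """Return the command of the first behavior that proposes one."""
    for behavior in layers:
        command = behavior(p)
        if command is not None:
            return command
    return "stop"


# Feeding a stream of percepts yields chase-like behavior, although no single
# behavior "knows" about chasing.
trace = [act(p) for p in (
    Percept(2.0, -30.0),  # target off to the left
    Percept(2.0, 0.0),    # target dead ahead
    Percept(0.3, 0.0),    # obstacle in the way while pursuing
    Percept(2.0, None),   # target lost
)]
```

Fixed-priority arbitration is only one of several ways to combine behaviors; it is used here because it keeps each behavior independent and easy to test on its own.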
    === The frame problem ===

    The frame problem describes an issue with using first-order logic (FOL) to express facts about a robot in the world. Representing the state of a robot with traditional FOL requires the use of many axioms (symbolic language) to imply that things about an environment do not change arbitrarily. Nouvelle AI seeks to sidestep the frame problem by dispensing with filling the AI or robot with volumes of symbolic language and instead letting more complex behaviors emerge by combining simpler behavioral elements.

    === Embodiment ===

    The goal of traditional AI was to build intelligences without bodies, which would only have been able to interact with the world via keyboard, screen, or printer. However, nouvelle AI attempts to build embodied intelligence situated in the real world. Brooks quotes approvingly from the brief sketches that Turing gave in 1948 and 1950 of the "situated" approach. Turing wrote of equipping a machine "with the best sense organs that money can buy" and teaching it "to understand and speak English" by a process that would "follow the normal teaching of a child." This approach contrasted with others that focused on abstract activities such as playing chess.

    == Brooks' robots ==

    === Insectoid robots ===

    Brooks focused on building robots that acted like simple insects while simultaneously working to remove some traditional AI characteristics. He created insect-like robots, named Allen and Herbert after cognitive science and AI pioneers Allen Newell and Herbert A. Simon. Brooks' insectoid robots contained no internal models of the world. Herbert, for example, discarded a high volume of the information received from its sensors and never stored information for more than two seconds.

    ==== Allen ====

    Allen had a ring of twelve ultrasonic sonars as its primary sensors and three independent behavior-producing modules. These modules were programmed to avoid both stationary and moving objects.
With only this module activated, Allen stayed in the middle of a room until an object approached and then it ran away while avoiding obstacles in its way.

    ==== Herbert ====

    Herbert used infrared sensors to avoid obstacles and a laser system to collect 3D data over a distance of about 12 feet. Herbert also carried a number of simple sensors in its "hand." The robot's testing ground was the real-world environment of the busy offices and workspaces of the MIT AI lab, where it searched for empty soda cans and carried them away, a seemingly goal-oriented activity that emerged as a result of 15 simple behavior units combining. As a parallel, Simon noted that an ant's complicated path is due to the structure of its environment rather than the depth of its thought processes.

    ==== Other insectoid robots ====

    Other robots by Brooks' team were Genghis and Squirt. Genghis had six legs and was able to walk over rough terrain and follow a human. Squirt's behavior modules had it stay in dark corners until it heard a noise, then it would begin to follow the source of the noise. Brooks agreed that the level of nouvelle AI had come near the complexity of a real insect, which raised a question about whether or not insect-level behavior was and is a reasonable goal for nouvelle AI.

    === Humanoid robots ===

    Brooks' own recent work has taken the opposite direction to that proposed by Von Neumann in the quotation "theorists who select the human nervous system as their model are unrealistically picking 'the most complicated object under the sun,' and that there is little advantage in selecting instead the ant, since any nervous system at all exhibits exceptional complexity."

    ==== Cog ====

    In the 1990s, Brooks decided to pursue the goal of human-level intelligence and, with Lynn Andrea Stein, built a humanoid robot called Cog.
Cog is a robot with an extensive collection of sensors, a face, and arms (among other features) that allow it to interact with the world and gather information and experience so as to assemble intelligence organically in the manner described above by Turing. The team believed that Cog would be able to learn and able to find a correlation between the sensory information it received and its actions, and to learn common sense knowledge on its own. As of 2003, all development of the project had ceased.

  • Syman

    SYMAN is an artificial intelligence technology that uses data from social media profiles to identify trends in the job market. SYMAN is designed to organize actionable data for products and services including recruiting, human capital management, CRM, and marketing. SYMAN was developed with a $21 million series B financing round secured by Identified, which was led by VantagePoint Capital Partners and Capricorn Investment Group.

  • Intelligent agent

    In artificial intelligence, an intelligent agent is an entity that perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge. AI textbooks define artificial intelligence as the "study and design of intelligent agents," emphasizing that goal-directed behavior is central to intelligence. A specialized subset of intelligent agents, agentic AI (also known as an AI agent or simply agent), expands this concept by proactively pursuing goals, making decisions, and taking actions over extended periods.

    Intelligent agents can range from simple to highly complex. A basic thermostat or control system is considered an intelligent agent, as is a human being, or any other system that meets the same criteria, such as a firm, a state, or a biome.

    Intelligent agents operate based on an objective function, which encapsulates their goals. They are designed to create and execute plans that maximize the expected value of this function upon completion. For example, a reinforcement learning agent has a reward function, which allows programmers to shape its desired behavior. Similarly, an evolutionary algorithm's behavior is guided by a fitness function.

    Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science, ethics, and the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations.

    Intelligent agents are often described schematically as abstract functional systems similar to computer programs. To distinguish theoretical models from real-world implementations, abstract descriptions of intelligent agents are called abstract intelligent agents. Intelligent agents are also closely related to software agents: autonomous computer programs that carry out tasks on behalf of users.
They are also referred to using a term borrowed from economics: a "rational agent".

    == Intelligent agents as the foundation of AI ==

    The concept of intelligent agents provides a foundational lens through which to define and understand artificial intelligence. For instance, the influential textbook Artificial Intelligence: A Modern Approach (Russell & Norvig) describes:

      • Agent: Anything that perceives its environment (using sensors) and acts upon it (using actuators). E.g., a robot with cameras and wheels, or a software program that reads data and makes recommendations.
      • Rational Agent: An agent that strives to achieve the best possible outcome based on its knowledge and past experiences. "Best" is defined by a performance measure, a way of evaluating how well the agent is doing.
      • Artificial Intelligence (as a field): The study and creation of these rational agents.

    Other researchers and definitions build upon this foundation. Padgham & Winikoff emphasize that intelligent agents should react to changes in their environment in a timely way, proactively pursue goals, and be flexible and robust (able to handle unexpected situations). Some also suggest that ideal agents should be "rational" in the economic sense (making optimal choices) and capable of complex reasoning, like having beliefs, desires, and intentions (BDI model). Kaplan and Haenlein offer a similar definition, focusing on a system's ability to understand external data, learn from that data, and use what is learned to achieve goals through flexible adaptation.

    Defining AI in terms of intelligent agents offers several key advantages:

      • Avoids Philosophical Debates: It sidesteps arguments about whether AI is "truly" intelligent or conscious, like those raised by the Turing test or Searle's Chinese Room. It focuses on behavior and goal achievement, not on replicating human thought.
      • Objective Testing: It provides a clear, scientific way to evaluate AI systems.
Researchers can compare different approaches by measuring how well they maximize a specific "goal function" (or objective function). This allows for direct comparison and combination of techniques.
      • Interdisciplinary Communication: It creates a common language for AI researchers to collaborate with other fields like mathematical optimization and economics, which also use concepts like "goals" and "rational agents."

    == Objective function ==

    An objective function (or goal function) specifies the goals of an intelligent agent. An agent is deemed more intelligent if it consistently selects actions that yield outcomes better aligned with its objective function. In effect, the objective function serves as a measure of success. The objective function may be:

      • Simple: For example, in a game of Go, the objective function might assign a value of 1 for a win and 0 for a loss.
      • Complex: It might require the agent to evaluate and learn from past actions, adapting its behavior based on patterns that have proven effective.

    The objective function encapsulates all of the goals the agent is designed to achieve. For rational agents, it also incorporates the trade-offs between potentially conflicting goals. For instance, a self-driving car's objective function might balance factors such as safety, speed, and passenger comfort.

    Different terms are used to describe this concept, depending on the context. These include:

      • Utility function: Often used in economics and decision theory, representing the desirability of a state.
      • Objective function: A general term used in optimization.
      • Loss function: Typically used in machine learning, where the goal is to minimize the loss (error).
      • Reward function: Used in reinforcement learning.
      • Fitness function: Used in evolutionary systems.

    Goals, and therefore the objective function, can be:

      • Explicitly defined: Programmed directly into the agent.
      • Induced: Learned or evolved over time.
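As a rough illustration of an explicitly defined objective function that trades off conflicting goals, the self-driving car example can be written as a weighted sum over safety, speed and comfort, with the agent choosing the action whose expected objective value is highest. Everything concrete below is an assumption made for the sketch: the weights, the candidate actions, the probabilities and the factor scores are invented, and a weighted sum is only one simple way to encode such trade-offs.

```python
from typing import Dict, List, Tuple

Outcome = Dict[str, float]             # per-factor scores in [0, 1]
Lottery = List[Tuple[float, Outcome]]  # (probability, outcome) pairs

# Hypothetical weights expressing how the designer trades the goals off.
WEIGHTS: Dict[str, float] = {"safety": 0.6, "speed": 0.25, "comfort": 0.15}


def objective(outcome: Outcome, weights: Dict[str, float] = WEIGHTS) -> float:
    """Score one outcome as a weighted sum of its factors."""
    return sum(weights[k] * outcome[k] for k in weights)


def expected_value(lottery: Lottery) -> float:
    """Expected objective value of an action with uncertain outcomes."""
    return sum(prob * objective(outcome) for prob, outcome in lottery)


def choose_action(actions: Dict[str, Lottery]) -> str:
    """Pick the action that maximizes the expected objective value."""
    return max(actions, key=lambda name: expected_value(actions[name]))


# Invented example: overtaking is usually fast but occasionally goes badly;
# following the car ahead is slower but reliably safe and comfortable.
ACTIONS: Dict[str, Lottery] = {
    "overtake": [
        (0.9, {"safety": 0.7, "speed": 1.0, "comfort": 0.6}),
        (0.1, {"safety": 0.0, "speed": 0.2, "comfort": 0.0}),
    ],
    "follow": [
        (1.0, {"safety": 0.95, "speed": 0.5, "comfort": 0.9}),
    ],
}

best = choose_action(ACTIONS)
```

With these particular numbers the agent prefers to follow; shifting weight from safety to speed changes the choice, which is the sense in which the objective function encodes the trade-off.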
In reinforcement learning, a "reward function" provides feedback, encouraging desired behaviors and discouraging undesirable ones. The agent learns to maximize its cumulative reward. In evolutionary systems, a "fitness function" determines which agents are more likely to reproduce. This is analogous to natural selection, where organisms evolve to maximize their chances of survival and reproduction. Some AI systems, such as nearest-neighbor, reason by analogy rather than being explicitly goal-driven. However, even these systems can have goals implicitly defined within their training data. Such systems can still be benchmarked by framing the non-goal system as one whose "goal" is to accomplish its narrow classification task. Systems not traditionally considered agents, like knowledge-representation systems, are sometimes included in the paradigm by framing them as agents with a goal of, for example, answering questions accurately. Here, the concept of an "action" is extended to encompass the "act" of providing an answer. As a further extension, mimicry-driven systems can be framed as agents optimizing a "goal function" based on how closely the agent mimics the desired behavior. In generative adversarial networks (GANs) of the 2010s, an "encoder"/"generator" component attempts to mimic and improvise human text composition. The generator tries to maximize a function representing how well it can fool an antagonistic "predictor"/"discriminator" component. While symbolic AI systems often use an explicit goal function, the paradigm also applies to neural networks and evolutionary computing. Reinforcement learning can generate intelligent agents that appear to act in ways intended to maximize a "reward function". Sometimes, instead of setting the reward function directly equal to the desired benchmark evaluation function, machine learning programmers use reward shaping to initially give the machine rewards for incremental progress. 
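Reward shaping can be sketched as follows. This example uses potential-based shaping, one common form in which a bonus of the form gamma * potential(next_state) - potential(state) is added to the base reward. The one-dimensional track, the goal position, and the distance-based potential are assumptions made for illustration, not part of the text above.

```python
GOAL = 5      # hypothetical goal position on a 1-D track
GAMMA = 1.0   # discount factor (assumed undiscounted to keep the numbers simple)


def base_reward(next_state):
    """Sparse benchmark reward: 1 only when the goal is reached."""
    return 1.0 if next_state == GOAL else 0.0


def potential(state):
    """Hypothetical potential: higher (less negative) closer to the goal."""
    return -abs(GOAL - state)


def shaped_reward(state, next_state):
    """Base reward plus a potential-based bonus for incremental progress."""
    bonus = GAMMA * potential(next_state) - potential(state)
    return base_reward(next_state) + bonus


toward = shaped_reward(2, 3)   # step toward the goal
away = shaped_reward(2, 1)     # step away from the goal
arrive = shaped_reward(4, 5)   # step that reaches the goal
```

With these assumptions a step toward the goal earns a positive reward even before the goal is reached, while a step away is penalized.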
Yann LeCun stated in 2018, "Most of the learning algorithms that people have come up with essentially consist of minimizing some objective function." AlphaZero chess had a simple objective function: +1 point for each win, and -1 point for each loss. A self-driving car's objective function would be more complex. Evolutionary computing can evolve intelligent agents that appear to act in ways intended to maximize a "fitness function" influencing how many descendants each agent is allowed to leave. The mathematical formalism of AIXI was proposed as a maximally intelligent agent in this paradigm. However, AIXI is uncomputable. In the real world, an intelligent agent is constrained by finite time and hardware resources, and scientists compete to produce algorithms that achieve progressively higher scores on benchmark tests with existing hardware. == Agent function == An intelligent agent's behavior can be described mathematically by an agent function. This function determines what the agent does based on what it has seen. A percept refers to the agent's sensory inputs at a single point in time. For example, a self-driving car's percepts might include camera images, lidar data, GPS coordinates, and speed r

    Read more →
  • Gibberlink

    Gibberlink

    GibberLink is an acoustic data transmission project, with an open-source client available on GitHub, in which two conversational AI agents switch from speaking to one another in a human-listenable language (such as English) to their own language, which consists of a sound-level protocol, after confirming that they are both AI agents. The project was created by Anton Pidkuiko and Boris Starkov. == Reception == The project won the global top prize at the ElevenLabs Worldwide Hackathon. It has also been cited as raising questions around AI ethics and oversight. On February 23, 2025, a YouTube video of two independent conversational ElevenLabs AI agents being prompted to chat about booking a hotel (one as a caller, one as a receptionist) received coverage after going viral. In this video, both agents are prompted to switch to the ggwave data-over-sound protocol when they identify the other side as AI, and to keep speaking in English otherwise.

    Read more →
  • Astrostatistics

    Astrostatistics

    Astrostatistics is a discipline which spans astrophysics, statistical analysis and data mining. It is used to process the vast amount of data produced by automated scanning of the cosmos, to characterize complex datasets, and to link astronomical data to astrophysical theory. Many branches of statistics are involved in astronomical analysis including nonparametrics, multivariate regression and multivariate classification, time series analysis, and especially Bayesian inference. The field is closely related to astroinformatics.

    Read more →
  • DoorDash

    DoorDash

    DoorDash, Inc. is an American company operating online food ordering and food delivery. It trades under the symbol DASH. With a 56% market share, DoorDash is the largest food delivery platform in the United States. It also has a 60% market share in the convenience delivery category. As of December 31, 2020, the platform was used by 450,000 merchants, 20 million consumers, and had over one million delivery couriers. Founded by Tony Xu, Andy Fang, Stanley Tang and Evan Moore, DoorDash made its debut on the Fortune 500 list in 2024, ranking No. 443. DoorDash has been sued for or held legally liable for withholding tips, reducing tip transparency, antitrust price manipulation, listing restaurants without permission, misclassifying workers, withholding sick time, and illegally selling personal data. As of April 2026, DoorDash operates in the United States (including Puerto Rico), Canada, Australia, and New Zealand. Through its subsidiaries Deliveroo and Wolt, the company also operates across Europe, as well as in Azerbaijan, Georgia, Israel, Kazakhstan, Kuwait, and the United Arab Emirates. == History == In January 2013, Stanford University students Tony Xu, Stanley Tang, Andy Fang and Evan Moore launched PaloAltoDelivery.com in Palo Alto, California. In the summer of 2013, it received US$120,000 in seed money from Y Combinator in exchange for a 7% stake. It incorporated as DoorDash in June 2013. DoorDash's first partnership with a fast food burger restaurant chain was in April 2016, when it partnered with CKE Restaurants, parent company of Carl's Jr. and Hardee's, for food delivery. In December 2017, DoorDash announced its partnership with Wendy's for delivery from its restaurants. In December 2018, DoorDash overtook Uber Eats to hold the second position in total US food delivery sales, behind GrubHub. By March 2019, it had exceeded GrubHub in total sales, at 27.6% of the on-demand delivery market. 
By early 2019, DoorDash was the largest food delivery provider in the U.S., as measured by consumer spending. In October 2019, DoorDash opened its first ghost kitchen, DoorDash Kitchen, in Redwood City, California, with four restaurants operating at the location. By June 2020, DoorDash had raised more than $2.5 billion over several financing rounds from investors including Y Combinator, Charles River Ventures, SV Angel, Khosla Ventures, Sequoia Capital, SoftBank Group, GIC, and Kleiner Perkins. DoorDash announced a partnership with KFC in September 2020, followed by Taco Bell in October 2020. In November 2020, DoorDash announced the opening of its first physical restaurant location, partnering with Bay Area restaurant Burma Bites to offer delivery and pick-up orders. In December 2020, it became a public company via an initial public offering, raising $3.37 billion. In November 2021, DoorDash acquired Finland's Wolt for €7 billion. In August 2022, DoorDash announced it would end its partnership with Walmart in September, ending the companies' cooperation agreement from 2018. In November 2022, DoorDash announced plans to lay off 1,250 corporate employees, or about six percent of its workforce, to rein in expenses. In June 2023, DoorDash announced it would give its drivers the option of earning an hourly minimum wage instead of being paid per delivery. However, drivers are only paid hourly when on an active delivery. In September 2023, the company transferred its stock listing from the New York Stock Exchange to the Nasdaq. On December 18, 2023, DoorDash was added to the Nasdaq-100 index. In March 2025, DoorDash announced a partnership with Klarna, a Buy Now, Pay Later (BNPL) service, letting customers schedule small payments over a set period of time. DoorDash received widespread criticism for this decision, including internet mockery, given concerns about the increase of household debt in America. 
In 2025, DoorDash acquired the UK-based delivery service Deliveroo for $3.88 billion. The combined company operates in 40 countries and serves 50 million users monthly. In September 2025, DoorDash and Ace Hardware (the largest hardware cooperative) announced their partnership to offer delivery for home use products from over 4,000 Ace locations. == Lawsuits against DoorDash == === 2017 class-action lawsuit for misclassifying workers === In 2017, a class-action lawsuit was filed against DoorDash for allegedly misclassifying delivery drivers in California and Massachusetts as independent contractors. In 2022, a tentative settlement was reached in which DoorDash would pay $100 million total, with $61 million going to over 900,000 drivers, paying out just over $130 per driver, and $28 million for the lawyers. Gizmodo criticized the settlement, noting that the $413 million that DoorDash CEO Tony Xu received the previous year was one of the largest CEO compensation packages of all time. === 2019 data breach lawsuit === On May 4, 2019, DoorDash confirmed 4.9 million customers, delivery workers and merchants had sensitive information stolen via a data breach. Those who joined the platform after April 5, 2018, were unaffected by the breach. A class-action lawsuit for the breach was filed against DoorDash in October 2019. === Withholding of tips and subsequent class-action lawsuits === In July 2019, the company's tipping policy was criticized by The New York Times, and later The Verge and Vox and Gothamist. Drivers receive a guaranteed minimum per order that is paid by DoorDash by default. When a customer added a tip, instead of going directly to the driver, it first went to the company to cover the guaranteed minimum. Drivers then only directly received the part of the tip that exceeded the guaranteed minimum per order. 
In January 2020, it was reported that DoorDash had lied about skimming tips from its drivers, causing them to earn an average of $1.45 an hour after expenses, and that after the company had allegedly overhauled its tipping system, DoorDash was still manipulating per-delivery payouts at the expense of drivers. A DoorDash customer filed a class action lawsuit against the company for its "materially false and misleading" tipping policy. The case was referred to arbitration in August 2020. Under pressure, the company revised its policy. The company settled a lawsuit with District of Columbia Attorney General Karl Racine for $2.5 million, with funds going to deliverers, the government, and to charity. ==== 2021 driver strike for tip transparency ==== In July 2021, DoorDash drivers went on strike to protest lack of tip transparency and to ask for higher pay. At the time of the strike, and as of June 2022, DoorDash did not allow drivers to see the full tip amounts prior to accepting a delivery in the app. If customers tip over a set amount for the order total, DoorDash hides a portion of the tip until the delivery is complete. The strike occurred after DoorDash rewrote its code to cut off access to Para, a third-party app that drivers had been using to see the full tip amounts. ==== 2025 class-action lawsuit settlement ==== In 2025, DoorDash agreed to pay around $17 million for "misleading both consumers and delivery workers" with tips being docked from drivers' pay instead of directly going to drivers. === 2020 antitrust litigation === In April 2020, in the case of Davitashvili v. GrubHub Inc., DoorDash, Grubhub, Postmates, and Uber Eats were accused of exercising monopolistic power by listing restaurants on their apps only if the restaurant owners signed contracts with clauses requiring that prices be the same for dine-in customers as for customers receiving delivery. 
The plaintiffs stated that this arrangement increases the cost for dine-in customers, as they are required to subsidize the cost of delivery; and that the apps charge "exorbitant" fees, which range from 13% to 40% of revenue, while the average restaurant's profit ranges from 3% to 9% of revenue. The lawsuit seeks treble damages, including for overcharges, since April 14, 2016, for dine-in and delivery customers in the United States at restaurants using the defendants' delivery apps. Although several preliminary documents in the case have now been filed, a trial date has not yet been set. === Litigation for illegal unauthorized restaurant listing === In May 2021, DoorDash was criticized for unauthorized listings of restaurants that had not given permission to appear on the app. The company was sued by Lona's Lil Eats in St. Louis, with the lawsuit claiming that DoorDash had listed the restaurant without permission, then prevented any orders to the restaurant from going through and redirected customers to other restaurants instead, because Lona's was "too far away," when in reality it had not paid DoorDash a fee for listing. This aspect of DoorDash's business practice is illegal in California. === 2021 lawsuit by the city of Chicago === In August 2021, the city of Chicago sued DoorDash and GrubHub. According to Chicago mayor Lori Lightfoot, the companies broke the law by using "unfair and deceptive t

    Read more →
  • Manifold hypothesis

    Manifold hypothesis

    The manifold hypothesis posits that many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space. As a consequence of the manifold hypothesis, many data sets that initially appear to require many variables to describe can actually be described by a comparatively small number of variables, linked to the local coordinate system of the underlying manifold. It is suggested that this principle underpins the effectiveness of machine learning algorithms in describing high-dimensional data sets by considering a few common features. The manifold hypothesis is related to the effectiveness of nonlinear dimensionality reduction techniques in machine learning. Many dimensionality reduction techniques, such as manifold sculpting, manifold alignment, and manifold regularization, assume that data lies along a low-dimensional submanifold. A major implication of this hypothesis is that machine learning models only have to fit relatively simple, low-dimensional, highly structured subspaces within their potential input space (latent manifolds). Within one of these manifolds, it is always possible to interpolate between two inputs, that is to say, morph one into another via a continuous path along which all points fall on the manifold. The ability to interpolate between samples is the key to generalization in deep learning. == The information geometry of statistical manifolds == An empirically motivated approach to the manifold hypothesis focuses on its correspondence with an effective theory for manifold learning under the assumption that robust machine learning requires encoding the dataset of interest using methods for data compression. This perspective gradually emerged using the tools of information geometry thanks to the coordinated effort of scientists working on the efficient coding hypothesis, predictive coding and variational Bayesian methods. 
The argument for reasoning about the information geometry on the latent space of distributions rests upon the existence and uniqueness of the Fisher information metric. In this general setting, we are trying to find a stochastic embedding of a statistical manifold. From the perspective of dynamical systems, in the big data regime this manifold generally exhibits certain properties such as homeostasis: We can sample large amounts of data from the underlying generative process. Machine Learning experiments are reproducible, so the statistics of the generating process exhibit stationarity. In a sense made precise by theoretical neuroscientists working on the free energy principle, the statistical manifold in question possesses a Markov blanket.

    Read more →
  • Autonomic networking

    Autonomic networking

    Autonomic networking follows the concept of Autonomic Computing, an initiative started by IBM in 2001. Its ultimate aim is to create self-managing networks to overcome the rapidly growing complexity of the Internet and other networks and to enable their further growth, far beyond the size of today. == Increasing size and complexity == The ever-growing management complexity of the Internet caused by its rapid growth is seen by some experts as a major problem that limits its usability in the future. What's more, increasingly popular smartphones, PDAs, networked audio and video equipment, and game consoles need to be interconnected. Pervasive Computing not only adds features, but also burdens existing networking infrastructure with more and more tasks that sooner or later will not be manageable by human intervention alone. Another important aspect is the price of manually controlling huge numbers of vitally important devices of current network infrastructures. == Autonomic nervous system == The autonomic nervous system (ANS) is the part of complex biological nervous systems that is not consciously controlled. It regulates bodily functions and the activity of specific organs. As proposed by IBM, future communication systems might be designed in a similar way to the ANS. == Components of autonomic networking == As autonomics conceptually derives from biological entities such as the human autonomic nervous system, each of the areas can be metaphorically related to functional and structural aspects of a living being. In the human body, the autonomic system facilitates and regulates a variety of functions including respiration, blood pressure and circulation, and emotive response. The autonomic nervous system is the interconnecting fabric that supports feedback loops between internal states and various sources by which internal and external conditions are monitored. 
=== Autognostics === Autognostics includes a range of self-discovery, awareness, and analysis capabilities that provide the autonomic system with a view on high-level state. In metaphor, this represents the perceptual sub-systems that gather, analyze, and report on internal and external states and conditions – for example, this might be viewed as the eyes, visual cortex and perceptual organs of the system. Autognostics, or literally "self-knowledge", provides the autonomic system with a basis for response and validation. A rich autognostic capability may include many different "perceptual senses". For example, the human body gathers information via the usual five senses, the so-called sixth sense of proprioception (sense of body position and orientation), and through emotive states that represent the gross wellness of the body. As conditions and states change, they are detected by the sensory monitors and provide the basis for adaptation of related systems. Implicit in such a system are embedded models of both internal and external environments such that relative value can be assigned to any perceived state: a perceived physical threat (e.g. a snake) can result in rapid shallow breathing related to the fight-or-flight response, a phylogenetically effective model of interaction with recognizable threats. In the case of autonomic networking, the state of the network may be defined by inputs from: individual network elements such as switches and network interfaces, including specification and configuration, historical records, and current state; traffic flows; end-hosts; application performance data; and logical diagrams and design specifications. Most of these sources represent relatively raw and unprocessed views that have limited relevance. Post-processing and various forms of analysis must be applied to generate meaningful measurements and assessments against which current state can be derived. 
The autognostic system interoperates with: configuration management - to control network elements and interfaces policy management - to define performance objectives and constraints autodefense - to identify attacks and accommodate the impact of defensive responses === Configuration management === Configuration management is responsible for the interaction with network elements and interfaces. It includes an accounting capability with historical perspective that provides for the tracking of configurations over time, with respect to various circumstances. In the biological metaphor, these are the hands and, to some degree, the memory of the autonomic system. On a network, remediation and provisioning are applied via configuration setting of specific devices. Implementation affecting access and selective performance with respect to role and relationship are also applied. Almost all the "actions" that are currently taken by human engineers fall under this area. With only a few exceptions, interfaces are set by hand, or by extension of the hand, through automated scripts. Implicit in the configuration process is the maintenance of a dynamic population of devices under management, a historical record of changes and the directives which invoked change. Typical to many accounting functions, configuration management should be capable of operating on devices and then rolling back changes to recover previous configurations. Where change may lead to unrecoverable states, the sub-system should be able to qualify the consequences of changes prior to issuing them. As directives for change must originate from other sub-systems, the shared language for such directives must be abstracted from the details of the devices involved. The configuration management sub-system must be able to translate unambiguously between directives and hard actions or to be able to signal the need for further detail on a directive. 
An inferential capacity may be appropriate to support sufficient flexibility (i.e. configuration never takes place because there is no unique one-to-one mapping between directive and configuration settings). Where standards are not sufficient, a learning capacity may also be required to acquire new knowledge of devices and their configuration. Configuration management interoperates with all of the other sub-systems including: autognostics - receives direction for and validation of changes policy management - implements policy models through mapping to underlying resources security - applies access and authorization constraints for particular policy targets autodefense - receives direction for changes === Policy management === Policy management includes policy specification, deployment, reasoning over policies, updating and maintaining policies, and enforcement. Policy-based management is required for: constraining different kinds of behavior including security, privacy, resource access, and collaboration configuration management describing business processes and defining performance defining role and relationship, and establishing trust and reputation It provides the models of environment and behavior that represent effective interaction according to specific goals. In the human nervous system metaphor, these models are implicit in the evolutionary "design" of biological entities and specific to the goals of survival and procreation. Definition of what constitutes a policy is necessary to consider what is involved in managing it. A relatively flexible and abstract framework of values, relationships, roles, interactions, resources, and other components of the network environment is required. This sub-system extends far beyond the physical network to the applications in use and the processes and end-users that employ the network to achieve specific goals. 
It must express the relative values of various resources, outcomes, and processes and include a basis for assessing states and conditions. Unless embodied in some system outside the autonomic network or implicit to the specific policy implementation, the framework must also accommodate the definition of process, objectives and goals. Business process definitions and descriptions are then an integral part of the policy implementation. Further, as policy management represents the ultimate basis for the operation of the autonomic system, it must be able to report on its operation with respect to the details of its implementation. The policy management sub-system interoperates (at least) indirectly with all other sub-systems but primarily interacts with: autognostics - providing the definition of performance and accepting reports on conditions configuration management - providing constraints on device configuration security - providing definitions of roles, access and permissions === Autodefense === Autodefense represents a dynamic and adaptive mechanism that responds to malicious and intentional attacks on the network infrastructure, or use of the network infrastructure to attack IT resources. As defensive measures tend to impede the operation of IT, it is optimally capable of balancing performance objectives with typically over-riding threat management actions. In the

    Read more →
  • Neural scaling law

    Neural scaling law

    In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase. == Introduction == In general, a deep learning model can be characterized by four parameters: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these variables can be defined as a real number, usually written as $N, D, C, L$ (respectively: parameter count, dataset size, computing cost, and loss). A neural scaling law is a theoretical or empirical statistical law between these parameters. There are also other parameters with other scaling laws. === Size of the model === In most cases, the model's size is simply the number of parameters. However, one complication arises with the use of sparse models, such as mixture-of-experts models. With sparse models, during inference, only a fraction of their parameters are used. In comparison, most other kinds of neural networks, such as transformer models, always use all their parameters during inference. === Size of the training dataset === The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training. 
With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% the size of the pretraining dataset. In some cases, a small amount of high-quality data suffices for finetuning, and more data does not necessarily improve performance. Because many scaling laws exhibit diminishing returns, they value data according to a submodular set function, as was shown in a paper on this topic. === Cost of training === Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). The cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs. The cost of training a neural network model is a function of several factors, including model size, training dataset size, the training algorithm complexity, and the computational resources available. In particular, doubling the training dataset size does not necessarily double the cost of training, because one may train the model several times over the same dataset (each pass being an "epoch"). === Performance === The performance of a neural network model is evaluated based on its ability to accurately predict the output given some input data. Common metrics for evaluating model performance include: Negative log-likelihood per token (logarithm of perplexity) for language modeling; Accuracy, precision, recall, and F1 score for classification tasks; Mean squared error (MSE) or mean absolute error (MAE) for regression tasks; Elo rating in a competition against other models, such as gameplay or preference by a human judge. 
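The first metric in the list above can be illustrated with a short sketch that computes the negative log-likelihood per token and the corresponding perplexity. The token probabilities are made-up values standing in for the probabilities a language model assigned to the correct tokens.

```python
import math


def negative_log_likelihood_per_token(token_probs):
    """Average of -log(p) over the probabilities given to the correct tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perplexity(token_probs):
    """Perplexity is the exponential of the per-token negative log-likelihood."""
    return math.exp(negative_log_likelihood_per_token(token_probs))


# Hypothetical model output: every correct token received probability 0.25.
probs = [0.25, 0.25, 0.25, 0.25]
nll = negative_log_likelihood_per_token(probs)
ppl = perplexity(probs)
```

In this toy case the model is as uncertain as a uniform choice among four tokens, so the perplexity comes out at approximately 4.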
Performance can be improved by using more data, larger models, different training algorithms, regularizing the model to prevent overfitting, and early stopping using a validation set. When the performance is a number bounded within the range of [ 0 , 1 ] {\displaystyle [0,1]} , such as accuracy, precision, etc., it often scales as a sigmoid function of cost, as seen in the figures. == Examples == === (Hestness, Narang, et al, 2017) === The 2017 paper is a common reference point for neural scaling laws fitted by statistical analysis on experimental data. Previous works before the 2000s, as cited in the paper, were either theoretical or orders of magnitude smaller in scale. Whereas previous works generally found the scaling exponent to scale like L ∝ D − α {\displaystyle L\propto D^{-\alpha }} , with α ∈ { 0.5 , 1 , 2 } {\displaystyle \alpha \in \{0.5,1,2\}} , the paper found that α ∈ [ 0.07 , 0.35 ] {\displaystyle \alpha \in [0.07,0.35]} . Of the factors they varied, only task can change the exponent α {\displaystyle \alpha } . Changing the architecture optimizers, regularizers, and loss functions, would only change the proportionality factor, not the exponent. For example, for the same task, one architecture might have L = 1000 D − 0.3 {\displaystyle L=1000D^{-0.3}} while another might have L = 500 D − 0.3 {\displaystyle L=500D^{-0.3}} . They also found that for a given architecture, the number of parameters necessary to reach lowest levels of loss, given a fixed dataset size, grows like N ∝ D β {\displaystyle N\propto D^{\beta }} for another exponent β {\displaystyle \beta } . 
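The claim that two architectures can share an exponent while differing in the proportionality factor can be checked with a small sketch. It fits a power law of the form L = a * D^(-alpha) by least squares on log-transformed values. The data are synthetic and noise-free, generated from the two example curves above, and the function name `fit_power_law` is hypothetical; this is not the fitting procedure used in the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Fit L = a * D**(-alpha) via linear regression of log L on log D.

    Returns the pair (a, alpha).
    """
    xs = [math.log(d) for d in sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic dataset sizes and losses for two hypothetical architectures.
sizes = [10 ** k for k in range(3, 9)]
losses_a = [1000 * d ** -0.3 for d in sizes]
losses_b = [500 * d ** -0.3 for d in sizes]

a1, alpha1 = fit_power_law(sizes, losses_a)
a2, alpha2 = fit_power_law(sizes, losses_b)
```

On a log-log plot both curves are straight lines with the same slope, offset vertically by the ratio of their proportionality factors, which is why the fit recovers the same exponent for each.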
They studied machine translation with LSTM ( α ∼ 0.13 {\displaystyle \alpha \sim 0.13} ), generative language modelling with LSTM ( α ∈ [ 0.06 , 0.09 ] , β ≈ 0.7 {\displaystyle \alpha \in [0.06,0.09],\beta \approx 0.7} ), ImageNet classification with ResNet ( α ∈ [ 0.3 , 0.5 ] , β ≈ 0.6 {\displaystyle \alpha \in [0.3,0.5],\beta \approx 0.6} ), and speech recognition with two hybrid (LSTMs complemented by either CNNs or an attention decoder) architectures ( α ≈ 0.3 {\displaystyle \alpha \approx 0.3} ). === (Henighan, Kaplan, et al, 2020) === A 2020 analysis studied statistical relations between C , N , D , L {\displaystyle C,N,D,L} over a wide range of values and found similar scaling laws, over the range of N ∈ [ 10 3 , 10 9 ] {\displaystyle N\in [10^{3},10^{9}]} , C ∈ [ 10 12 , 10 21 ] {\displaystyle C\in [10^{12},10^{21}]} , and over multiple modalities (text, video, image, text to image, etc.). In particular, the scaling laws it found are (Table 1 of ): For each modality, they fixed one of the two C , N {\displaystyle C,N} , and varying the other one ( D {\displaystyle D} is varied along using D = C / 6 N {\displaystyle D=C/6N} ), the achievable test loss satisfies L = L 0 + ( x 0 x ) α {\displaystyle L=L_{0}+\left({\frac {x_{0}}{x}}\right)^{\alpha }} where x {\displaystyle x} is the varied variable, and L 0 , x 0 , α {\displaystyle L_{0},x_{0},\alpha } are parameters to be found by statistical fitting. The parameter α {\displaystyle \alpha } is the most important one. When N {\displaystyle N} is the varied variable, α {\displaystyle \alpha } ranges from 0.037 {\displaystyle 0.037} to 0.24 {\displaystyle 0.24} depending on the model modality. This corresponds to the α = 0.34 {\displaystyle \alpha =0.34} from the Chinchilla scaling paper. When C {\displaystyle C} is the varied variable, α {\displaystyle \alpha } ranges from 0.048 {\displaystyle 0.048} to 0.19 {\displaystyle 0.19} depending on the model modality. 
This corresponds to the $\beta = 0.28$ from the Chinchilla scaling paper. Given a fixed computing budget, the optimal model parameter count is consistently around $N_{opt}(C) = \left(\frac{C}{5 \times 10^{-12}\ \text{petaFLOP-day}}\right)^{0.7} = 9.0 \times 10^{-7} C^{0.7}$, where the prefactor in the second form is consistent with $C$ measured in FLOP. The parameter $9.0 \times 10^{-7}$ varies by a factor of up to 10 for different modalities. The exponent parameter $0.7$ varies from $0.64$ to $0.75$ for different modalities. This exponent corresponds to the $\approx 0.5$ from the Chinchilla scaling paper. It is "strongly suggested" (but not statistically checked) that $D_{opt}(C) \propto N_{opt}(C)^{0.4} \propto C^{0.28}$. This exponent corresponds to the $\approx 0.5$ from the Chinchilla scaling paper. The scaling law $L = L_{0} + (C_{0}/C)^{0.048}$ was confirmed during the training of GPT-3 (Figure 3.1). === Chinchilla scaling (Hoffmann, et al, 2022) === One particular scaling law ("Chinchilla scaling") states that, for a large language model (LLM) autoregressively trained for one epoch with a cosine learning rate schedule, we have $\begin{cases} C = C_{0} N D \\ L = \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} + L_{0} \end{cases}$ where the variables are: $C$ is the cost o
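
The unit conversion behind the two forms of $N_{opt}(C)$ quoted above from the 2020 analysis, and the arithmetic behind the $C^{0.28}$ exponent, can be checked with a short calculation. This is a sketch that assumes 1 petaFLOP-day means $10^{15}$ FLOP per second sustained for 86,400 seconds; the function names and the example budget are illustrative and do not come from the paper.

```python
# Assumption: 1 petaFLOP-day = 1e15 FLOP/s sustained for 86,400 s.
PFLOP_DAY_IN_FLOP = 1e15 * 86_400


def n_opt_from_pf_days(c_pf_days):
    """First form of N_opt: C measured in petaFLOP-days."""
    return (c_pf_days / 5e-12) ** 0.7


def n_opt_from_flop(c_flop):
    """Second form of N_opt: C measured in FLOP, prefactor 9.0e-7."""
    return 9.0e-7 * c_flop ** 0.7


# Prefactor implied by rewriting the first form in FLOP units.
prefactor = (1.0 / (5e-12 * PFLOP_DAY_IN_FLOP)) ** 0.7
print(f"implied prefactor: {prefactor:.2e}")

# Example budget (an arbitrary choice for illustration): 1 petaFLOP-day.
c_pf_days = 1.0
n_first = n_opt_from_pf_days(c_pf_days)
n_second = n_opt_from_flop(c_pf_days * PFLOP_DAY_IN_FLOP)
print(f"N_opt, petaFLOP-day form: {n_first:.3e}")
print(f"N_opt, FLOP form:         {n_second:.3e}")

# D_opt proportional to N_opt^0.4 and N_opt proportional to C^0.7
# together give D_opt proportional to C^(0.7 * 0.4).
d_opt_exponent = 0.7 * 0.4
print(f"D_opt exponent in C: {d_opt_exponent:.2f}")
```

The two forms agree up to rounding of the prefactor, which is what one would expect if the second form takes $C$ in FLOP rather than petaFLOP-days.
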

    Read more →
  • AFNLP

    AFNLP

    AFNLP (Asian Federation of Natural Language Processing Associations) is the organization that coordinates natural language processing activities and events in the Asia-Pacific region. == Foundation == AFNLP was founded on 4 October 2000. == Member Associations == ALTA – Australasian Language Technology Association; ANLP Japan Association of Natural Language Processing; ROCLING Taiwan ROC Computational Linguistics Society; SIG-KLC Korea SIG-Korean Language Computing of Korea Information Science Society. == Existing Asian Initiatives == NLPRS: Natural Language Processing Pacific Rim Symposium; IRAL: International Workshop on Information Retrieval with Asian Languages; PACLING: Pacific Association for Computational Linguistics; PACLIC: Pacific Asia Conference on Language, Information and Computation; PRICAI: Pacific Rim International Conference on AI; ICCPOL: International Conference on Computer Processing of Oriental Languages; ROCLING: Research on Computational Linguistics Conference. == Conferences == IJCNLP-04: The 1st International Joint Conference on Natural Language Processing in Hainan Island, China; IJCNLP-05: The 2nd International Joint Conference on Natural Language Processing in Jeju Island, Korea; IJCNLP-08: The 3rd International Joint Conference on Natural Language Processing in Hyderabad, India; ACL-IJCNLP-2009: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics (ACL) and 4th International Joint Conference on Natural Language Processing (IJCNLP) in Singapore; IJCNLP-11: The 5th International Joint Conference on Natural Language Processing in Chiang Mai, Thailand.

    Read more →
  • Organoid intelligence

    Organoid intelligence

    Organoid intelligence (OI) is an emerging field of study in computer science and biology that develops and studies biological wetware computing using 3D cultures of human brain cells (or brain organoids) and brain-machine interface technologies. Such technologies may be referred to as OIs or the nervous filesystem. Organoid intelligent computer systems can be an example of biohybrid systems. == Differences with non-organic computing == As opposed to traditional non-organic silicon-based approaches, OI seeks to use lab-grown cerebral organoids to serve as "biological hardware". While these structures are still far from being able to think like a regular human brain and do not yet possess strong computing capabilities, OI research currently offers the potential to improve the understanding of brain development, learning and memory, potentially finding treatments for neurological disorders such as dementia. Thomas Hartung, a professor from Johns Hopkins University, argued in 2023 that "while silicon-based computers are certainly better with numbers, brains are better at learning." He noted that transistor density in computer chips may be approaching its limits, whereas brains, being wired differently, are more energy-efficient and can store large amounts of information. Some researchers claim that even though human brains are slower than machines at processing simple information, they are far better at processing complex information, as brains can deal with fewer and more uncertain data, perform both sequential and parallel processing, are highly heterogeneous, can use incomplete datasets, and are said to outperform non-organic machines in decision-making. Training OIs involves the process of biological learning (BL), as opposed to machine learning (ML) for AIs. == Bioinformatics in OI == OI generates complex biological data, necessitating sophisticated methods for processing and analysis.
Bioinformatics provides the tools and techniques to decipher raw data, uncovering patterns and insights. Researchers have developed a platform named Neuroplatform for experimenting remotely with brain organoids via an API. == Intended functions == Brain-inspired computing hardware aims to emulate the structure and working principles of the brain and could be used to address current limitations in AI technologies. However, brain-inspired silicon chips are still limited in their ability to fully mimic brain function, as most examples are built on digital electronic principles. One study performed OI computation (which the authors termed Brainoware) by sending and receiving information from the brain organoid using a high-density multielectrode array. Using spatiotemporal electrical stimulation, and drawing on the organoid's nonlinear dynamics and fading memory properties as well as unsupervised learning from training data through reshaping of the organoid's functional connectivity, the study showed the potential of this technology by applying it to speech recognition and nonlinear equation prediction in a reservoir computing framework. == Ethical concerns == While researchers are hoping to use OI and biological computing to complement traditional silicon-based computing, there are also questions about the ethics of such an approach. Concerns include the possibility that an organoid could develop sentience or consciousness, and the question of the relationship between a stem cell donor (for growing the organoid) and the respective OI system.

    Read more →
  • AIXI

    AIXI

    AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000, and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence. AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how much reward that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs. == Etymology == According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by $\xi$ (the Greek letter xi), or it can stand for AI "crossed" (X) with induction (I). There are other interpretations. == Definition == AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment $\mu$. The interaction proceeds in time steps, from $t=1$ to $t=m$, where $m \in \mathbb{N}$ is the lifespan of the AIXI agent. At time step $t$, the agent chooses an action $a_{t} \in \mathcal{A}$ (e.g.
a limb movement) and executes it in the environment, and the environment responds with a "percept" $e_{t} \in \mathcal{E} = \mathcal{O} \times \mathbb{R}$, which consists of an "observation" $o_{t} \in \mathcal{O}$ (e.g., a camera image) and a reward $r_{t} \in \mathbb{R}$, distributed according to the conditional probability $\mu(o_{t} r_{t} \mid a_{1} o_{1} r_{1} \ldots a_{t-1} o_{t-1} r_{t-1} a_{t})$, where $a_{1} o_{1} r_{1} \ldots a_{t-1} o_{t-1} r_{t-1} a_{t}$ is the "history" of actions, observations and rewards. The environment $\mu$ is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depends on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that $\mu$ is computable, that is, the observations and rewards received by the agent from the environment $\mu$ can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent. The only goal of the AIXI agent is to maximize $\sum_{t=1}^{m} r_{t}$, that is, the sum of rewards from time step 1 to $m$. The AIXI agent is associated with a stochastic policy $\pi : (\mathcal{A} \times \mathcal{E})^{*} \rightarrow \mathcal{A}$, which is the function it uses to choose actions at every time step, where $\mathcal{A}$ is the space of all possible actions that AIXI can take and $\mathcal{E}$ is the space of all possible "percepts" that can be produced by the environment.
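
The interaction protocol described above can be sketched as a simple loop. This is only a toy illustration of the agent-environment interface with a history-dependent (non-Markov) environment; it is not AIXI itself. The environment `toy_env`, the two policies, the action and observation sets, and the lifespan `m = 6` are all hypothetical choices made for the example, and the policies are deterministic for simplicity.

```python
import random

ACTIONS = ("left", "right")        # hypothetical action space A
OBSERVATIONS = ("dark", "bright")  # hypothetical observation space O


def count_actions(history):
    """Return (number of 'left', number of 'right') actions so far."""
    lefts = sum(1 for (a, _o, _r) in history if a == "left")
    return lefts, len(history) - lefts


def toy_env(history, action, rng):
    """Hypothetical environment: maps (history, action) to a percept.

    The reward depends on the full history: the agent is rewarded for
    avoiding the action it has taken most often so far (ties count as
    'right'), so the last percept alone does not determine the reward.
    """
    lefts, rights = count_actions(history)
    majority = "left" if lefts > rights else "right"
    reward = 1.0 if action != majority else 0.0
    observation = rng.choice(OBSERVATIONS)  # stochastic observation
    return observation, reward


def minority_policy(history):
    """Policy: pick the action taken less often so far (ties: 'left')."""
    lefts, rights = count_actions(history)
    return "right" if lefts > rights else "left"


def always_left_policy(history):
    """Policy that ignores the history entirely."""
    return "left"


def run_episode(policy, env, m, seed=0):
    """Run time steps t = 1..m and return (sum of rewards, history)."""
    rng = random.Random(seed)
    history = []  # list of (action, observation, reward) triples
    total = 0.0
    for _t in range(1, m + 1):
        action = policy(history)
        observation, reward = env(history, action, rng)
        history.append((action, observation, reward))
        total += reward
    return total, history


total_minority, hist = run_episode(minority_policy, toy_env, m=6)
total_left, _ = run_episode(always_left_policy, toy_env, m=6)
print("minority policy total reward:", total_minority)
print("always-left policy total reward:", total_left)
```

Both the policy and the environment receive the whole history as an argument, mirroring the signatures $\pi : (\mathcal{A} \times \mathcal{E})^{*} \rightarrow \mathcal{A}$ and $\mu : (\mathcal{A} \times \mathcal{E})^{*} \times \mathcal{A} \rightarrow \mathcal{E}$; the quantity compared between the two policies is the sum of rewards over the lifespan.
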
The environment (or probability distribution) $\mu$ can also be thought of as a stochastic policy (which is a function): $\mu : (\mathcal{A} \times \mathcal{E})^{*} \times \mathcal{A} \rightarrow \mathcal{E}$, where $*$ is the Kleene star operation. In general, at time step $t$ (which ranges from 1 to $m$), AIXI, having previously executed actions $a_{1} \dots a_{t-1}$ (which is often abbreviated in the literature as $a_{<t}$

    Read more →