AI Essay Reddit

AI Essay Reddit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • ImageMixer

    ImageMixer

    ImageMixer is a brand name of video editing software that edits digital video and still image in camcorders and authors to VCD and DVD. It is a second-party Japanese product, distributed by Pixela Corporation, a Japanese manufacturer of PC peripheral hardware and multimedia software. == Bundling == ImageMixer is widely used for several camcorder brands, such as JVC, Hitachi and Canon. Also, Sony has chosen to package ImageMixer with its DVD and HDD Handycam. == ImageMixer series == ImageMixer has other series of software for digital camera, such as ImageMixer Label Maker and ImageMixer DVD dubbing. ImageMixer also has movie editing solution for Macintosh. == Windows Vista version of ImageMixer == A Windows Vista version of ImageMixer has been developed (ImageMixer3).

    Read more →
  • Business process automation

    Business process automation

    Business process automation (BPA), also known as business automation, refers to the technology-enabled automation of business processes. == Development approaches == There are three main approaches to developing BPA: traditional business process automation involves developing BPA software in a programming language for integrating relevant applications in the digital ecosystem to execute a given process; robotic process automation uses software robots (also called agents, bots, or workers) to emulate human-computer interaction for executing a combination of processes, activities, transactions, and tasks in one or more unrelated software systems; hyperautomation (also called intelligent automation (IA), intelligent process automation (IPA), integrated automation platform (IAP), and cognitive automation (CA) combines business process automation, artificial intelligence (AI), and machine learning (ML) to discover, validate, and execute organizational processes automatically with no or minimal human intervention. == Deployment == BPA toolsets vary in capability. With the increasing adoption of artificial intelligence (AI), organizations are implementing AI-driven technologies that can process natural language, interpret unstructured datasets, and interact with users. These systems are designed to adapt to new types of problems with reduced reliance on human intervention. == Business process management implementation == A business process management system differs from BPA. However, it is possible to implement automation based on a BPM implementation. The methods to achieve this vary, from writing custom application code to using specialist BPA tools. == Robotic process automation == Robotic process automation (RPA) involves the deployment of attended or unattended software agents in an organization's environment. These software agents, or robots, are programmed to perform predefined structured and repetitive sets of business tasks or processes. Robotic process automation is designed to streamline workflows by delegating repetitive tasks to software agents, allowing human workers to focus on more complex and strategic activities. BPA providers typically focus on different industry sectors, but the underlying approach is generally similar in that they aim to provide the shortest route to automation by interacting with the user interface rather than modifying the application code or database behind it. == Use of artificial intelligence == Artificial intelligence software robots are used to handle unstructured data sets (like images, texts, audios) and are often deployed after implementing robotic process automation. They can, for instance, generate an automatic transcript from a video. The combination of automation and artificial intelligence (AI) enables autonomy for robots, along with the capability to perform cognitive tasks. At this stage, robots can learn and improve processes by analyzing and adapting them.

    Read more →
  • AI agent

    AI agent

    In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents that can pursue goals, use tools, and take actions with varying degrees of autonomy. In practice, they usually operate within human-defined objectives, constraints, and available tools. == Overview == AI agents possess several key attributes, including goal-directed behavior, natural language interfaces, the capacity to use external tools, and the ability to perform multi-step tasks. Their control flow is frequently driven by large language models (LLMs). Agent systems may also include memory components, planning logic, tool interfaces, and orchestration software for coordinating agent components. AI agents do not have a standard definition. NIST describes agentic AI as an emerging area requiring standards for secure operation, interoperability, and reliable interaction with external systems. A common application of AI agents is task automation: for example, booking travel plans based on a user's prompted request. Companies such as Google, Microsoft and Amazon Web Services have offered platforms for deploying pre-built AI agents. Several protocols have been proposed for standardizing inter-agent communication, with examples including the Model Context Protocol, Gibberlink, and many others. Some of these protocols are also used for connecting agents to external applications. In December 2025, Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), with the goal of ensuring agentic AI evolves transparently and collaboratively. == History == AI agents have been traced back to research from the 1990s, with Harvard professor Milind Tambe noting that the definition of an AI agent was not clear at the time. Researcher Andrew Ng has been credited with spreading the term "agentic" to a wider audience in 2024. == Training and testing == Researchers have attempted to build world models and reinforcement learning environments to train or evaluate AI agents. For example, video games such as Minecraft and No Man's Sky as well as replicas of company websites, have also been used for training such agents. == Autonomous capabilities == The Financial Times compared the autonomy of AI agents to the SAE classification of self-driving cars, likening most applications to level 2 or level 3, with some achieving level 4 in highly specialized circumstances, and level 5 being theoretical. == Cognitive architecture == The following are some internal design options for reasoning within an agent: Retrieval-augmented generation ReAct (Reason + Act) pattern is an iterative process in which an AI agent alternates between reasoning and taking actions, receives observations from the environment or external tools, and integrates these observations into subsequent reasoning steps. Reflexion, which uses an LLM to create feedback on the agent's plan of action and stores that feedback in a memory cache. A tool/agent registry, for organizing software functions or other agents that the agent can use. One-shot model querying, which queries the model once to create the plan of action. === Reference architecture === Ken Huang proposed an AI agent reference architecture, which consists of seven interconnected layers, with each layer building on the functionality of the layers beneath it: Layer 1: Foundation models - provide the core AI engines to power agent capabilities. Layer 2: Data operations - manage the complex data infrastructure required for AI agent operations, including Vector database, data loaders, RAG. Layer 3: Agent frameworks - sophisticated software and tools that simplify the development and management of the AI agents. Layer 4: Deployment and infrastructure - provide the robust technical foundation for running AI agents. Layer 5: Evaluation and observability - focus on assessing the safety and performance of AI agents. Layer 6: Security and compliance - a crucial protective framework ensuring AI agents operate safely, securely, and conform to regulatory boundaries. At this layer security and compliance features embedded into all the AI agent stack layers are integrated together. Layer 7: Agent ecosystem - represents the AI agents' interface with real-world applications and users. == Orchestration patterns == To execute complex tasks, autonomous agents are often integrated with other agents or specialized tools. These configurations, known as orchestration patterns or workflows, include the following: Prompt chaining: A sequence where the output of one step serves as the input for the next. Routing: The classification of an input to direct it to a specialized downstream task or tool. Parallelization: The simultaneous execution of multiple tasks. Sequential processing: A fixed, linear progression of tasks through a predefined pipeline. Planner-critic: An iterative pattern where one agent generates a proposal and another evaluates it to provide feedback for refinement. == Multimodal AI agents == In addition to large language models (LLMs), vision-language models (VLMs) and multimodal foundation models can be used as the basis for agents. In September 2024, Allen Institute for AI released an open-source vision-language model. Nvidia released a framework for developers to use VLMs, LLMs and retrieval-augmented generation for building AI agents that can analyze images and videos, including video search and video summarization. Microsoft released a multimodal agent model – trained on images, video, software user interface interactions, and robotics data – that the company claimed can manipulate software and robots. == Applications == As of April 2025, per the Associated Press, there are few real-world applications of AI agents. As of June 2025, per Fortune, many companies are primarily experimenting with AI agents. The Information divided AI agents into seven archetypes: business-task agents, for acting within enterprise software; conversational agents, which act as chatbots for customer support; research agents, for querying and analyzing information (such as OpenAI Deep Research); analytics agents, for analyzing data to create reports; software developer or coding agents (such as Cursor); domain-specific agents, which include specific subject matter knowledge; and web browser agents (such as OpenAI Operator). By mid-2025, AI agents have been used in video game development, gambling (including sports betting), cryptocurrency wallets (including cryptocurrency trading and meme coins) and social media. In August 2025, New York Magazine described software development as the most definitive use case of AI agents. Likewise, by October 2025, noting a decline in expectations, The Information noted AI coding agents and customer support as the primary use cases by businesses. In November 2025, The Wall Street Journal reported that few companies that deployed AI agents have received a return on investment. === Applications in government === Several government bodies in the United States and United Kingdom have deployed or announced the deployment of agents, at the local and national level. The city of Kyle, Texas deployed an AI agent from Salesforce in March 2025 for 311 customer service. In November 2025, the Internal Revenue Service stated that it would use Agentforce, AI agents from Salesforce, for the Office of Chief Counsel, Taxpayer Advocate Services and the Office of Appeals. That same month, Staffordshire Police announced that they would trial Agentforce agents for handling non-emergency 101 calls in the United Kingdom starting in 2026. In December 2025, the Department of Neighborhoods in Detroit, Michigan, in partnership with a local business, deployed a pilot project in two Detroit districts for an AI agent to be used for customer service calls. In February 2025, Thomas Shedd, the director of the Technology Transformation Services, proposed using AI coding agents across the United States federal government. A recruiter for the Department of Government Efficiency proposed in April 2025 to use AI agents to automate the work of about 70,000 United States federal government employees, as part of a startup with funding from OpenAI and a partnership agreement with Palantir. This proposal was criticized by experts for its impracticality, if not impossibility, and the lack of corresponding widespread adoption by businesses. In December 2025, the Food and Drug Administration announced that it would offer "agentic AI capabilities" to its staff for "meeting management, pre-market reviews, review validation, post-market surveillance, inspections and compliance and administrative functions." That same month, the United States Department of Defense launched GenAI.mil, an internal platform for American military personnel to use generative AI-based applications based on Google Gemini, including "intelligent agentic workflows". Defense Secretary Pete Hegseth listed applications such as "[conducting] deep r

    Read more →
  • Evolvability (computer science)

    Evolvability (computer science)

    The term evolvability is a framework of computational learning introduced by Leslie Valiant in his paper of the same name. The aim of this theory is to model biological evolution and categorize which types of mechanisms are evolvable. Evolution is an extension of PAC learning and learning from statistical queries. == General framework == Let F n {\displaystyle F_{n}\,} and R n {\displaystyle R_{n}\,} be collections of functions on n {\displaystyle n\,} variables. Given an ideal function f ∈ F n {\displaystyle f\in F_{n}} , the goal is to find by local search a representation r ∈ R n {\displaystyle r\in R_{n}} that closely approximates f {\displaystyle f\,} . This closeness is measured by the performance Perf ⁡ ( f , r ) {\displaystyle \operatorname {Perf} (f,r)} of r {\displaystyle r\,} with respect to f {\displaystyle f\,} . As is the case in the biological world, there is a difference between genotype and phenotype. In general, there can be multiple representations (genotypes) that correspond to the same function (phenotype). That is, for some r , r ′ ∈ R n {\displaystyle r,r'\in R_{n}} , with r ≠ r ′ {\displaystyle r\neq r'\,} , still r ( x ) = r ′ ( x ) {\displaystyle r(x)=r'(x)\,} for all x ∈ X n {\displaystyle x\in X_{n}} . However, this need not be the case. The goal then, is to find a representation that closely matches the phenotype of the ideal function, and the spirit of the local search is to allow only small changes in the genotype. Let the neighborhood N ( r ) {\displaystyle N(r)\,} of a representation r {\displaystyle r\,} be the set of possible mutations of r {\displaystyle r\,} . For simplicity, consider Boolean functions on X n = { − 1 , 1 } n {\displaystyle X_{n}=\{-1,1\}^{n}\,} , and let D n {\displaystyle D_{n}\,} be a probability distribution on X n {\displaystyle X_{n}\,} . Define the performance in terms of this. Specifically, Perf ⁡ ( f , r ) = ∑ x ∈ X n f ( x ) r ( x ) D n ( x ) . {\displaystyle \operatorname {Perf} (f,r)=\sum _{x\in X_{n}}f(x)r(x)D_{n}(x).} Note that Perf ⁡ ( f , r ) = Prob ⁡ ( f ( x ) = r ( x ) ) − Prob ⁡ ( f ( x ) ≠ r ( x ) ) . {\displaystyle \operatorname {Perf} (f,r)=\operatorname {Prob} (f(x)=r(x))-\operatorname {Prob} (f(x)\neq r(x)).} In general, for non-Boolean functions, the performance will not correspond directly to the probability that the functions agree, although it will have some relationship. Throughout an organism's life, it will only experience a limited number of environments, so its performance cannot be determined exactly. The empirical performance is defined by Perf s ⁡ ( f , r ) = 1 s ∑ x ∈ S f ( x ) r ( x ) , {\displaystyle \operatorname {Perf} _{s}(f,r)={\frac {1}{s}}\sum _{x\in S}f(x)r(x),} where S {\displaystyle S\,} is a multiset of s {\displaystyle s\,} independent selections from X n {\displaystyle X_{n}\,} according to D n {\displaystyle D_{n}\,} . If s {\displaystyle s\,} is large enough, evidently Perf s ⁡ ( f , r ) {\displaystyle \operatorname {Perf} _{s}(f,r)} will be close to the actual performance Perf ⁡ ( f , r ) {\displaystyle \operatorname {Perf} (f,r)} . Given an ideal function f ∈ F n {\displaystyle f\in F_{n}} , initial representation r ∈ R n {\displaystyle r\in R_{n}} , sample size s {\displaystyle s\,} , and tolerance t {\displaystyle t\,} , the mutator Mut ⁡ ( f , r , s , t ) {\displaystyle \operatorname {Mut} (f,r,s,t)} is a random variable defined as follows. Each r ′ ∈ N ( r ) {\displaystyle r'\in N(r)} is classified as beneficial, neutral, or deleterious, depending on its empirical performance. Specifically, r ′ {\displaystyle r'\,} is a beneficial mutation if Perf s ⁡ ( f , r ′ ) − Perf s ⁡ ( f , r ) ≥ t {\displaystyle \operatorname {Perf} _{s}(f,r')-\operatorname {Perf} _{s}(f,r)\geq t} ; r ′ {\displaystyle r'\,} is a neutral mutation if − t < Perf s ⁡ ( f , r ′ ) − Perf s ⁡ ( f , r ) < t {\displaystyle -t<\operatorname {Perf} _{s}(f,r')-\operatorname {Perf} _{s}(f,r) 0 {\displaystyle \epsilon >0\,} , for all ideal functions f ∈ F n {\displaystyle f\in F_{n}} and representations r 0 ∈ R n {\displaystyle r_{0}\in R_{n}} , with probability at least 1 − ϵ {\displaystyle 1-\epsilon \,} , Perf ⁡ ( f , r g ( n , 1 / ϵ ) ) ≥ 1 − ϵ , {\displaystyle \operatorname {Perf} (f,r_{g(n,1/\epsilon )})\geq 1-\epsilon ,} where the sizes of neighborhoods N ( r ) {\displaystyle N(r)\,} for r ∈ R n {\displaystyle r\in R_{n}\,} are at most p ( n , 1 / ϵ ) {\displaystyle p(n,1/\epsilon )\,} , the sample size is s ( n , 1 / ϵ ) {\displaystyle s(n,1/\epsilon )\,} , the tolerance is t ( 1 / n , ϵ ) {\displaystyle t(1/n,\epsilon )\,} , and the generation size is g ( n , 1 / ϵ ) {\displaystyle g(n,1/\epsilon )\,} . F {\displaystyle F\,} is evolvable over D {\displaystyle D\,} if it is evolvable by some R {\displaystyle R\,} over D {\displaystyle D\,} . F {\displaystyle F\,} is evolvable if it is evolvable over all distributions D {\displaystyle D\,} . == Results == The class of conjunctions and the class of disjunctions are evolvable over the uniform distribution for short conjunctions and disjunctions, respectively. The class of parity functions (which evaluate to the parity of the number of true literals in a given subset of literals) are not evolvable, even for the uniform distribution. Evolvability implies PAC learnability.

    Read more →
  • Plug computer

    Plug computer

    A plug computer is a small-form-factor computer whose chassis contains the AC power plug, and thus plugs directly into the wall. Alternatively, the computer may resemble an AC adapter or a similarly small device. Plug computers are often configured for use in the home or office as compact computer. == Description == Plug computers consist of a high-performance, low-power system-on-a-chip processor, with several I/O hardware ports (USB ports, Ethernet connectors, etc.). Most versions do not have provisions for connecting a display and are best suited to running media servers, back-up services, or file sharing and remote access functions; thus acting as a bridge between in-home protocols (such as Digital Living Network Alliance (DLNA) and Server Message Block (SMB)) and cloud-based services. There are, however, plug computer offerings that have analog VGA monitor and/or HDMI connectors, which, along with multiple USB ports, permit the use of a display, keyboard, and mouse, thus making them full-fledged, low-power alternatives to desktop and laptop computers. They typically run any of a number of Linux distributions. Plug computers typically consume little power and are inexpensive. == History == A number of other devices of this type began to appear at the 2009 Consumer Electronics Show. On January 6, 2009 CTERA Networks launched a device called CloudPlug that provides online backup at local disk speeds and overlays a file sharing service. The device also transforms any external USB hard drive into a network-attached storage device. On January 7, 2009, Cloud Engines unveiled the Pogoplug network access server. On January 8, 2009, Axentra announced availability of their HipServ platform. On February 23, 2009, Marvell Technology Group announced its plans to build a mini-industry around plug computers. On August 19, 2009, CodeLathe announced availability of their TonidoPlug network access server. On November 13, 2009 QuadAxis launched its plug computing device product line and development platform, featuring the QuadPlug and QuadPC and running QuadMix, a modified Linux. On January 5, 2010, Iomega announced their iConnect network access server. On January 7, 2010 Pbxnsip launched its plug computing device the sipJack running pbxnsip: an IP Communications platform.

    Read more →
  • Case-based reasoning

    Case-based reasoning

    Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. In everyday life, an auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case law is using case-based reasoning. So, too, an engineer copying working elements of nature (practicing biomimicry) is treating nature as a database of solutions to problems. Case-based reasoning is a prominent type of analogy solution making. It has been argued that case-based reasoning is not only a powerful method for computer reasoning, but also a pervasive behavior in everyday human problem solving; or, more radically, that all reasoning is based on past cases personally experienced. This view is related to prototype theory, which is most deeply explored in cognitive science. == Process == Case-based reasoning has been formalized for purposes of computer reasoning as a four-step process: Retrieve: Given a target problem, retrieve cases relevant to solving it from memory. A case consists of a problem, its solution, and, typically, annotations about how the solution was derived. For example, suppose Fred wants to prepare blueberry pancakes. Being a novice cook, the most relevant experience he can recall is one in which he successfully made plain pancakes. The procedure he followed for making the plain pancakes, together with justifications for decisions made along the way, constitutes Fred's retrieved case. Reuse: Map the solution from the previous case to the target problem. This may involve adapting the solution as needed to fit the new situation. In the pancake example, Fred must adapt his retrieved solution to include the addition of blueberries. Revise: Having mapped the previous solution to the target situation, test the new solution in the real world (or a simulation) and, if necessary, revise. Suppose Fred adapted his pancake solution by adding blueberries to the batter. After mixing, he discovers that the batter has turned blue – an undesired effect. This suggests the following revision: delay the addition of blueberries until after the batter has been ladled into the pan. Retain: After the solution has been successfully adapted to the target problem, store the resulting experience as a new case in memory. Fred, accordingly, records his new-found procedure for making blueberry pancakes, thereby enriching his set of stored experiences, and better preparing him for future pancake-making demands. == Comparison to other methods == At first glance, CBR may seem similar to the rule induction algorithms of machine learning. Like a rule-induction algorithm, CBR starts with a set of cases or training examples; it forms generalizations of these examples, albeit implicit ones, by identifying commonalities between a retrieved case and the target problem. If for instance a procedure for plain pancakes is mapped to blueberry pancakes, a decision is made to use the same basic batter and frying method, thus implicitly generalizing the set of situations under which the batter and frying method can be used. The key difference, however, between the implicit generalization in CBR and the generalization in rule induction lies in when the generalization is made. A rule-induction algorithm draws its generalizations from a set of training examples before the target problem is even known; that is, it performs eager generalization. For instance, if a rule-induction algorithm were given recipes for plain pancakes, Dutch apple pancakes, and banana pancakes as its training examples, it would have to derive, at training time, a set of general rules for making all types of pancakes. It would not be until testing time that it would be given, say, the task of cooking blueberry pancakes. The difficulty for the rule-induction algorithm is in anticipating the different directions in which it should attempt to generalize its training examples. This is in contrast to CBR, which delays (implicit) generalization of its cases until testing time – a strategy of lazy generalization. In the pancake example, CBR has already been given the target problem of cooking blueberry pancakes; thus it can generalize its cases exactly as needed to cover this situation. CBR therefore tends to be a good approach for rich, complex domains in which there are myriad ways to generalize a case. In law, there is often explicit delegation of CBR to courts, recognizing the limits of rule based reasons: limiting delay, limited knowledge of future context, limit of negotiated agreement, etc. While CBR in law and cognitively inspired CBR have long been associated, the former is more clearly an interpolation of rule based reasoning, and judgment, while the latter is more closely tied to recall and process adaptation. The difference is clear in their attitude toward error and appellate review. Another name for case-based reasoning in problem solving is symptomatic strategies. It does require à priori domain knowledge that is gleaned from past experience which established connections between symptoms and causes. This knowledge is referred to as shallow, compiled, evidential, history-based as well as case-based knowledge. This is the strategy most associated with diagnosis by experts. Diagnosis of a problem transpires as a rapid recognition process in which symptoms evoke appropriate situation categories. An expert knows the cause by virtue of having previously encountered similar cases. Case-based reasoning is the most powerful strategy, and that used most commonly. However, the strategy won't work independently with truly novel problems, or where deeper understanding of whatever is taking place is sought. An alternative approach to problem solving is the topographic strategy which falls into the category of deep reasoning. With deep reasoning, in-depth knowledge of a system is used. Topography in this context means a description or an analysis of a structured entity, showing the relations among its elements. Also known as reasoning from first principles, deep reasoning is applied to novel faults when experience-based approaches aren't viable. The topographic strategy is therefore linked to à priori domain knowledge that is developed from a more a fundamental understanding of a system, possibly using first-principles knowledge. Such knowledge is referred to as deep, causal or model-based knowledge. Hoc and Carlier noted that symptomatic approaches may need to be supported by topographic approaches because symptoms can be defined in diverse terms. The converse is also true – shallow reasoning can be used abductively to generate causal hypotheses, and deductively to evaluate those hypotheses, in a topographical search. == Criticism == Critics of CBR argue that it is an approach that accepts anecdotal evidence as its main operating principle. Without statistically relevant data for backing and implicit generalization, there is no guarantee that the generalization is correct. However, all inductive reasoning where data is too scarce for statistical relevance is inherently based on anecdotal evidence. == History == CBR traces its roots to the work of Roger Schank and his students at Yale University in the early 1980s. Schank's model of dynamic memory was the basis for the earliest CBR systems: Janet Kolodner's CYRUS and Michael Lebowitz's IPP. Other schools of CBR and closely allied fields emerged in the 1980s, which directed at topics such as legal reasoning, memory-based reasoning (a way of reasoning from examples on massively parallel machines), and combinations of CBR with other reasoning methods. In the 1990s, interest in CBR grew internationally, as evidenced by the establishment of an International Conference on Case-Based Reasoning in 1995, as well as European, German, British, Italian, and other CBR workshops. CBR technology has resulted in the deployment of a number of successful systems, the earliest being Lockheed's CLAVIER, a system for laying out composite parts to be baked in an industrial convection oven. CBR has been used extensively in applications such as the Compaq SMART system and has found a major application area in the health sciences, as well as in structural safety management. There is recent work that develops CBR within a statistical framework and formalizes case-based inference as a specific type of probabilistic inference. Thus, it becomes possible to produce case-based predictions equipped with a certain level of confidence. One description of the difference between CBR and induction from instances is that statistical inference aims to find what tends to make cases similar while CBR aims to encode what suffices to claim similarly.

    Read more →
  • Wetware (brain)

    Wetware (brain)

    Wetware is a term drawn from the computer-related idea of hardware or software, but applied to biological life forms. == Usage == The prefix "wet" is a reference to the water found in living creatures. Wetware is used to describe the elements equivalent to hardware and software found in a person, especially the central nervous system (CNS) and the human mind. The term wetware finds use in works of fiction, in scholarly publications and in popularizations. The "hardware" component of wetware concerns the bioelectric and biochemical properties of the CNS, specifically the brain. If the sequence of impulses traveling across the various neurons are thought of symbolically as software, then the physical neurons would be the hardware. The amalgamated interaction of this software and hardware is manifested through continuously changing physical connections, and chemical and electrical influences that spread across the body. The process by which the mind and brain interact to produce the collection of experiences that we define as self-awareness is in question. == History == Although the exact definition has shifted over time, the term Wetware and its fundamental reference to "the physical mind" has been around at least since the mid-1950s. Mostly used in relatively obscure articles and papers, it was not until the heyday of cyberpunk, however, that the term found broad adoption. Among the first uses of the term in popular culture was the Bruce Sterling novel Schismatrix (1985) and the Michael Swanwick novel Vacuum Flowers (1987). Rudy Rucker references the term in a number of books, including one entitled Wetware (1988): ... all sparks and tastes and tangles, all its stimulus/response patterns – the whole bio-cybernetic software of mind. Rucker did not use the word to simply mean a brain, nor in the human-resources sense of employees. He used wetware to stand for the data found in any biological system, analogous perhaps to the firmware that is found in a ROM chip. In Rucker's sense, a seed, a plant graft, an embryo, or a biological virus are all wetware. DNA, the immune system, and the evolved neural architecture of the brain are further examples of wetware in this sense. Rucker describes his conception in a 1992 compendium The Mondo 2000 User's Guide to the New Edge, which he quotes in a 2007 blog entry. Early cyber-guru Arthur Kroker used the term in his blog. With the term getting traction in trendsetting publications, it became a buzzword in the early 1990s. In 1991, Dutch media theorist Geert Lovink organized the Wetware Convention in Amsterdam, which was supposed to be an antidote to the "out-of-body" experiments conducted in high-tech laboratories, such as experiments in virtual reality. Timothy Leary, in an appendix to Info-Psychology originally written in 1975–76 and published in 1989, used the term wetware, writing that "psychedelic neuro-transmitters were the hot new technology for booting-up the 'wetware' of the brain". Another common reference is: "Wetware has 7 plus or minus 2 temporary registers." The numerical allusion is to a classic 1957 article by George A. Miller, The magical number 7 plus or minus two: some limits in our capacity for processing information, which later gave way to Miller's law.

    Read more →
  • Slopaganda

    Slopaganda

    Slopaganda is a portmanteau of "AI slop" and "propaganda", referring to AI-generated content designed to manipulate beliefs, emotions, and political decision-making at scale. The term is credited to Michał Klincewicz, an assistant professor in the Department of Computational Cognitive Science at Tilburg University, in 2025. == Definition == Slopaganda is distinguished from traditional propaganda by three features: scale, scope, and speed. Generative AI makes it possible to produce large volumes of content quickly and at low cost, allows for highly personalised and targeted messaging to specific sub-audiences, and leverages the hyper-connectivity of social networks to accelerate dissemination beyond what conventional media could achieve. Unlike traditional propaganda, which delivers a uniform message to all recipients, slopaganda can be micro-targeted — tailored to individuals based on estimated prior beliefs to reinforce political biases or emotional associations. The authors note that it need not aim at literal deception: much slopaganda is expressive rather than truth-apt, designed to create emotional associations rather than false factual beliefs. == Relation to AI slop == Slopaganda is a subset of AI slop — low-quality, mass-produced AI-generated content — distinguished by intent. Where AI slop may be produced indifferently for commercial or engagement-farming purposes, slopaganda is deployed with a deliberate political or ideological goal. == Notable examples == Examples discussed by the term's originators include Donald Trump's prolific use of AI in Truth Social posts and Iranian Lego-themed music videos. AI-generated videos posted by the White House mixing real military footage with clips from films and video games; and deepfake audio imitating political candidates during the 2024 US presidential campaign have also been given the label slopaganda.

    Read more →
  • Clean Email

    Clean Email

    Clean Email is an automated software as a service email management application which identifies and clears junk mail from inboxes. The service uses a subscription business model with a free trial for the first 1,000 emails. and is available on macOS, iOS, Android, and on the web. == History == Clean Email is a self-funded company headquartered in Los Angeles, California. Initially developed by the founder for personal use, the service was designed to address the growing issue of inbox clutter and privacy concerns. In 2017, John Gruber recognized Clean Email as a trustworthy alternative to Unroll.me after the latter was found to be selling user data. == Features == Clean Email uses algorithms to identify and categorize emails, enabling users to group, remove, label, and archive email messages in bulk. Its Unsubscriber tool consolidates all subscriptions and newsletters into a single view for quick management, allowing users to bulk unsubscribe or temporarily pause mail. Its Screener feature transforms the inbox into an "opt-in" system, enabling users to pre-approve mail from new senders. Cleaning Suggestions identifies frequently cleaned mail, recommending actions accordingly. Additional functionalities include automatic deletion of aging emails, delivery of messages to specified folders, and options to mute or block senders.

    Read more →
  • AI nationalism

    AI nationalism

    AI nationalism is the idea that nations should develop and control their own artificial intelligence technologies to advance their own interests and ensure technological sovereignty. This concept is gaining traction globally, leading countries to implement new laws, form strategic alliances, and invest significantly in domestic AI capabilities. == Global trends and national strategies == In 2018, British technology investor Ian Hogarth published an influential essay titled AI Nationalism. He argued that as AI gains more power and its economic and military significance expands, governments will take measures to bolster their own domestic AI industries, and predicted that the advancement of machine learning systems would lead to what he termed "AI nationalism." He anticipated that this rise in AI would accelerate a global arms race, resulting in more closed economies, restrictions on foreign acquisitions, and limitations on the movement of talent. Hogarth predicted that AI policy would become a central focus of government agendas. He also criticized Britain’s approach to AI strategy, citing the sale of London-based DeepMind—one of the leading AI laboratories, acquired by Google for a relatively modest £400 million in 2014—as a significant misstep. AI nationalism is chiefly reflected in the escalating rhetoric of an artificial intelligence arms race, portraying AI development as a zero-sum game where the winner gains significant economic, political, and military advantages. This mindset, as highlighted in a 2017 Pentagon report, warns that sharing AI technology could erode technological supremacy and enhance rivals' capabilities. The winner-takes-all mentality of AI nationalism poses risks including unsafe AI development, increased geopolitical tension, and potential military aggression (such as cyberattacks or targeting AI professionals). Several countries, including Canada, France, and India, have formulated national strategies to advance their positions in AI. In the United States, a leading player in the global AI arena, trade policies have been enacted to restrict China's access to critical microchips, reflecting a strategic effort to maintain a technological edge. The United States’ National Security Commission on Artificial Intelligence (NSCAI) frames AI development as a critical aspect of a broader technology competition crucial for national success. It emphasizes the need to outpace China in AI to maintain strategic advantage, reflecting AI nationalism by linking geopolitical power directly to advancements in AI. France has seen notable governmental support for local AI startups, particularly those specializing in language technologies that cater to French and other non-English languages. In Saudi Arabia, Crown Prince Mohammed bin Salman is investing billions in AI research and development. The country has actively collaborated with major technology firms such as Amazon, IBM, and Microsoft to establish itself as a prominent AI hub. == Historical and cultural context == AI nationalism is seen as deeply connected to historical racism and imperialism. It is viewed not merely as a technological competition but as a contest over racial and civilizational superiority. Historically, technological achievements were often used to justify colonialism and racial hierarchies, with Western societies perceiving their advancements as evidence of superiority. In the context of AI, this historical context continues to shape views on intelligence and development. Some argue that AI nationalism reinforces the idea of fundamental civilizational divides, especially between the Western world and China. This perspective often frames China's progress in AI as a direct challenge to Western values, presenting the AI competition as a struggle over values. AI nationalism is said to draw from long-standing anti-Asian stereotypes, such as the "Yellow Peril," which portray Asian nations as threats to Western civilization. This viewpoint links Asian technological advances with dehumanization and artificiality, reflecting persistent anxieties about China's growing role in the global tech landscape. == Implications == AI nationalism is seen as a component of a broader trend towards the fragmentation of the internet, where digital services are increasingly influenced by local regulations and national interests. This shift is creating a new technological landscape in which the impact of artificial intelligence on individuals' lives can vary significantly depending on their geographic location. J. Paul Goode argues that AI nationalism may exacerbate existing societal divisions by promoting the development of systems that embed cultural biases, thereby privileging certain groups while disadvantaging others.

    Read more →
  • Video Super Resolution

    Video Super Resolution

    RTX Video Super Resolution (RTX VSR) is a video scaling feature by Nvidia. It was released on February 28, 2023. == History == The feature was first unveiled during CES 2023 as RTX Video Super Resolution. It uses the on-board Tensor Cores to upscale browser video content in real time. Video Super Resolution was initially only available on RTX 30 and 40 series GPUs, while support for 20 series GPUs was added afterwards; it is now available on all Nvidia RTX-branded GPUs. The feature supports input resolutions from 360p to 1440p and a max output of 4K and comes without support for HDR content although that could be likely added in the future. Nvidia released RTX Video Super Resolution 1.5 with improved video quality and RTX 20 series support on October 17, 2023. == Reception == According to ComputerBase, although "the algorithm is not yet working flawlessly", the feature is "overall recommendable".

    Read more →
  • Feature hashing

    Feature hashing

    In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction. This trick is often attributed to Weinberger et al. (2009), but there exists a much earlier description of this method published by John Moody in 1989. == Motivation == === Motivating example === In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets. Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents is regarded as a term-document matrix where each row is a single document, and each column is a single feature/word; the entry i, j in such a matrix captures the frequency (or weight) of the j'th term of the vocabulary in document i. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse—according to Zipf's law. The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. E.g., the three documents John likes to watch movies. Mary likes movies too. John also likes football. can be converted, using the dictionary to the term-document matrix ( John likes to watch movies Mary too also football 1 1 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1 ) {\displaystyle {\begin{pmatrix}{\textrm {John}}&{\textrm {likes}}&{\textrm {to}}&{\textrm {watch}}&{\textrm {movies}}&{\textrm {Mary}}&{\textrm {too}}&{\textrm {also}}&{\textrm {football}}\\1&1&1&1&1&0&0&0&0\\0&1&0&0&1&1&1&0&0\\1&1&0&0&0&0&0&1&1\end{pmatrix}}} (Punctuation was removed, as is usual in document classification and clustering.) The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. On the contrary, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters. Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features. === Mathematical motivation === Mathematically, a token is an element t {\displaystyle t} in a finite (or countably infinite) set T {\displaystyle T} . Suppose we only need to process a finite corpus, then we can put all tokens appearing in the corpus into T {\displaystyle T} , meaning that T {\displaystyle T} is finite. However, suppose we want to process all possible words made of the English letters, then T {\displaystyle T} is countably infinite. Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} . When T {\displaystyle T} is finite, of size | T | = m ≤ n {\displaystyle |T|=m\leq n} , then we can use one-hot encoding to map it into R n {\displaystyle \mathbb {R} ^{n}} . First, arbitrarily enumerate T = { t 1 , t 2 , . . , t m } {\displaystyle T=\{t_{1},t_{2},..,t_{m}\}} , then define ϕ ( t i ) = e i {\displaystyle \phi (t_{i})=e_{i}} . In other words, we assign a unique index i {\displaystyle i} to each token, then map the token with index i {\displaystyle i} to the unit basis vector e i {\displaystyle e_{i}} . One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of T {\displaystyle T} . Given a token t ∈ T {\displaystyle t\in T} , to compute ϕ ( t ) {\displaystyle \phi (t)} , we must find out the index i {\displaystyle i} of the token t {\displaystyle t} . Thus, to implement ϕ {\displaystyle \phi } efficiently, we need a fast-to-compute bijection h : T → { 1 , . . . , m } {\displaystyle h:T\to \{1,...,m\}} , then we have ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In fact, we can relax the requirement slightly: It suffices to have a fast-to-compute injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} , then use ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In practice, there is no simple way to construct an efficient injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} . However, we do not need a strict injection, but only an approximate injection. That is, when t ≠ t ′ {\displaystyle t\neq t'} , we should probably have h ( t ) ≠ h ( t ′ ) {\displaystyle h(t)\neq h(t')} , so that probably ϕ ( t ) ≠ ϕ ( t ′ ) {\displaystyle \phi (t)\neq \phi (t')} . At this point, we have just specified that h {\displaystyle h} should be a hashing function. Thus we reach the idea of feature hashing. == Algorithms == === Feature hashing (Weinberger et al. 2009) === The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows. First, one specifies two hash functions: the kernel hash h : T → { 1 , 2 , . . . , n } {\displaystyle h:T\to \{1,2,...,n\}} , and the sign hash ζ : T → { − 1 , + 1 } {\displaystyle \zeta :T\to \{-1,+1\}} . Next, one defines the feature hashing function: ϕ : T → R n , ϕ ( t ) = ζ ( t ) e h ( t ) {\displaystyle \phi :T\to \mathbb {R} ^{n},\quad \phi (t)=\zeta (t)e_{h(t)}} Finally, extend this feature hashing function to strings of tokens by ϕ : T ∗ → R n , ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ϕ ( t j ) {\displaystyle \phi :T^{}\to \mathbb {R} ^{n},\quad \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\phi (t_{j})} where T ∗ {\displaystyle T^{}} is the set of all finite strings consisting of tokens in T {\displaystyle T} . Equivalently, ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ζ ( t j ) e h ( t j ) = ∑ i = 1 n ( ∑ j : h ( t j ) = i ζ ( t j ) ) e i {\displaystyle \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\zeta (t_{j})e_{h(t_{j})}=\sum _{i=1}^{n}\left(\sum _{j:h(t_{j})=i}\zeta (t_{j})\right)e_{i}} ==== Geometric properties ==== We want to say something about the geometric property of ϕ {\displaystyle \phi } , but T {\displaystyle T} , by itself, is just a set of tokens, we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to T → R T {\displaystyle T\to \mathbb {R} ^{T}} , and lift ϕ {\displaystyle \phi } from ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} to ϕ : R T → R n {\displaystyle \phi :\mathbb {R} ^{T}\to \mathbb {R} ^{n}} by linear extension: ϕ ( ( x t ) t ∈ T ) = ∑ t ∈ T x t ζ ( t ) e h ( t ) = ∑ i = 1 n ( ∑ t : h ( t ) = i x t ζ ( t ) ) e i {\displaystyle \phi ((x_{t})_{t\in T})=\sum _{t\in T}x_{t}\zeta (t)e_{h(t)}=\sum _{i=1}^{n}\left(\sum _{t:h(t)=i}x_{t}\zeta (t)\right)e_{i}} There is an infinite sum there, which must be handled at once. There are essentially only two ways to handle infinities. One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we go for the potential-infinity way, by restricting R T {\displaystyle \mathbb {R} ^{T}} to contain only vectors with finite support: ∀ ( x t ) t ∈ T ∈ R T {\displaystyle \forall (x_{t})_{t\in T}\in \mathbb {R} ^{T}} , only finitely many entries of ( x t ) t ∈ T {\displaystyle (x_{t})_{t\in T}} are nonzero. Define an inner product on R T {\displaystyle \mathbb {R} ^{T}} in the obvious way: ⟨ e t , e t ′ ⟩ = { 1 , if t = t ′ , 0 , else. ⟨ x , x ′ ⟩ = ∑ t , t ′ ∈ T x t x t ′ ⟨ e t , e t ′ ⟩ {\displaystyle \langle e_{t},e_{t'}\rangle ={\begin{cases}1,{\text{ if }}t=t',\\0,{\text{ else.}}\end{cases}}\quad \langle x,x'\rangle =\sum _{t,t'\in T}x_{t}x_{t'}\langle e_{t},e_{t'}\rangle } As a side note, if T {\displaystyle T} is infinite, then the inner product space R T {\displaystyle \mathbb {R} ^{T}} is not complete. Taking its completion would get us to a Hilbert space, which allows well-behaved infinite sums. Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function ϕ : R T → R n {\displaystyle \phi :\ma

    Read more →
  • Frame grabber

    Frame grabber

    A frame grabber is an electronic device that captures (i.e., "grabs") individual, digital still frames from an analog video signal or a digital video stream. It is usually employed as a component of a computer vision system, in which video frames are captured in digital form and then displayed, stored, transmitted, analyzed, or combinations of these. Historically, frame grabber expansion cards were the predominant way to interface cameras to PCs. Other interface methods have emerged since then, with frame grabbers (and in some cases, cameras with built-in frame grabbers) connecting to computers via interfaces such as USB, Ethernet and IEEE 1394 ("FireWire"). Early frame grabbers typically had only enough memory to store a single digitized video frame, whereas many modern frame grabbers can store multiple frames. Modern frame grabbers often are able to perform functions beyond capturing a single video input. For example, some devices capture audio in addition to video, and some devices provide, and concurrently capture frames from multiple video inputs. Other operations may be performed as well, such as deinterlacing, text or graphics overlay, image transformations (e.g., resizing, rotation, mirroring), and conversion to JPEG or other compressed image formats. To satisfy the technological demands of applications such as radar acquisition, manufacturing and remote guidance, some frame grabbers can capture images at high frame rates, high resolutions, or both. == Circuitry == Analog frame grabbers, which accept and process analog video signals, include these circuits: Input signal conditioner that buffers the analog video input signal to protect downstream circuitry Video decoder that converts SD analog video (e.g., NTSC, SECAM, PAL) or HD analog video (e.g., AHD, HD-TVI, HD-CVI) to a digital format Digital frame grabbers, which accept and process digital video streams, include these circuits: Digital video decoder that interfaces to and converts a specific type of digital video source, such as Camera Link, CoaXPress, DVI, GigE Vision, LVDS, or SDI Circuitry common to both analog and digital frame grabbers: Memory for storing the acquired image (i.e., a frame buffer) A bus interface through which a processor can control the acquisition and access the data General purpose I/O for triggering image acquisition or controlling external equipment == Applications == === Healthcare === Frame grabbers are used in medicine for many applications, including telenursing and remote guidance. In situations where an expert at another location needs to be consulted, frame grabbers capture the image or video from the appropriate medical equipment, so it can be sent digitally to the distant expert. === Manufacturing === "Pick and place" machines are often used to mount electronic components on circuit boards during the circuit board assembly process. Such machines use one or more cameras to monitor the robotics that places the components. Each camera is paired with a frame grabber that digitizes the analog video, thus converting the video to a form that can be processed by the machine software. === Network security === Frame grabbers may be used in security applications. For example, when a potential breach of security is detected, a frame grabber captures an image or a sequence of images, and then the images are transmitted across a digital network where they are recorded and viewed by security personnel. === Personal use === In recent years with the rise of personal video recorders like camcorders, mobile phones, etc. video and photo applications have gained ascending prominence. Frame grabbing is becoming very popular on these devices. === Astronomy & astrophotography === Amateur astronomers and astrophotographers use frame grabbers when using analog "low light" cameras for live image display and internet video broadcasting of celestial objects. Frame grabbers are essential to connect the analog cameras used in this application to the computers that store or process the images.

    Read more →
  • Computer audition

    Computer audition

    Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems — "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents." Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation. == Applications == Like computer vision versus image processing, computer audition versus audio engineering deals with understanding of audio rather than processing. It also differs from problems of speech understanding by machine since it deals with general audio signals, such as natural sounds and musical recordings. Applications of computer audition are widely varying, and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, emotion in audio and so on. == Related disciplines == Computer Audition overlaps with the following disciplines: Music information retrieval: methods for search and analysis of similarity between music signals. Auditory scene analysis: understanding and description of audio sources and events. Computational musicology and mathematical music theory: use of algorithms that employ musical knowledge for analysis of music data. Computer music: use of computers in creative musical applications. Machine musicianship: audition driven interactive music systems. == Areas of study == Since audio signals are interpreted by the human ear–brain system, that complex perceptual mechanism should be simulated somehow in software for "machine listening". In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); artificial intelligence (machine learning and sound classification); psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence); acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering, should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) could benefit from efficient compression schemes, which discard inaudible information in the sound. Computational models of music and sound perception and cognition can lead to a more meaningful representation, a more intuitive digital manipulation and generation of sound and music in musical human-machine interfaces. The study of CA could be roughly divided into the following sub-problems: Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture. Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma, and auditory representations. Musical knowledge structures: analysis of tonality, rhythm, and harmonies. Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation, and clustering. Sequence modeling: matching and alignment between signals and note sequences. Source separation: methods of grouping of simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods. Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise, and analysis of musical structure. Multi-modal analysis: finding correspondences between textual, visual, and audio signals. === Representation issues === Computer audition deals with audio signals that can be represented in a variety of fashions, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented in terms of analogue or digital recordings. Digital recordings are samples of acoustic waveform or parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions that are encoded as MIDI files. Since audio signals usually comprise multiple sound sources, then unlike speech signals that can be efficiently described in terms of specific models (such as source-filter model), it is hard to devise a parametric representation for general audio. Parametric audio representations usually use filter banks or sinusoidal models to capture multiple sound parameters, sometimes increasing the representation size in order to capture internal structure in the signal. Additional types of data that are relevant for computer audition are textual descriptions of audio contents, such as annotations, reviews, and visual information in the case of audio-visual recordings. === Features === Description of contents of general audio signals usually requires extraction of features that capture specific aspects of the audio signal. Generally speaking, one could divide the features into signal or mathematical descriptors such as energy, description of spectral shape etc., statistical characterization such as change or novelty detection, special representations that are better adapted to the nature of musical signals or the auditory system, such as logarithmic growth of sensitivity (bandwidth) in frequency or octave invariance (chroma). Since parametric models in audio usually require very many parameters, the features are used to summarize properties of multiple parameters in a more compact or salient representation. === Musical knowledge === Finding specific musical structures is possible by using musical knowledge as well as supervised and unsupervised machine learning methods. Examples of this include detection of tonality according to distribution of frequencies that correspond to patterns of occurrence of notes in musical scales, distribution of note onset times for detection of beat structure, distribution of energies in different frequencies to detect musical chords and so on. === Sound similarity and sequence modeling === Comparison of sounds can be done by comparison of features with or without reference to time. In some cases an overall similarity can be assessed by close values of features between two sounds. In other cases when temporal structure is important, methods of dynamic time warping need to be applied to "correct" for different temporal scales of acoustic events. Finding repetitions and similar sub-sequences of sonic events is important for tasks such as texture synthesis and machine improvisation. === Source separation === Since one of the basic characteristics of general audio is that it comprises multiple simultaneously sounding sources, such as multiple musical instruments, people talking, machine noises or animal vocalization, the ability to identify and separate individual sources is very desirable. Unfortunately, there are no methods that can solve this problem in a robust fashion. Existing methods of source separation rely sometimes on correlation between different audio channels in multi-channel recordings. The ability to separate sources from stereo signals requires different techniques than those usually applied in communications where multiple sensors are available. Other source separation methods rely on training or clustering of features in mono recording, such as tracking harmonically related partials for multiple pitch detection. Some methods, before explicit recognition, rely on revealing structures in data without knowing the structures (like recognizing objects in abstract pictures without attributing them meaningful labels) by finding the least complex data representations, for instance describing audio scenes as generated by a few tone patterns and their trajectories (polyphonic voices) and acoustical contours drawn by a tone (c

    Read more →
  • Pythia (machine learning)

    Pythia (machine learning)

    Pythia is an ancient text restoration model that recovers missing characters from damaged text input using deep neural networks. It was created by Yannis Assael, Thea Sommerschield, and Jonathan Prag, researchers from Google DeepMind and the University of Oxford. To study the society and the history of ancient civilisations, ancient history relies on disciplines such as epigraphy, the study of ancient inscribed texts. Hundreds of thousands of these texts, known as inscriptions, have survived to our day, but are often damaged over the centuries. Illegible parts of the text must then be restored by specialists, called epigraphists, in order to extract meaningful information from the text and use it to expand our knowledge of the context in which the text was written. Pythia takes as input the damaged text, and is trained to return hypothesised restorations of ancient Greek inscriptions, working as an assistive aid for ancient historians. Its neural network architecture works at both the character- and word-level, thereby effectively handling long-term context information, and dealing efficiently with incomplete word representations. Pythia is applicable to any discipline dealing with ancient texts (philology, papyrology, codicology) and can work in any language (ancient or modern).

    Read more →