AI Video Generator Tools

AI Video Generator Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Showbox.com

    Showbox.com

    Showbox is an online video streaming platform that enables users to stream and download many videos, commonly movies and TV shows, for free. == History == The company opened the platforms to users who registered from its beta in late 2015. The platform was officially launched in February 2016, enabling any visitor to sign up and create videos online. In April 2016, Showbox was featured on the Product Hunt website, coming to the top of the website's lists for that day and week with over 1400 upvotes from the Product Hunt community. Also in April 2016, Showbox partnered with YouTube's leading multi-channel networks, including Fullscreen, BroadbandTV, StyleHaul, AwesomenessTV, and BuzzMyVideos, to enable their communities of creators to access the platform. In June 2016, the company launched Showbox For Brands, a business-oriented video creation platform, enabling companies to create video content in-house and with their communities and influencers. In March 2017, the company launched Showbox Engage, a use case of its B2B product launched in 2016, enabling companies to launch user-generated content campaigns with their communities. In April 2017, Showbox and the United Nations announced a partnership around the 70th anniversary of the declaration of human rights, with an annual, ongoing global campaign in 135 languages, inviting people worldwide to create their part of the declaration in a video from anywhere around the world. In November 2017, Showbox partnered with the Ad:tech and Digital Marketing World Forum conferences (DMWF) in New York to provide their users and communities with a User Generated Content video solution. == Technology == Showbox's video creation technology includes an online green screen feature, proprietary computer vision algorithms, deep learning technology to support the automatic creation of videos in the cloud, and advanced video composition, including special effects. == Coverage and awards == In March 2015, Showbox was nominated as one of the 10 Israeli startups to take over our TV screens this year. In July 2016, Showbox won the Publicis90 award as part of Publicis' "global initiative to foster digital entrepreneurship". In March 2017, Showbox was chosen as one of The Culture Trip's 10 startups to watch for in 2017.

    Read more →
  • Label noise

    Label noise

    Label noise refers to errors or inaccuracies in the class labels of data instances. This is a widespread issue in machine learning datasets, arising from human annotator mistakes, unclear labeling instructions, automated labeling methods, or adversarial attacks in supervised learning. Label noise can be roughly divided into random noise, where labels are flipped independently of input features, and systematic noise, where mislabeling is dependent on certain patterns or biases in the data. Label noise can be damaging to model performance, especially for complex models that may overfit to noisy labels rather than generalizable patterns. Many approaches have been proposed to deal with the effects of label noise, including robust loss functions, noise-tolerant algorithms, data cleaning methods, and semi-supervised learning approaches. To reduce the impact of wrong labels during training, techniques like label smoothing, sample reweighting and using trusted validation sets are used. The role of noise-robust training paradigms and curriculum learning strategies to improve resilience against mislabeled data is also explored in recent research.

    Read more →
  • AI agent

    AI agent

    In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents that can pursue goals, use tools, and take actions with varying degrees of autonomy. In practice, they usually operate within human-defined objectives, constraints, and available tools. == Overview == AI agents possess several key attributes, including goal-directed behavior, natural language interfaces, the capacity to use external tools, and the ability to perform multi-step tasks. Their control flow is frequently driven by large language models (LLMs). Agent systems may also include memory components, planning logic, tool interfaces, and orchestration software for coordinating agent components. AI agents do not have a standard definition. NIST describes agentic AI as an emerging area requiring standards for secure operation, interoperability, and reliable interaction with external systems. A common application of AI agents is task automation: for example, booking travel plans based on a user's prompted request. Companies such as Google, Microsoft and Amazon Web Services have offered platforms for deploying pre-built AI agents. Several protocols have been proposed for standardizing inter-agent communication, with examples including the Model Context Protocol, Gibberlink, and many others. Some of these protocols are also used for connecting agents to external applications. In December 2025, Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), with the goal of ensuring agentic AI evolves transparently and collaboratively. == History == AI agents have been traced back to research from the 1990s, with Harvard professor Milind Tambe noting that the definition of an AI agent was not clear at the time. Researcher Andrew Ng has been credited with spreading the term "agentic" to a wider audience in 2024. == Training and testing == Researchers have attempted to build world models and reinforcement learning environments to train or evaluate AI agents. For example, video games such as Minecraft and No Man's Sky as well as replicas of company websites, have also been used for training such agents. == Autonomous capabilities == The Financial Times compared the autonomy of AI agents to the SAE classification of self-driving cars, likening most applications to level 2 or level 3, with some achieving level 4 in highly specialized circumstances, and level 5 being theoretical. == Cognitive architecture == The following are some internal design options for reasoning within an agent: Retrieval-augmented generation ReAct (Reason + Act) pattern is an iterative process in which an AI agent alternates between reasoning and taking actions, receives observations from the environment or external tools, and integrates these observations into subsequent reasoning steps. Reflexion, which uses an LLM to create feedback on the agent's plan of action and stores that feedback in a memory cache. A tool/agent registry, for organizing software functions or other agents that the agent can use. One-shot model querying, which queries the model once to create the plan of action. === Reference architecture === Ken Huang proposed an AI agent reference architecture, which consists of seven interconnected layers, with each layer building on the functionality of the layers beneath it: Layer 1: Foundation models - provide the core AI engines to power agent capabilities. Layer 2: Data operations - manage the complex data infrastructure required for AI agent operations, including Vector database, data loaders, RAG. Layer 3: Agent frameworks - sophisticated software and tools that simplify the development and management of the AI agents. Layer 4: Deployment and infrastructure - provide the robust technical foundation for running AI agents. Layer 5: Evaluation and observability - focus on assessing the safety and performance of AI agents. Layer 6: Security and compliance - a crucial protective framework ensuring AI agents operate safely, securely, and conform to regulatory boundaries. At this layer security and compliance features embedded into all the AI agent stack layers are integrated together. Layer 7: Agent ecosystem - represents the AI agents' interface with real-world applications and users. == Orchestration patterns == To execute complex tasks, autonomous agents are often integrated with other agents or specialized tools. These configurations, known as orchestration patterns or workflows, include the following: Prompt chaining: A sequence where the output of one step serves as the input for the next. Routing: The classification of an input to direct it to a specialized downstream task or tool. Parallelization: The simultaneous execution of multiple tasks. Sequential processing: A fixed, linear progression of tasks through a predefined pipeline. Planner-critic: An iterative pattern where one agent generates a proposal and another evaluates it to provide feedback for refinement. == Multimodal AI agents == In addition to large language models (LLMs), vision-language models (VLMs) and multimodal foundation models can be used as the basis for agents. In September 2024, Allen Institute for AI released an open-source vision-language model. Nvidia released a framework for developers to use VLMs, LLMs and retrieval-augmented generation for building AI agents that can analyze images and videos, including video search and video summarization. Microsoft released a multimodal agent model – trained on images, video, software user interface interactions, and robotics data – that the company claimed can manipulate software and robots. == Applications == As of April 2025, per the Associated Press, there are few real-world applications of AI agents. As of June 2025, per Fortune, many companies are primarily experimenting with AI agents. The Information divided AI agents into seven archetypes: business-task agents, for acting within enterprise software; conversational agents, which act as chatbots for customer support; research agents, for querying and analyzing information (such as OpenAI Deep Research); analytics agents, for analyzing data to create reports; software developer or coding agents (such as Cursor); domain-specific agents, which include specific subject matter knowledge; and web browser agents (such as OpenAI Operator). By mid-2025, AI agents have been used in video game development, gambling (including sports betting), cryptocurrency wallets (including cryptocurrency trading and meme coins) and social media. In August 2025, New York Magazine described software development as the most definitive use case of AI agents. Likewise, by October 2025, noting a decline in expectations, The Information noted AI coding agents and customer support as the primary use cases by businesses. In November 2025, The Wall Street Journal reported that few companies that deployed AI agents have received a return on investment. === Applications in government === Several government bodies in the United States and United Kingdom have deployed or announced the deployment of agents, at the local and national level. The city of Kyle, Texas deployed an AI agent from Salesforce in March 2025 for 311 customer service. In November 2025, the Internal Revenue Service stated that it would use Agentforce, AI agents from Salesforce, for the Office of Chief Counsel, Taxpayer Advocate Services and the Office of Appeals. That same month, Staffordshire Police announced that they would trial Agentforce agents for handling non-emergency 101 calls in the United Kingdom starting in 2026. In December 2025, the Department of Neighborhoods in Detroit, Michigan, in partnership with a local business, deployed a pilot project in two Detroit districts for an AI agent to be used for customer service calls. In February 2025, Thomas Shedd, the director of the Technology Transformation Services, proposed using AI coding agents across the United States federal government. A recruiter for the Department of Government Efficiency proposed in April 2025 to use AI agents to automate the work of about 70,000 United States federal government employees, as part of a startup with funding from OpenAI and a partnership agreement with Palantir. This proposal was criticized by experts for its impracticality, if not impossibility, and the lack of corresponding widespread adoption by businesses. In December 2025, the Food and Drug Administration announced that it would offer "agentic AI capabilities" to its staff for "meeting management, pre-market reviews, review validation, post-market surveillance, inspections and compliance and administrative functions." That same month, the United States Department of Defense launched GenAI.mil, an internal platform for American military personnel to use generative AI-based applications based on Google Gemini, including "intelligent agentic workflows". Defense Secretary Pete Hegseth listed applications such as "[conducting] deep r

    Read more →
  • Dynamic epistemic logic

    Dynamic epistemic logic

    Dynamic epistemic logic (DEL) is a logical framework dealing with knowledge and information change. Typically, DEL focuses on situations involving multiple agents and studies how their knowledge changes when events occur. These events can change factual properties of the actual world (they are called ontic events): for example a red card is painted in blue. They can also bring about changes of knowledge without changing factual properties of the world (they are called epistemic events): for example, a card is revealed publicly (or privately) to be red. Originally, DEL focused on epistemic events. Only some of the basic ideas are present in this entry of the original DEL framework; more details about DEL in general can be found in the references. Due to the nature of its object of study and its abstract approach, DEL is related and has applications to numerous research areas, such as computer science (artificial intelligence), philosophy (formal epistemology), economics (game theory) and cognitive science. In computer science, DEL is for example very much related to multi-agent systems, which are systems where multiple intelligent agents interact and exchange information. As a combination of dynamic logic and epistemic logic, dynamic epistemic logic is a young field of research. It really started in 1989 with Plaza's logic of public announcement. Independently, Gerbrandy and Groeneveld proposed a system dealing moreover with private announcement and that was inspired by the work of Veltman. Another system was proposed by van Ditmarsch whose main inspiration was the Cluedo game. But the most influential and original system was the system proposed by Baltag, Moss and Solecki. This system can deal with all the types of situations studied in the works above and its underlying methodology is conceptually grounded. This entry will present some of its basic ideas. Formally, DEL extends ordinary epistemic logic by the inclusion of event models to describe actions, and a product update operator that defines how epistemic models are updated as the consequence of executing actions described through event models. Epistemic logic will first be recalled. Then, actions and events will enter into the picture and we will introduce the DEL framework. == Epistemic logic == Epistemic logic is a modal logic dealing with the notions of knowledge and belief. As a logic, it is concerned with understanding the process of reasoning about knowledge and belief: which principles relating the notions of knowledge and belief are intuitively plausible? Like epistemology, it stems from the Greek word ϵ π ι σ τ η μ η {\displaystyle \epsilon \pi \iota \sigma \tau \eta \mu \eta } or ‘episteme’ meaning knowledge. Epistemology is nevertheless more concerned with analyzing the very nature and scope of knowledge, addressing questions such as “What is the definition of knowledge?” or “How is knowledge acquired?”. In fact, epistemic logic grew out of epistemology in the Middle Ages thanks to the efforts of Burley and Ockham. The formal work, based on modal logic, that inaugurated contemporary research into epistemic logic dates back only to 1962 and is due to Hintikka. It then sparked in the 1960s discussions about the principles of knowledge and belief and many axioms for these notions were proposed and discussed. For example, the interaction axioms K p → B p {\displaystyle Kp\rightarrow Bp} and B p → K B p {\displaystyle Bp\rightarrow KBp} are often considered to be intuitive principles: if an agent Knows p {\displaystyle p} then (s)he also Believes p {\displaystyle p} , or if an agent Believes p {\displaystyle p} , then (s)he Knows that (s)he Believes p {\displaystyle p} . More recently, these kinds of philosophical theories were taken up by researchers in economics, artificial intelligence and theoretical computer science where reasoning about knowledge is a central topic. Due to the new setting in which epistemic logic was used, new perspectives and new features such as computability issues were then added to the research agenda of epistemic logic. === Syntax === In the sequel, A G T S = { 1 , … , n } {\displaystyle AGTS=\{1,\ldots ,n\}} is a finite set whose elements are called agents and P R O P {\displaystyle PROP} is a set of propositional letters. The epistemic language is an extension of the basic multi-modal language of modal logic with a common knowledge operator C A {\displaystyle C_{A}} and a distributed knowledge operator D A {\displaystyle D_{A}} . Formally, the epistemic language L EL C {\displaystyle {\mathcal {L}}_{\textsf {EL}}^{C}} is defined inductively by the following grammar in BNF: L EL C : ϕ ::= p ∣ ¬ ϕ ∣ ( ϕ ∧ ϕ ) ∣ K j ϕ ∣ C A ϕ ∣ D A ϕ {\displaystyle {\mathcal {L}}_{\textsf {EL}}^{C}:\phi ~~::=~~p~\mid ~\neg \phi ~\mid ~(\phi \land \phi )~\mid ~K_{j}\phi ~\mid ~C_{A}\phi ~\mid ~D_{A}\phi } where p ∈ P R O P {\displaystyle p\in PROP} , j ∈ A G T S {\displaystyle j\in {AGTS}} and A ⊆ A G T S {\displaystyle A\subseteq {AGTS}} . The basic epistemic language L E L {\displaystyle {\mathcal {L}}_{EL}} is the language L E L C {\displaystyle {\mathcal {L}}_{EL}^{C}} without the common knowledge and distributed knowledge operators. The formula ⊥ {\displaystyle \bot } is an abbreviation for ¬ p ∧ p {\displaystyle \neg p\land p} (for a given p ∈ P R O P {\displaystyle p\in PROP} ), ⟨ K j ⟩ ϕ {\displaystyle \langle K_{j}\rangle \phi } is an abbreviation for ¬ K j ¬ ϕ {\displaystyle \neg K_{j}\neg \phi } , E A ϕ {\displaystyle E_{A}\phi } is an abbreviation for ⋀ j ∈ A K j ϕ {\displaystyle \bigwedge \limits _{j\in A}K_{j}\phi } and C ϕ {\displaystyle C\phi } an abbreviation for C A G T S ϕ {\displaystyle C_{AGTS}\phi } . Group notions: general, common and distributed knowledge. In a multi-agent setting there are three important epistemic concepts: general knowledge, distributed knowledge and common knowledge. The notion of common knowledge was first studied by Lewis in the context of conventions. It was then applied to distributed systems and to game theory, where it allows to express that the rationality of the players, the rules of the game and the set of players are commonly known. General knowledge. General knowledge of ϕ {\displaystyle \phi } means that everybody in the group of agents A G T S {\displaystyle {AGTS}} knows that ϕ {\displaystyle \phi } . Formally, this corresponds to the following formula: E ϕ := ⋀ j ∈ A G T S K j ϕ . {\displaystyle E\phi :={\underset {j\in {AGTS}}{\bigwedge }}K_{j}\phi .} Common knowledge. Common knowledge of ϕ {\displaystyle \phi } means that everybody knows ϕ {\displaystyle \phi } but also that everybody knows that everybody knows ϕ {\displaystyle \phi } , that everybody knows that everybody knows that everybody knows ϕ {\displaystyle \phi } , and so on ad infinitum. Formally, this corresponds to the following formula C ϕ := E ϕ ∧ E E ϕ ∧ E E E ϕ ∧ … {\displaystyle C\phi :=E\phi \land EE\phi \land EEE\phi \land \ldots } As we do not allow infinite conjunction the notion of common knowledge will have to be introduced as a primitive in our language. Before defining the language with this new operator, we are going to give an example introduced by Lewis that illustrates the difference between the notions of general knowledge and common knowledge. Lewis wanted to know what kind of knowledge is needed so that the statement p {\displaystyle p} : “every driver must drive on the right” be a convention among a group of agents. In other words, he wanted to know what kind of knowledge is needed so that everybody feels safe to drive on the right. Suppose there are only two agents i {\displaystyle i} and j {\displaystyle j} . Then everybody knowing p {\displaystyle p} (formally E p {\displaystyle Ep} ) is not enough. Indeed, it might still be possible that the agent i {\displaystyle i} considers possible that the agent j {\displaystyle j} does not know p {\displaystyle p} (formally ¬ K i K j p {\displaystyle \neg K_{i}K_{j}p} ). In that case the agent i {\displaystyle i} will not feel safe to drive on the right because he might consider that the agent j {\displaystyle j} , not knowing p {\displaystyle p} , could drive on the left. To avoid this problem, we could then assume that everybody knows that everybody knows that p {\displaystyle p} (formally E E p {\displaystyle EEp} ). This is again not enough to ensure that everybody feels safe to drive on the right. Indeed, it might still be possible that agent i {\displaystyle i} considers possible that agent j {\displaystyle j} considers possible that agent i {\displaystyle i} does not know p {\displaystyle p} (formally ¬ K i K j K i p {\displaystyle \neg K_{i}K_{j}K_{i}p} ). In that case and from i {\displaystyle i} ’s point of view, j {\displaystyle j} considers possible that i {\displaystyle i} , not knowing p {\displaystyle p} , will drive on the left. So from i {\displaystyle i} ’s point of view, j {\displaystyle j} might drive on the left as well (by the same argument as abov

    Read more →
  • Point-set registration

    Point-set registration

    In computer vision, pattern recognition, and robotics, point-set registration, also known as point-cloud registration or scan matching, is the process of finding a spatial transformation (e.g., scaling, rotation and translation) that aligns two point clouds. The purpose of finding such a transformation includes merging multiple data sets into a globally consistent model (or coordinate frame), and mapping a new measurement to a known data set to identify features or to estimate its pose. Raw 3D point cloud data are typically obtained from Lidars and RGB-D cameras. 3D point clouds can also be generated from computer vision algorithms such as triangulation, bundle adjustment, and more recently, monocular image depth estimation using deep learning. For 2D point set registration used in image processing and feature-based image registration, a point set may be 2D pixel coordinates obtained by feature extraction from an image, for example corner detection. Point cloud registration has extensive applications in autonomous driving, motion estimation and 3D reconstruction, object detection and pose estimation, robotic manipulation, simultaneous localization and mapping (SLAM), panorama stitching, virtual and augmented reality, and medical imaging. As a special case, registration of two point sets that only differ by a 3D rotation (i.e., there is no scaling and translation), is called the Wahba Problem and also related to the orthogonal procrustes problem. == Formulation == The problem may be summarized as follows: Let { M , S } {\displaystyle \lbrace {\mathcal {M}},{\mathcal {S}}\rbrace } be two finite size point sets in a finite-dimensional real vector space R d {\displaystyle \mathbb {R} ^{d}} , which contain M {\displaystyle M} and N {\displaystyle N} points respectively (e.g., d = 3 {\displaystyle d=3} recovers the typical case of when M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} are 3D point sets). The problem is to find a transformation to be applied to the moving "model" point set M {\displaystyle {\mathcal {M}}} such that the difference (typically defined in the sense of point-wise Euclidean distance) between M {\displaystyle {\mathcal {M}}} and the static "scene" set S {\displaystyle {\mathcal {S}}} is minimized. In other words, a mapping from R d {\displaystyle \mathbb {R} ^{d}} to R d {\displaystyle \mathbb {R} ^{d}} is desired which yields the best alignment between the transformed "model" set and the "scene" set. The mapping may consist of a rigid or non-rigid transformation. The transformation model may be written as T {\displaystyle T} , using which the transformed, registered model point set is: The output of a point set registration algorithm is therefore the optimal transformation T ⋆ {\displaystyle T^{\star }} such that M {\displaystyle {\mathcal {M}}} is best aligned to S {\displaystyle {\mathcal {S}}} , according to some defined notion of distance function dist ⁡ ( ⋅ , ⋅ ) {\displaystyle \operatorname {dist} (\cdot ,\cdot )} : where T {\displaystyle {\mathcal {T}}} is used to denote the set of all possible transformations that the optimization tries to search for. The most popular choice of the distance function is to take the square of the Euclidean distance for every pair of points: where ‖ ⋅ ‖ 2 {\displaystyle \|\cdot \|_{2}} denotes the vector 2-norm, s m {\displaystyle s_{m}} is the corresponding point in set S {\displaystyle {\mathcal {S}}} that attains the shortest distance to a given point m {\displaystyle m} in set M {\displaystyle {\mathcal {M}}} after transformation. Minimizing such a function in rigid registration is equivalent to solving a least squares problem. == Types of algorithms == When the correspondences (i.e., s m ↔ m {\displaystyle s_{m}\leftrightarrow m} ) are given before the optimization, for example, using feature matching techniques, then the optimization only needs to estimate the transformation. This type of registration is called correspondence-based registration. On the other hand, if the correspondences are unknown, then the optimization is required to jointly find out the correspondences and transformation together. This type of registration is called simultaneous pose and correspondence registration. === Rigid registration === Given two point sets, rigid registration yields a rigid transformation which maps one point set to the other. A rigid transformation is defined as a transformation that does not change the distance between any two points. Typically such a transformation consists of translation and rotation. In rare cases, the point set may also be mirrored. In robotics and computer vision, rigid registration has the most applications. === Non-rigid registration === Given two point sets, non-rigid registration yields a non-rigid transformation which maps one point set to the other. Non-rigid transformations include affine transformations such as scaling and shear mapping. However, in the context of point set registration, non-rigid registration typically involves nonlinear transformation. If the eigenmodes of variation of the point set are known, the nonlinear transformation may be parametrized by the eigenvalues. A nonlinear transformation may also be parametrized as a thin plate spline. === Other types === Some approaches to point set registration use algorithms that solve the more general graph matching problem. However, the computational complexity of such methods tend to be high and they are limited to rigid registrations. In this article, we will only consider algorithms for rigid registration, where the transformation is assumed to contain 3D rotations and translations (possibly also including a uniform scaling). The PCL (Point Cloud Library) is an open-source framework for n-dimensional point cloud and 3D geometry processing. It includes several point registration algorithms. == Correspondence-based registration == Correspondence-based methods assume the putative correspondences m ↔ s m {\displaystyle m\leftrightarrow s_{m}} are given for every point m ∈ M {\displaystyle m\in {\mathcal {M}}} . Therefore, we arrive at a setting where both point sets M {\displaystyle {\mathcal {M}}} and S {\displaystyle {\mathcal {S}}} have N {\displaystyle N} points and the correspondences m i ↔ s i , i = 1 , … , N {\displaystyle m_{i}\leftrightarrow s_{i},i=1,\dots ,N} are given. === Outlier-free registration === In the simplest case, one can assume that all the correspondences are correct, meaning that the points m i , s i ∈ R 3 {\displaystyle m_{i},s_{i}\in \mathbb {R} ^{3}} are generated as follows:where l > 0 {\displaystyle l>0} is a uniform scaling factor (in many cases l = 1 {\displaystyle l=1} is assumed), R ∈ SO ( 3 ) {\displaystyle R\in {\text{SO}}(3)} is a proper 3D rotation matrix ( SO ( d ) {\displaystyle {\text{SO}}(d)} is the special orthogonal group of degree d {\displaystyle d} ), t ∈ R 3 {\displaystyle t\in \mathbb {R} ^{3}} is a 3D translation vector and ϵ i ∈ R 3 {\displaystyle \epsilon _{i}\in \mathbb {R} ^{3}} models the unknown additive noise (e.g., Gaussian noise). Specifically, if the noise ϵ i {\displaystyle \epsilon _{i}} is assumed to follow a zero-mean isotropic Gaussian distribution with standard deviation σ i {\displaystyle \sigma _{i}} , i.e., ϵ i ∼ N ( 0 , σ i 2 I 3 ) {\displaystyle \epsilon _{i}\sim {\mathcal {N}}(0,\sigma _{i}^{2}I_{3})} , then the following optimization can be shown to yield the maximum likelihood estimate for the unknown scale, rotation and translation:Note that when the scaling factor is 1 and the translation vector is zero, then the optimization recovers the formulation of the Wahba problem. Despite the non-convexity of the optimization (cb.2) due to non-convexity of the set SO ( 3 ) {\displaystyle {\text{SO}}(3)} , seminal work by Berthold K.P. Horn showed that (cb.2) actually admits a closed-form solution, by decoupling the estimation of scale, rotation and translation. Similar results were discovered by Arun et al. In addition, in order to find a unique transformation ( l , R , t ) {\displaystyle (l,R,t)} , at least N = 3 {\displaystyle N=3} non-collinear points in each point set are required. More recently, Briales and Gonzalez-Jimenez have developed a semidefinite relaxation using Lagrangian duality, for the case where the model set M {\displaystyle {\mathcal {M}}} contains different 3D primitives such as points, lines and planes (which is the case when the model M {\displaystyle {\mathcal {M}}} is a 3D mesh). Interestingly, the semidefinite relaxation is empirically tight, i.e., a certifiably globally optimal solution can be extracted from the solution of the semidefinite relaxation. === Robust registration === The least squares formulation (cb.2) is known to perform arbitrarily badly in the presence of outliers. An outlier correspondence is a pair of measurements s i ↔ m i {\displaystyle s_{i}\leftrightarrow m_{i}} that departs from the generative model (cb.1). In this case, one can consider a differen

    Read more →
  • Data augmentation

    Data augmentation

    Data augmentation is a statistical technique which allows maximum likelihood estimation from incomplete data. Data augmentation has important applications in Bayesian analysis, and the technique is widely used in machine learning to reduce overfitting when training machine learning models, achieved by training models on several slightly-modified copies of existing data. == Synthetic oversampling techniques for traditional machine learning == Synthetic Minority Over-sampling Technique (SMOTE) is a method used to address imbalanced datasets in machine learning. In such datasets, the number of samples in different classes varies significantly, leading to biased model performance. For example, in a medical diagnosis dataset with 90 samples representing healthy individuals and only 10 samples representing individuals with a particular disease, traditional algorithms may struggle to accurately classify the minority class. SMOTE rebalances the dataset by generating synthetic samples for the minority class. For instance, if there are 100 samples in the majority class and 10 in the minority class, SMOTE can create synthetic samples by randomly selecting a minority class sample and its nearest neighbors, then generating new samples along the line segments joining these neighbors. This process helps increase the representation of the minority class, improving model performance. == Data augmentation for image classification == When convolutional neural networks grew larger in mid-1990s, there was a lack of data to use, especially considering that some part of the overall dataset should be spared for later testing. It was proposed to perturb existing data with affine transformations to create new examples with the same labels, which were complemented by so-called elastic distortions in 2003, and the technique was widely used as of 2010s. Data augmentation can enhance CNN performance and acts as a countermeasure against CNN profiling attacks. Data augmentation has become fundamental in image classification, enriching training dataset diversity to improve model generalization and performance. The evolution of this practice has introduced a broad spectrum of techniques, including geometric transformations, color space adjustments, and noise injection. === Geometric Transformations === Geometric transformations alter the spatial properties of images to simulate different perspectives, orientations, and scales. Common techniques include: Affine Transformation Rotation: Rotating images by a specified degree to help models recognize objects at various angles. Reflection: Reflecting images horizontally or vertically to introduce variability in orientation. Translation: Shifting images in different directions to teach models positional invariance. Scaling Shear Mapping Cropping: Removing sections of the image to focus on particular features or simulate closer views. Elastic Distortion Morphing within the same class: Generating new samples by applying morphing techniques between two images belonging to the same class, thereby increasing intra-class diversity. === Color Space Transformations === Color space transformations modify the color properties of images, addressing variations in lighting, color saturation, and contrast. Techniques include: Brightness Adjustment: Varying the image's brightness to simulate different lighting conditions. Contrast Adjustment: Changing the contrast to help models recognize objects under various clarity levels. Saturation Adjustment: Altering saturation to prepare models for images with diverse color intensities. Color Jittering: Randomly adjusting brightness, contrast, saturation, and hue to introduce color variability. === Noise Injection === Injecting noise into images simulates real-world imperfections, teaching models to ignore irrelevant variations. Techniques involve: Gaussian Noise: Adding Gaussian noise mimics sensor noise or graininess. Salt and Pepper Noise: Introducing black or white pixels at random simulates sensor dust or dead pixels. == Data augmentation for signal processing == Residual or block bootstrap can be used for time series augmentation. === Biological signals === Synthetic data augmentation is of paramount importance for machine learning classification, particularly for biological data, which tend to be high dimensional and scarce. The applications of robotic control and augmentation in disabled and able-bodied subjects still rely mainly on subject-specific analyses. Data scarcity is notable in signal processing problems such as for Parkinson's Disease Electromyography signals, which are difficult to source - Zanini, et al. noted that it is possible to use a generative adversarial network (in particular, a DCGAN) to perform style transfer in order to generate synthetic electromyographic signals that corresponded to those exhibited by sufferers of Parkinson's Disease. The approaches are also important in electroencephalography (brainwaves). Wang, et al. explored the idea of using deep convolutional neural networks for EEG-Based Emotion Recognition, results show that emotion recognition was improved when data augmentation was used. A common approach is to generate synthetic signals by re-arranging components of real data. Lotte proposed a method of "Artificial Trial Generation Based on Analogy" where three data examples x 1 , x 2 , x 3 {\displaystyle x_{1},x_{2},x_{3}} provide examples and an artificial x s y n t h e t i c {\displaystyle x_{synthetic}} is formed which is to x 3 {\displaystyle x_{3}} what x 2 {\displaystyle x_{2}} is to x 1 {\displaystyle x_{1}} . A transformation is applied to x 1 {\displaystyle x_{1}} to make it more similar to x 2 {\displaystyle x_{2}} , the same transformation is then applied to x 3 {\displaystyle x_{3}} which generates x s y n t h e t i c {\displaystyle x_{synthetic}} . This approach was shown to improve performance of a Linear Discriminant Analysis classifier on three different datasets. Current research shows great impact can be derived from relatively simple techniques. For example, Freer observed that introducing noise into gathered data to form additional data points improved the learning ability of several models which otherwise performed relatively poorly. Tsinganos et al. studied the approaches of magnitude warping, wavelet decomposition, and synthetic surface EMG models (generative approaches) for hand gesture recognition, finding classification performance increases of up to +16% when augmented data was introduced during training. More recently, data augmentation studies have begun to focus on the field of deep learning, more specifically on the ability of generative models to create artificial data which is then introduced during the classification model training process. In 2018, Luo et al. observed that useful EEG signal data could be generated by Conditional Wasserstein Generative Adversarial Networks (GANs) which was then introduced to the training set in a classical train-test learning framework. The authors found classification performance was improved when such techniques were introduced. === Mechanical signals === The prediction of mechanical signals based on data augmentation brings a new generation of technological innovations, such as new energy dispatch, 5G communication field, and robotics control engineering. In 2022, Yang et al. integrate constraints, optimization and control into a deep network framework based on data augmentation and data pruning with spatio-temporal data correlation, and improve the interpretability, safety and controllability of deep learning in real industrial projects through explicit mathematical programming equations and analytical solutions.

    Read more →
  • Nolot

    Nolot

    Nolot is a chess test suite with 11 positions from real games. They were compiled by Pierre Nolot (French: [nɔ.lo]) for the French chess magazine Gambisco and posted on the rec.games.chess Usenet group in 1994. They were designed to be particularly hard to solve for chess engines to solve at the time, although modern engines can find a solution near-instantaneously. == Problem 1 == FEN: r3qb1k/1b4p1/p2pr2p/3n4/Pnp1N1N1/6RP/1B3PP1/1B1QR1K1 w - - 0 1 26.Nxh6!! c3 (26... Rxh6 27.Nxd6 Qh5 (best) 28.Rg5! Qxd1 29.Nf7+ Kg8 30.Nxh6+ Kh8 31.Rxd1 c3 32.Nf7+ Kg8 33.Bg6! Nf4 34.Bxc3 Nxg6 35.Bxb4 Kxf7 36.Rd7+ Kf6 37.Rxg6+ Kxg6 38.Rxb7 ±) 27.Nf5! cxb2 28.Qg4 Bc8 (28... g6!? 29.Kh2! 29.Qd7 30.Nh4 Bc6 31.Nc5! dxc 32.Rxe6 Nf6 33.Nxg6+ Kg7 34.Qg5 Nbd5 35.Ne5 Kh8 36.Nxd7 ±) 29.Qh4+ Rh6 30.Nxh6 gxh6 31.Kh2! Qe5 32.Ng5 Qf6 33.Re8 Bf5 34.Qxh6 (missing a mate in 6: 34.Nf7+ Qxf7 35.Qxh6+ Bh7 36.Rxa8 Nf6 37.Rxf8 Qxf8 38.Qxf8+ Ng8 39.Qg7#) 34...Qxh6 35.Nf7+ Kh7 36.Bxf5+ Qg6 37.Bxg6+ Kg7 38.Rxa8 Be7 39.Rb8 a5 40.Be4+ Kxf7 41.Bxd5+ 1–0 The best Novag computer, the Diablo 68000, finds 26. Nxh6 after seven and a half months (Pierre Nolot has let it run on the position for 14 months and one day, until a power failure stopped an analysis of over 80,000,000,000 nodes.) but for wrong reasons: it evaluates white's position as inferior and thinks this move would enable it to draw. Today Gambit Tiger 2.0 for example can find it quite quickly: Most free engines running on 64-bit processors in 2010 could solve this problem and the others in a few seconds. 1.Qd4 c3 2.Bxc3 Nxc3 3.Qxb4 Nxe4 4.Qxb7 Rb8 5.Qxb8 Qxb8 6.Bxe4 d5 7.Rb1 μ (-1.20) Depth: 12 00:00:09 6055 kN 1.Nxh6 c3 2.Nf5 cxb2 3.Qg4 Rb8 4.Nxg7 Rg6 5.Qxg6 Qxg6 6.Rxg6 Bxg7 7.Nxd6 ³ (-0.48) Depth: 12 00:00:21 14368 kN 1.Nxh6 c3 2.Nf5 cxb2 3.Qg4 Rc8 4.Nxg7 Rg6 5.Nxe8 Rxg4 6.Rxg4 Rxe8 7.Rg6 μ (-0.74) Depth: 13 00:00:55 38455 kN 1.Ne3 Rxe4 2.Bxe4 Qxe4 3.Nxd5 Qxd5 4.Qc1 Qf5 5.Qxh6+ Qh7 6.Qe6 Nd3 7.Re2 Nxb2 8.Rxb2 ³ (-0.58) Depth: 13 00:01:30 62979 kN 1.Ne3 Rxe4 ³ (-0.58) Depth: 14 00:02:02 84941 kN 1.Ne3 Nxe3 2.Rexe3 Bxe4 3.Qg4 Rg6 4.Qxe4 Qxe4 5.Bxe4 Rxg3 6.Rxg3 d5 7.Bf5 Re8 8.Bc3 ³ (-0.30) Depth: 15 00:03:05 128968 kN 1.Nxh6 ² (0.32) Depth: 15 00:07:58 350813 kN With the next ply showing a clear advantage. Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of Nxh6!! in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 19/32 00:01 7708k 4882k +3,00 Nxh6 Rxh6 Nxd6 Qh5 Bg6 Qxd1 Nf7+ Kg8 Nxh6+ gxh6 Bh5+ Kh7 Rxd1 c3 Bxc3 Nxc3 Rd7+ Kh8 Rxb7 Ne4 Re3 Nxf2 Kxf2 Bc5 Ke2 Bxe3 Kxe3 Nd5+ Kf2 49/73 15:02 5118270k 5673k +6,15 Nxh6 Rxh6 Nxd6 Qh5 Rg5 Qxd1 Nf7+ Kg8 Nxh6+ Kh8 Rxd1 c3 Nf7+ Kg8 Bg6 Nf4 Bxc3 Nbd5 Rb1 Bc6 Bd2 Nxg6 Rxg6 Ne7 Rxc6 Nxc6 Rb6 Rc8 Ng5 a5 Ra6 Bb4 Be3 Ne5 Bd4 Nc6 Bb6 Bd2 h4 Kf8 Bc5+ Kg8 Be3 Bxe3 fxe3 Kf8 Kf2 Ke7 Nf3 Kd7 Rb6 Ne7 Rb5 Kd6 Rxa5 Rc2+ Kg3 Re2 Nd4 Rxe3+ Kf4 Rd3 Nf5+ Kc7 Nxe7 == Problem 2 == FEN: r4rk1/pp1n1p1p/1nqP2p1/2b1P1B1/4NQ2/1B3P2/PP2K2P/2R5 w - - 0 1 22.Rxc5!! Nxc5 23.Nf6+ Kh8 24.Qh4 Qb5+ (computers think there is perpetual check here, but...) 25.Ke3! 25... h5 26.Nxh5 Qxb3+ (26... d5+ 27.Bxd5 Qd3 28.Kf2 Ne4+ 29.Bxe4 Qd4+ 30.Kg2 Qxb2+ 31.Kh3 ±) and White won in 41 moves. Today Deep Junior 8.ZX for example finds it very quickly (around 1 minute): 1.Kd1 Rac8 2.Bh6 Qb5 3.Rc3 Qf1+ 4.Kc2 Rc6 5.Bxf8 −+ (-2.11) Depth: 12 00:00:04 10422 kN 1.Nxc5 Nxc5 2.Rxc5 Qxc5 3.e6 Rae8 4.e7 Nc8 5.Kf1 Nxd6 6.Bf6 b5 −+ (-2.10) Depth: 12 00:00:14 25054 kN 1.Bf6! μ (-1.35) Depth: 12 00:00:17 34601 kN 1.Bf6 Qb5+ 2.Ke1 Bb4+ 3.Kf2 Bc5+ = (0.00) Depth: 12 00:00:20 34601 kN 1.Bf6 Qb5+ 2.Ke1 Nxf6 3.Nxf6+ Kg7 4.Nh5+ gxh5 5.Qf6+ Kg8 6.Qg5+ Kh8 7.Qf6+ = (0.00) Depth: 15 00:01:01 130544 kN 1.Rxc5! = (0.15) Depth: 15 00:01:12 145875 kN 1.Rxc5 Nxc5 2.Nf6+ Kh8 3.Qh4 Qb5+ 4.Ke3 h5 5.Nxh5 Qd3+ 6.Kf2 Ne4+ 7.fxe4 Qd4+ 8.Kf1 Qd3+ 9.Ke1 Qb1+ 10.Bd1 ± (2.18) Depth: 15 00:01:18 145875 kN Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of Rxc5!! in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 21/25 00:01 5822k 5545k +6,61 Rxc5 Qxc5 Nxc5 Nxc5 Bh6 Nbd7 Bxf8 Rxf8 Qe3 Rc8 f4 Nxe5 Qxe5 Ne6 Bxe6 Rc2+ Kd3 Rxh2 46/86 11:27 5057055k 7355k +7,61 Rxc5 Qxc5 Nxc5 Nxc5 Bf6 Ne6 Qh6 Nd4+ Kf2 Nf5 Qg5 Nd7 h4 Nxf6 Qxf6 Ng7 d7 b5 Bd5 Rab8 b4 Nh5 Bxf7+ Rxf7 d8R+ Rxd8 Qxd8+ Rf8 Qd5+ Kg7 e6 Kf6 Qd7 Ng7 Qd4+ Kxe6 Qxg7 Rf7 Qc3 Ke7 Qc5+ Ke8 Qc8+ Ke7 h5 gxh5 Kg3 h4+ Kh2 h6 Qc5+ Kf6 Qxb5 Kg7 f4 Rxf4 Qe5+ Rf6 b5 h3 Qd4 Kg8 Qxf6 h5 Blacks 22. .. Nxc5 is suboptimal and leads faster mate 77/44 09:18 6987714k 12518k +M22 Nf6+ Kh8 Qh4 Qb5+ Ke3 Qxb3+ axb3 h5 Nxh5 Nd5+ Kd4 Ne6+ Kxd5 Nxg5 Qxg5 gxh5 f4 Rad8 f5 f6 Qxh5+ Kg7 Qg6+ Kh8 e6 b6 e7 Rb8 exf8Q+ Rxf8 Ke6 b5 Ke7 Rb8 Qh5+ Kg7 Qf7+ Kh8 Kxf6 Rf8 Qxf8+ Kh7 Qg7+ == Problem 3 == FEN: r2qk2r/ppp1b1pp/2n1p3/3pP1n1/3P2b1/2PB1NN1/PP4PP/R1BQK2R w KQkq - 0 1 12.Nxg5!! Bxd1 13.Nxe6 Qb8 14.Nxg7+!! Kf8 15.Bh6! Bg4 16.0-0+ Kg8 17.Rf4 ± White wins with a queen sac but black has defensive resources. Stockfish 8 64bit 3CPU running on 2016 hardware recognizes the significance of Nxg5!! in 55 seconds. Stockfish 14 dev (Stockfish_21092606_x64_avx2) 64bit 4CPU running on 2020 hardware recognizes the significance of Nxg5!! in 1 second. NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 21/34 00:01 8291k 4530k +2,78 Nxg5 Bxd1 Nxe6 Qb8 Nxg7+ Kd8 Kxd1 b5 N3f5 Bf8 Rf1 Kc8 Nh5 Kb7 Bxb5 Ne7 g4 a6 Ba4 Nxf5 gxf5 Ka7 Nf4 c5 47/59 37:49 10390430k 4578k +3,16 Nxg5 Bxd1 Nxe6 Qb8 Nxg7+ Kd8 Kxd1 b5 Rf1 Kc8 N3f5 Bf8 Ne6 Kd7 Nf4 Ne7 g4 a5 Ke2 Qb7 h4 Ra6 a3 Kc8 Be3 Kb8 Kf3 Rb6 Bd2 Qc8 Kg3 c5 Be3 c4 Nxe7 Bxe7 Bf5 Qd8 h5 Qg8 Kh3 Bg5 Rf3 Ra6 Raf1 b4 Nxd5 Qxd5 Bxg5 bxc3 bxc3 Rb6 Be3 Rb3 Blacks 14 .. Kf8 is suboptimal and leads loss fast 41/68 06:31 3269727k 8350k +9,28 Bh6 Kg8 Rxd1 Bf8 N3h5 Bxg7 Nxg7 Qf8 Nf5 Ne7 Bxf8 Nxf5 Bxf5 Rxf8 Be6+ Kg7 Rd3 Rf4 Bxd5 c6 Rg3+ Kf8 Rf3 Rxf3 Bxf3 Kg7 Rf1 Re8 Be4 Re6 Ke2 a5 Ke3 Rh6 h3 a4 Kf4 Re6 h4 Re8 Ke3 h6 h5 Rf8 Rxf8 Kxf8 == Problem 4 == FEN: r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq - 0 1 10.Nxe6!! Qxe6 11.Nd5 Kd8 12.Bg4 Qe5 13.f4 Qxe4 (13...Qxb2 stronger but not sufficient: 14.Bxd7 Bxd7 15.Rb1 Qa3 16.Nxf6 Bb5 17.Qd4 Qc5 18.Rfd1 ±) 14.Bxd7 Bxd7 15.Nxf6 gxf6 16.Bxf6+ Kc7 17.Bxh8 and Black resigned on move 27. Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of 10.Nxe6 in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 22/37 00:01 6955k 5367k +4,00 Nxe6 Qxe6 Nd5 Kd8 Bg4 Qe5 f4 Qxb2 Rb1 Qa3 Bxd7 Bxd7 Nxf6 Bb5 Rf3 Qxa2 c4 Bxc4 Rf2 Qa5 Nd5+ f6 Nxf6 Kc7 Rc1 b5 Qd5 gxf6 Bxf6 Kb8 Rxc4 Qe1+ Rf1 51/70 47:10 14538911k 5137k +5,76 Nxe6 Qxe6 Nd5 Kd8 Bg4 Qe5 f4 Qxe4 Bxd7 Bxd7 Nxf6 Qf5 Qd4 Kc8 Nd5 Bc6 c4 f6 Nb6+ Kb8 Bh4 Be7 Rae1 Bd8 Nxa8 Kxa8 Bf2 Kb8 Qxd6+ Bc7 Ba7+ Kc8 Qe6+ Qxe6 Rxe6 h5 h4 Rd8 Re7 g6 Be3 Ba5 Kf2 Rd6 Rc1 Bd8 Rg7 Be4 Rg8 Kd7 c5 Rd3 Rc4 Bd5 Rg7+ Ke6 Rd4 Rxd4 Bxd4 Kf5 Rd7 Bc6 Rxd8 Kxf4 Bxf6 == Problem 5 == FEN: r2qrb1k/1p1b2p1/p2ppn1p/8/3NP3/1BN5/PPP3QP/1K3RR1 w - - 0 1 21.e5!! dxe5 22.Ne4! Nh5 23.Qg6!? (stronger is 23.Qg4!! Nf4 24.Nf3 Qc7 25.Nh4 ± ) 23...exd4? (23...Nf4 24.Rxf4! exf4 25.Nf3! Qb6 26.Rg5!! covering b5 and threatening Nf6 or Ne5-f7+) 24.Ng5 1−0 Stockfish 8 64bit 3CPU running on 2016 hardware recognises the significance of 21.e5 in 5 seconds. Stockfish 12 dev (Stockfish_20062212_x64_modern) 64bit 1CPU running on 2016 hardware recognizes the significance of 21.e5 in 11 seconds. 25/42 00:06 7 963k 1309k +6,93 e5 Nh5 Ne4 dxe5 Nf3 Nf4 Qg4 Qc7 Nh4 Bc6 Nf6 g5 Rxf4 exf4 Qh5 Qe7 Ng6+ Kg7 Nxe7 Rxe7 Ng4 37/62 03:12 298 083k 1545k +10,70 e5 Ng4 Qxg4 Qg5 Qh3 Qxe5 Nde2 g5 Rxf8+ Kg7 Rff1 Rf8 Re1 Qf5 Qg3 Rad8 Nd4 Qf4 Nxe6+ Bxe6 Rxe6 Qxg3 == Problem 6 == FEN: rnbqk2r/1p3ppp/p7/1NpPp3/QPP1P1n1/P4N2/4KbPP/R1B2B1R b kq - 0 1 13... axb5!! offers an exchange to keep the white queen out of play. 14.Qxa8 Bd4 15.Nxd4 cxd4 16.Qxb8 0-0! 17.Ke1 Qh4 18.g3 Qf6 19.Bf4 g5? (Ivanchuk found 19...d3! during post-game analysis.) 20.Rc1 exf4 21.Qxf4 Qd4 22.Rd1 bxc4 23.e5 Qc3+ 24.Rd2 Re8 25.Bxd3 cxd3 −+ Tasc R30 finds 19... d3! in 2 1/2 hours. 19... Bf5!! is even stronger than 19... d3. Position is already lost at 19... d3 +8.00 for black, ... Bf5 not much better Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of axb5!! in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 21/28 00:01 9264k 4714k -1,22 axb5 Qxa8 Bd4 Nxd4 cxd4 h3 Nf6 Bg5 0-0 cxb5 h6 Bxf6 Qxf6 Re1 Nd7 Kd1 Qg6 Qa4 Qg3 Qc2 Qxa3 Bd3 Qxb4 Qb1 46/67 1:05:00 18113493k 4644k -2,40 axb5 Qxa8 Bd4 h3 Nf6 Nxd4 exd4 Kf2 Nxe4+ Kg1 Nd7 Bg5 Qxg5 Qxc8+ Ke7 Qc7 Qe5 d6+ Qxd6 Qxd6+ Kxd6 bxc5+ Ndxc5 cxb5 d3 h4 d2 Rh3 Ke5 Be2 f5 Ra2 Rd8 Bd1 Rd4 Re3 f4 Re2 b6 a4 Kd6 Rc2 Kd5 Ra2 h6 Rb2 Nxa4 Bxa4 Rxa4 Rexd2+ Nxd2 Rxd2+ Kc4 Rd7 g6 == Problem 7 == FEN 1r1bk2r/2R2ppp/p3p3/1b2P2q/4QP2/4N3/1B4PP/3R2K1 w k - 0 1 1.Rxd8+!! Rxd8 (1...Kxd8 2.Ra7! Qe2 3.Qd4+ Ke8 4.h3 Qe1+ 5.Kh2 Rd8 6.Qc5 Qh4 7.Ba3 Rd7 8.Ra8+ Rd8 9.g3 1−0)

    Read more →
  • The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis is a report authored by James van Geelen and Alap Shah and published by Citrini Research in February 2026, on the impact of artificial intelligence on humanity's future. Written in the form of a scenario analysis, it was viewed millions of times online and reportedly caused a fall in the stock market prices of major tech and financial firms. It also received criticism among others, for its allegedly flawed economic logic. The 'thought exercise', as the authors called it, painted a gloomy picture for the near future, where outputs keep growing while consumer's ability to spend collapses. "...driven by ai agents that don’t sleep, take sick days or require health insurance”, "outputs that are shown in national accounts increases, "but never circulates through the real economy"(which the report calls 'Ghost GDP'), the authors argued. In other words, the authors predict a scenario where the owners of the AI firms will accumulate a vast fortune but there will be scant demand from consumers as AI would cause massive unemployment. The authors caution the reader that what they make is a scenario and not a prediction. In the scenario they visualise, any service whose value proposition is “I will navigate complexity that you find tedious” is getting disrupted. The reports argues that the unique ability of human beings to analyse, decide, create, persuade, and coordinate was “the thing that could not be replicated at scale,” and call the historical scarcity of this precious entity 'friction'. When this friction becomes zero, a gamut of changes occur which then triggers a cascading of changes across the economy. ”Travel booking platforms are an early casualty; Financial advice. tax prep., and routine legal work follow suit. National unemployment rate go as high 10.2% and the S&P 500 goes for a massive 38% peak-to-trough crash. In contrast to the previous technological revolutions the high-earning professionals suffers more and get forced to take up roles in the gig economy. Labour supply becomes abundant and this cuts wages all across the economy. The dent in income for the employees then affects other sectors of the economy such as the residential mortgage market. The losses for the software companies triggers loan defaults and heralds peril for the private credit sector.

    Read more →
  • Motor theory of speech perception

    Motor theory of speech perception

    The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them. The hypothesis has gained more interest outside the field of speech perception than inside. This has increased particularly since the discovery of mirror neurons that link the production and perception of motor movements, including those made by the vocal tract. The theory was initially proposed in the Haskins Laboratories in the 1950s by Alvin Liberman and Franklin S. Cooper, and developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen. == Origins and development == The hypothesis has its origins in research using pattern playback to create reading machines for the blind that would substitute sounds for orthographic letters. This led to a close examination of how spoken sounds correspond to the acoustic spectrogram of them as a sequence of auditory sounds. This found that successive consonants and vowels overlap in time with one another (a phenomenon known as coarticulation). This suggested that speech is not heard like an acoustic "alphabet" or "cipher," but as a "code" of overlapping speech gestures. === Associationist approach === Initially, the theory was associationist: infants mimic the speech they hear and that this leads to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception. This aspect of the theory was dropped, however, with the discovery that prelinguistic infants could already detect most of the phonetic contrasts used to separate different speech sounds. === Cognitivist approach === The behavioristic approach was replaced by a cognitivist one in which there was a speech module. The module detected speech in terms of hidden distal objects rather than at the proximal or immediate level of their input. The evidence for this was the research finding that speech processing was special such as duplex perception. === Changing distal objects === Initially, speech perception was assumed to link to speech objects that were both the invariant movements of speech articulators the invariant motor commands sent to muscles to move the vocal tract articulators This was later revised to include the phonetic gestures rather than motor commands, and then the gestures intended by the speaker at a prevocal, linguistic level, rather than actual movements. === Modern revision === The "speech is special" claim has been dropped, as it was found that speech perception could occur for nonspeech sounds (for example, slamming doors for duplex perception). === Mirror neurons === The discovery of mirror neurons has led to renewed interest in the motor theory of speech perception, and the theory still has its advocates, although there are also critics. == Support == === Nonauditory gesture information === If speech is identified in terms of how it is physically made, then nonauditory information should be incorporated into speech percepts even if it is still subjectively heard as "sounds". This is, in fact, the case. The McGurk effect shows that seeing the production of a spoken syllable that differs from an auditory cue synchronized with it affects the perception of the auditory one. In other words, if someone hears "ba" but sees a video of someone pronouncing "ga", what they hear is different—some people believe they hear "da". People find it easier to hear speech in noise if they can see the speaker. People can hear syllables better when their production can be felt haptically. === Categorical perception === Using a speech synthesizer, speech sounds can be varied in place of articulation along a continuum from /bɑ/ to /dɑ/ to /ɡɑ/, or in voice onset time on a continuum from /dɑ/ to /tɑ/ (for example). When listeners are asked to discriminate between two different sounds, they perceive sounds as belonging to discrete categories, even though the sounds vary continuously. In other words, 10 sounds (with the sound on one extreme being /dɑ/ and the sound on the other extreme being /tɑ/, and the ones in the middle varying on a scale) may all be acoustically different from one another, but the listener will hear all of them as either /dɑ/ or /tɑ/. Likewise, the English consonant /d/ may vary in its acoustic details across different phonetic contexts (the /d/ in /du/ does not technically sound the same as the one in /di/, for example), but all /d/'s as perceived by a listener fall within one category (voiced alveolar plosive) and that is because "linguistic representations are abstract, canonical, phonetic segments or the gestures that underlie these segments." This suggests that humans identify speech using categorical perception, and thus that a specialized module, such as that proposed by the motor theory of speech perception, may be on the right track. === Speech imitation === If people can hear the gestures in speech, then the imitation of speech should be very fast, as in when words are repeated that are heard in headphones as in speech shadowing. People can repeat heard syllables more quickly than they would be able to produce them normally. === Speech production === Hearing speech activates vocal tract muscles, and the motor cortex and premotor cortex. The integration of auditory and visual input in speech perception also involves such areas. Disrupting the premotor cortex disrupts the perception of speech units such as plosives. The activation of the motor areas occurs in terms of the phonemic features which link with the vocal track articulators that create speech gestures. The perception of a speech sound is aided by pre-emptively stimulating the motor representation of the articulators responsible for its pronunciation . Auditory and motor cortical coupling is restricted to a specific range of neuronal firing frequency. === Perception-action meshing === Evidence exists that perception and production are generally coupled in the motor system. This is supported by the existence of mirror neurons that are activated both by seeing (or hearing) an action and when that action is carried out. Another source of evidence is that for common coding theory between the representations used for perception and action. == Criticisms == The motor theory of speech perception is not widely held in the field of speech perception, though it is more popular in other fields, such as theoretical linguistics. As three of its advocates have noted, "it has few proponents within the field of speech perception, and many authors cite it primarily to offer critical commentary".p. 361 Several critiques of it exist. === Multiple sources === Speech perception is affected by nonproduction sources of information, such as context. Individual words are hard to understand in isolation but easy when heard in sentence context. It therefore seems that speech perception uses multiple sources that are integrated together in an optimal way. === Production === The motor theory of speech perception would predict that speech motor abilities in infants predict their speech perception abilities, but in actuality it is the other way around. It would also predict that defects in speech production would impair speech perception, but they do not. However, this only affects the first and already superseded behaviorist version of the theory, where infants were supposed to learn all production-perception patterns by imitation early in childhood. This is no longer the mainstream view of motor-speech theorists. === Speech module === Several sources of evidence for a specialized speech module have failed to be supported. Duplex perception can be observed with door slams. The McGurk effect can also be achieved with nonlinguistic stimuli, such as showing someone a video of a basketball bouncing but playing the sound of a ping-pong ball bouncing. As for categorical perception, listeners can be sensitive to acoustic differences within single phonetic categories. As a result, this part of the theory has been dropped by some researchers. === Sublexical tasks === The evidence provided for the motor theory of speech perception is limited to tasks such as syllable discrimination that use speech units not full spoken words or spoken sentences. As a result, "speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process spee

    Read more →
  • Symbolic regression

    Symbolic regression

    Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity. No particular model is provided as a starting point for symbolic regression. Instead, initial expressions are formed by randomly combining mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables. Usually, a subset of these primitives will be specified by the person operating it, but that's not a requirement of the technique. The symbolic regression problem for mathematical functions has been tackled with a variety of methods, including recombining equations most commonly using genetic programming, as well as more recent methods utilizing Bayesian methods and neural networks. Another non-classical alternative method to SR is called Universal Functions Originator (UFO), which has a different mechanism, search-space, and building strategy. Further methods such as Exact Learning attempt to transform the fitting problem into a moments problem in a natural function space, usually built around generalizations of the Meijer-G function. By not requiring a priori specification of a model, symbolic regression isn't affected by human bias, or unknown gaps in domain knowledge. It attempts to uncover the intrinsic relationships of the dataset, by letting the patterns in the data itself reveal the appropriate models, rather than imposing a model structure that is deemed mathematically tractable from a human perspective. The fitness function that drives the evolution of the models takes into account not only error metrics (to ensure the models accurately predict the data), but also special complexity measures, thus ensuring that the resulting models reveal the data's underlying structure in a way that's understandable from a human perspective. This facilitates reasoning and favors the odds of getting insights about the data-generating system, as well as improving generalisability and extrapolation behaviour by preventing overfitting. Accuracy and simplicity may be left as two separate objectives of the regression—in which case the optimum solutions form a Pareto front—or they may be combined into a single objective by means of a model selection principle such as minimum description length. It has been proven that symbolic regression is an NP-hard problem. Nevertheless, if the sought-for equation is not too complex it is possible to solve the symbolic regression problem exactly by generating every possible function (built from some predefined set of operators) and evaluating them on the dataset in question. == Difference from classical regression == While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters. This approach has the disadvantage of having a much larger space to search, because not only the search space in symbolic regression is infinite, but there are an infinite number of models which will perfectly fit a finite data set (provided that the model complexity isn't artificially limited). This means that it will possibly take a symbolic regression algorithm longer to find an appropriate model and parametrization, than traditional regression techniques. This can be attenuated by limiting the set of building blocks provided to the algorithm, based on existing knowledge of the system that produced the data; but in the end, using symbolic regression is a decision that has to be balanced with how much is known about the underlying system. Nevertheless, this characteristic of symbolic regression also has advantages: because the evolutionary algorithm requires diversity in order to effectively explore the search space, the result is likely to be a selection of high-scoring models (and their corresponding set of parameters). Examining this collection could provide better insight into the underlying process, and allows the user to identify an approximation that better fits their needs in terms of accuracy and simplicity. == Benchmarking == === SRBench === In 2021, SRBench was proposed as a large benchmark for symbolic regression. In its inception, SRBench featured 14 symbolic regression methods, 7 other ML methods, and 252 datasets from PMLB. The benchmark intends to be a living project: it encourages the submission of improvements, new datasets, and new methods, to keep track of the state of the art in SR. === SRBench Competition 2022 === In 2022, SRBench announced the competition Interpretable Symbolic Regression for Data Science, which was held at the GECCO conference in Boston, MA. The competition pitted nine leading symbolic regression algorithms against each other on a novel set of data problems and considered different evaluation criteria. The competition was organized in two tracks, a synthetic track and a real-world data track. ==== Synthetic Track ==== In the synthetic track, methods were compared according to five properties: re-discovery of exact expressions; feature selection; resistance to local optima; extrapolation; and sensitivity to noise. Rankings of the methods were: QLattice PySR (Python Symbolic Regression) uDSR (Deep Symbolic Optimization) ==== Real-world Track ==== In the real-world track, methods were trained to build interpretable predictive models for 14-day forecast counts of COVID-19 cases, hospitalizations, and deaths in New York State. These models were reviewed by a subject expert and assigned trust ratings and evaluated for accuracy and simplicity. The ranking of the methods was: uDSR (Deep Symbolic Optimization) QLattice geneticengine (Genetic Engine) == Non-standard methods == Most symbolic regression algorithms prevent combinatorial explosion by implementing evolutionary algorithms that iteratively improve the best-fit expression over many generations. Recently, researchers have proposed algorithms utilizing other tactics in AI. Silviu-Marian Udrescu and Max Tegmark developed the "AI Feynman" algorithm, which attempts symbolic regression by training a neural network to represent the mystery function, then runs tests against the neural network to attempt to break up the problem into smaller parts. For example, if f ( x 1 , . . . , x i , x i + 1 , . . . , x n ) = g ( x 1 , . . . , x i ) + h ( x i + 1 , . . . , x n ) {\displaystyle f(x_{1},...,x_{i},x_{i+1},...,x_{n})=g(x_{1},...,x_{i})+h(x_{i+1},...,x_{n})} , tests against the neural network can recognize the separation and proceed to solve for g {\displaystyle g} and h {\displaystyle h} separately and with different variables as inputs. This is an example of divide and conquer, which reduces the size of the problem to be more manageable. AI Feynman also transforms the inputs and outputs of the mystery function in order to produce a new function which can be solved with other techniques, and performs dimensional analysis to reduce the number of independent variables involved. The algorithm was able to "discover" 100 equations from The Feynman Lectures on Physics, while a leading software using evolutionary algorithms, Eureqa, solved only 71. AI Feynman, in contrast to classic symbolic regression methods, requires a very large dataset in order to first train the neural network and is naturally biased towards equations that are common in elementary physics.

    Read more →
  • Artificial intelligence in spirituality

    Artificial intelligence in spirituality

    Some users of artificial intelligence (AI) technologies, especially chatbots, may develop beliefs that AI has or can attain supernatural or spiritual powers. AI models such as ChatGPT are turned to for fortune telling, mysticism and remote viewing. Recent and sudden advances in large language models have led to folk myths about their origin or capabilities, as well as their deification or worship by some users. Tucker Carlson has made similar claims, including directly to Sam Altman. Pope Leo XIV advised priests against using LLM models when it came to the creation of sermons.

    Read more →
  • Machine-learned interatomic potential

    Machine-learned interatomic potential

    Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.

    Read more →
  • Sensory, Inc.

    Sensory, Inc.

    Sensory, Inc. is an American company which develops software AI technologies for speech, sound and vision. It is based in Santa Clara, California. Sensory’s technologies have shipped in over three billion products from hundreds of leading consumer electronics manufacturers including AT&T, Hasbro, Huawei, Google, Amazon, Samsung, LG, Mattel, Motorola, Plantronics, GoPro, Sony, Tencent, Garmin, LG, Microsoft, Lenovo, and more. Sensory has over 60 issued patents covering speech recognition in consumer electronics, biometric authentication, sensor/speech combinations, wake word technology, and more. == History == Sensory, Inc. was founded in 1994, originally as Sensory Circuits, by Forrest Mozer, Mike Mozer and Todd Mozer. The three had also co-founded ESS Technology years earlier. In 1999 Sensory acquired Fluent Speech Technologies, which was formed and started by a group of professors out of the Oregon Graduate Institute (formerly OGI, now OHSU). Fluent Speech Technologies developed high performance embedded speech engines, the technology from this acquisition is now the core technology used throughout Sensory's chip and software line. === Company timeline === 1994 – Founded 1995 – Introduces the RSC 164 - first commercially successful speech recognition IC 1998 – Introduces first speaker verification IC 2000 – Acquires Oregon based Fluent-Speech Technologies 2002 – Acquires Texas Instruments line of speech output ICs (the SC series) 2007 – Introduces first Voice User Interface for Bluetooth silicon (CSR BC-5) - BlueGenie 2008 - Sensory and BlueAnt partner on the V1 - Revolutionary new Bluetooth headset with a voice user interface. First wearable to use a voice user interface for control and best-reviewed speech recognition product in history 2009 – Introduced world's smallest text to speech system (TTS) and Truly HandsfreeTM Triggers/ wake words. 2010 – Introduced the NLP-5x – First Natural Language Voice Processor and TrulyHandsfree wake words in SDKs for Android, iOS, Linux, and Windows. NLP5x used the first generation of TrulyHandsfree wake words with low power and enhanced accuracy. 2011 – Sensory partners with Google and Microsoft to enable TrulyHandsfree as a front end to Goog411 and Bing411 2012 – Partnered with Tensilica to offer ultra-low power TrulyHandsfree wake words; introduced Speaker Verification and Speaker Identification for mobile phones and other consumer electronics. 2012 - TrulyHandsfree released into Samsung's Galaxy S2 for "Hey Galaxy" wake word 2013 – TrulyHandsfree wake words migrated to many new platforms and began shipping as MotoVoice in the Google-owned MotoX. Sensory's TrulyHandsfree in mobile takes off with the Galaxy S3 and S4 and Galaxy Note and is licensed into wearables like Google Glass. 2014 – Announced new initiative in Vision; added LG and Motorola as customers; received the 2014 Global Mobile Award for Best Mobile Technology Breakthrough at the GSMA Mobile World Congress in Barcelona, Spain (judges commented, "A big advance for the wearables market, this offers many benefits for consumers, increasing uptake and usage of many mobile apps, driving revenue for operators and content providers.") 2015-2018 - Licensed Google, Amazon, MSFT, Baidu, Huawei, ZTE, and many others with TrulyHandsfree wake words. Sensory develops first wake words for OK Google, Hey Siri, and Hey Cortana. 2019 - Sensory launched two new solutions: SoundID, sound identification, and TrulyNatural, embedded large vocabulary speech recognition. Sensory also acquired Vocalize.ai, an independent testing lab. 2020 - Sensory introduced VoiceHub, which allows the automated generation of wake words. 2021 - Sensory expands VoiceHub with speech recognition and NLU capabilities. The company initiated a new cloud platform, SensoryCloud.ai. 2022-Sensory rolls out SensoryCloud.ai with speech to text, text to speech, face & voice biometrics 2024- Sensory Automotive & TrulyNatural Speech-to-text On-Device launched == Technology and products == Sensory originally developed both hardware (Integrated Circuit - IC or "chip") and software platforms but migrated to software only around 2005 and added cloud and hybrid computing capabilities in 2021. Sensory's RSC-164 IC (Integrated Circuit or "chip") was used on NASA's Mars Polar Lander in the Mars Microphone on the Lander. Speech Synthesis SC-6x chips – acquired some speech synthesis technology from Texas Instruments. Sensory’s embedded AI solutions include the following: TrulyHandsfree (THF) - wake word detection and phrase spotting. TrulyNatural (TNL) - large vocabulary continuous speech recognition with NLU. TrulySecure (TS) - face and voice biometrics. TrulySecureSpeakerVerification (TSSV) - speaker and sound identification. VoiceHub - Online portal for creating custom wake words and speech recognition models with NLU. Sensory Automotive- Sensory Automotive is a full voice and vision suite of AI technologies that operate efficiently in the car without connecting to a network. The cloud initiative, SensoryCloud.ai, is targeting Speech To Text (STT), Text To Speech (TTS), Wake Word verification, face and voice recognition, and sound identification.

    Read more →
  • Dataset shift

    Dataset shift

    Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e., P t r a i n ( X , Y ) ≠ P t e s t ( X , Y ) {\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)} ). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.

    Read more →
  • EfficientNet

    EfficientNet

    EfficientNet is a family of convolutional neural networks (CNNs) for computer vision published by researchers at Google AI in 2019. Its key innovation is compound scaling, which uniformly scales all dimensions of depth, width, and resolution using a single parameter. EfficientNet models have been adopted in various computer vision tasks, including image classification, object detection, and segmentation. == Compound scaling == EfficientNet introduces compound scaling, which, instead of scaling one dimension of the network at a time, such as depth (number of layers), width (number of channels), or resolution (input image size), uses a compound coefficient ϕ {\displaystyle \phi } to scale all three dimensions simultaneously. Specifically, given a baseline network, the depth, width, and resolution are scaled according to the following equations: depth multiplier: d = α ϕ width multiplier: w = β ϕ resolution multiplier: r = γ ϕ {\displaystyle {\begin{aligned}{\text{depth multiplier: }}d&=\alpha ^{\phi }\\{\text{width multiplier: }}w&=\beta ^{\phi }\\{\text{resolution multiplier: }}r&=\gamma ^{\phi }\end{aligned}}} subject to α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} and α ≥ 1 , β ≥ 1 , γ ≥ 1 {\displaystyle \alpha \geq 1,\beta \geq 1,\gamma \geq 1} . The α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} condition is such that increasing ϕ {\displaystyle \phi } by a factor of ϕ 0 {\displaystyle \phi _{0}} would increase the total FLOPs of running the network on an image approximately 2 ϕ 0 {\displaystyle 2^{\phi _{0}}} times. The hyperparameters α {\displaystyle \alpha } , β {\displaystyle \beta } , and γ {\displaystyle \gamma } are determined by a small grid search. The original paper suggested 1.2, 1.1, and 1.15, respectively. Architecturally, they optimized the choice of modules by neural architecture search (NAS), and found that the inverted bottleneck convolution (which they called MBConv) used in MobileNet worked well. The EfficientNet family is a stack of MBConv layers, with shapes determined by the compound scaling. The original publication consisted of 8 models, from EfficientNet-B0 to EfficientNet-B7, with increasing model size and accuracy. EfficientNet-B0 is the baseline network, and subsequent models are obtained by scaling the baseline network by increasing ϕ {\displaystyle \phi } . == Variants == EfficientNet has been adapted for fast inference on edge TPUs and centralized TPU or GPU clusters by NAS. EfficientNet V2 was published in June 2021. The architecture was improved by further NAS search with more types of convolutional layers. It also introduced a training method, which progressively increases image size during training, and uses regularization techniques like dropout, RandAugment, and Mixup. The authors claim this approach mitigates accuracy drops often associated with progressive resizing.

    Read more →