AI Email Enhancer

AI Email Enhancer — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Phase stretch transform

    Phase stretch transform

    Phase stretch transform (PST) is a computational approach to signal and image processing. One of its utilities is for feature detection and classification. PST is related to time stretch dispersive Fourier transform. It transforms the image by emulating propagation through a diffractive medium with engineered 3D dispersive property (refractive index). The operation relies on symmetry of the dispersion profile and can be understood in terms of dispersive eigenfunctions or stretch modes. PST performs similar functionality as phase-contrast microscopy, but on digital images. PST can be applied to digital images and temporal (time series) data. It is a physics-based feature engineering algorithm. == Operation principle == Here the principle is described in the context of feature enhancement in digital images. The image is first filtered with a spatial kernel followed by application of a nonlinear frequency-dependent phase. The output of the transform is the phase in the spatial domain. The main step is the 2-D phase function which is typically applied in the frequency domain. The amount of phase applied to the image is frequency dependent, with higher amount of phase applied to higher frequency features of the image. Since sharp transitions, such as edges and corners, contain higher frequencies, PST emphasizes the edge information. Features can be further enhanced by applying thresholding and morphological operations. PST is a pure phase operation whereas conventional edge detection algorithms operate on amplitude. == Physical and mathematical foundations of phase stretch transform == Photonic time stretch technique can be understood by considering the propagation of an optical pulse through a dispersive fiber. By disregarding the loss and non-linearity in fiber, the non-linear Schrödinger equation governing the optical pulse propagation in fiber upon integration reduces to: E o ( z , t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( 0 , ω ) ⋅ e − i β 2 z ω 2 2 ⋅ e i ω t d ω {\displaystyle E_{o}(z,t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(0,\omega )\cdot e^{\frac {-i\beta _{2}z\omega ^{2}}{2}}\cdot e^{i\omega {t}}\,d\omega } (1) where β 2 {\displaystyle \beta _{2}} = GVD parameter, z is propagation distance, E o ( z , t ) {\displaystyle E_{o}(z,t)} is the reshaped output pulse at distance z and time t. The response of this dispersive element in the time-stretch system can be approximated as a phase propagator as presented in H ( ω ) = e i φ ( ω ) = e i ∑ m = 0 ∞ φ m ( ω ) = ∏ m = 0 ∞ H m ( ω ) {\displaystyle H(\omega )=e^{i\varphi (\omega )}=e^{i\sum _{m=0}^{\infty }\varphi _{m}(\omega )}=\prod _{m=0}^{\infty }H_{m}(\omega )} (2) Therefore, Eq. 1 can be written as following for a pulse that propagates through the time-stretch system and is reshaped into a temporal signal with a complex envelope given by E o ( t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( ω ) ⋅ H ( ω ) ⋅ e i ω t d ω {\displaystyle E_{o}(t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(\omega )\cdot H(\omega )\cdot e^{i\omega t}\,d\omega } (3) The time stretch operation is formulated as generalized phase and amplitude operations, S { E i ( t ) } = ∫ − ∞ + ∞ F { E i ( t ) } ⋅ e i φ ( ω ) ⋅ L ~ ( ω ) ⋅ e i ω t d ω {\displaystyle \mathbb {S} \{E_{i}(t)\}=\int _{-\infty }^{+\infty }{\mathcal {F}}\{E_{i}(t)\}\cdot e^{i\varphi (\omega )}\cdot {\tilde {L}}(\omega )\cdot e^{i\omega {t}}d\omega } (4) where e i φ ( ω ) {\displaystyle e^{i\varphi (\omega )}} is the phase filter and L ~ ( ω ) {\displaystyle {\tilde {L}}(\omega )} is the amplitude filter. Next the operator is converted to discrete domain, S { E i [ n ] } = 1 N ∑ u = 0 N − 1 F F T { E i ( n ) } ⋅ K ~ ( u ) ⋅ L ~ ( u ) ⋅ e i 2 π N u n {\displaystyle \mathbb {S} \{E_{i}[n]\}={\frac {1}{N}}\sum _{u=0}^{N-1}FFT\{E_{i}(n)\}\cdot {\tilde {K}}(u)\cdot {\tilde {L}}(u)\cdot e^{i{\frac {2\pi }{N}}un}} (5) where u {\displaystyle u} is the discrete frequency, K ~ ( u ) {\displaystyle {\tilde {K}}(u)} is the phase filter, L ~ ( u ) {\displaystyle {\tilde {L}}(u)} is the amplitude filter and FFT is fast Fourier transform. The stretch operator S { } {\displaystyle \mathbb {S} \{\}} for a digital image is then S { E i [ n , m ] } = 1 M N ∑ v = 0 N − 1 ∑ u = 0 M − 1 F F T 2 { E i ( n , m ) } ⋅ K ~ ( u , v ) ⋅ L ~ ( u , v ) ⋅ e i 2 π M u m ⋅ e i 2 π N v n {\displaystyle \mathbb {S} \{E_{i}[n,m]\}={\frac {1}{MN}}\sum _{v=0}^{N-1}\sum _{u=0}^{M-1}FFT^{2}\{E_{i}(n,m)\}\cdot {\tilde {K}}(u,v)\cdot {\tilde {L}}(u,v)\cdot e^{i{\frac {2\pi }{M}}um}\cdot e^{i{\frac {2\pi }{N}}vn}} (6) In the above equations, E i [ n , m ] {\displaystyle E_{i}[n,m]} is the input image, n {\displaystyle n} and m {\displaystyle m} are the spatial variables, F F T 2 {\displaystyle FFT^{2}} is the two-dimensional fast Fourier transform, and u {\displaystyle u} and v {\displaystyle v} are spatial frequency variables. The function K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} is the warped phase kernel and the function L ~ ( u , v ) {\displaystyle {\tilde {L}}(u,v)} is a localization kernel implemented in frequency domain. PST operator is defined as the phase of the Warped Stretch Transform output as follows P S T { E i [ n , m ] } ≜ ∡ { S { E i [ x , y ] } } {\displaystyle PST\{E_{i}[n,m]\}\triangleq \measuredangle \{\mathbb {S} \{E_{i}[x,y]\}\}} (7) where ∡ { } {\displaystyle \measuredangle \{\}} is the angle operator. == PST kernel implementation == The warped phase kernel K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} can be described by a nonlinear frequency dependent phase K ~ ( u , v ) = e i φ ( u , v ) {\displaystyle {\tilde {K}}(u,v)=e^{i\varphi (u,v)}} While arbitrary phase kernels can be considered for PST operation, here we study the phase kernels for which the kernel phase derivative is a linear or sublinear function with respect to frequency variables. A simple example for such phase derivative profiles is the inverse tangent function. Consider the phase profile in the polar coordinate system φ ( u , v ) = φ polar ( r , θ ) = φ polar ( r ) {\displaystyle \varphi (u,v)=\varphi _{\text{polar}}(r,\theta )=\varphi _{\text{polar}}(r)} From d φ ( r ) d r = tan − 1 ⁡ ( r ) {\displaystyle {\frac {d\varphi (r)}{dr}}=\tan ^{-1}(r)} we have φ ( r ) = r tan − 1 ⁡ ( r ) − 1 2 log ⁡ ( r 2 + 1 ) {\displaystyle \varphi (r)=r\tan ^{-1}(r)-{\frac {1}{2}}\log(r^{2}+1)} Therefore, the PST kernel is implemented as φ ( r ) = S ⋅ ( W r ) ⋅ tan − 1 ⁡ ( W r ) − 1 2 log ⁡ ( 1 + ( W r ) 2 ) ( W r max ) ⋅ tan − 1 ⁡ ( W r max ) − 1 2 log ⁡ ( 1 + ( W r max ) 2 ) {\displaystyle \varphi (r)=S\cdot {\frac {(Wr)\cdot \tan ^{-1}(Wr)-{\frac {1}{2}}\log(1+(Wr)^{2})}{(Wr_{\max })\cdot \tan ^{-1}(Wr_{\max })-{\frac {1}{2}}\log(1+(Wr_{\max })^{2})}}} where S {\displaystyle S} and W {\displaystyle W} are real-valued numbers related to the strength and warp of the phase profile == Applications == PST has been used for edge detection in biological and biomedical images as well as synthetic-aperture radar (SAR) image processing, as well as detail and feature enhancement for digital images. PST has also been applied to improve the point spread function for single molecule imaging in order to achieve super-resolution. The transform exhibits intrinsic superior properties compared to conventional edge detectors for feature detection in low contrast visually impaired images. The PST function can also be performed on 1-D temporal waveforms in the analog domain to reveal transitions and anomalies in real time. == Open source code release == On February 9, 2016, a UCLA Engineering research group has made public the computer code for PST algorithm that helps computers process images at high speeds and "see" them in ways that human eyes cannot. The researchers say the code could eventually be used in face, fingerprint, and iris recognition systems for high-tech security, as well as in self-driving cars' navigation systems or for inspecting industrial products. The Matlab implementation for PST can also be downloaded from Matlab Files Exchange. However, it is provided for research purposes only, and a license must be obtained for any commercial applications. The software is protected under a US patent. The code was then significantly refactored and improved to support GPU acceleration. In May 2022, it became one algorithm in PhyCV: the first physics-inspired computer vision library.

    Read more →
  • Lexical Markup Framework

    Lexical Markup Framework

    Language resource management – Lexical markup framework (LMF; ISO 24613), produced by ISO/TC 37, is the ISO standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication. == Objectives == The goals of LMF are to provide a common model for the creation and use of lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of large number of individual electronic resources to form extensive global electronic resources. Types of individual instantiations of LMF can include monolingual, bilingual or multilingual lexical resources. The same specifications are to be used for both small and large lexicons, for both simple and complex lexicons, for both written and spoken lexical representations. The descriptions range from morphology, syntax, computational semantics to computer-assisted translation. The covered languages are not restricted to European languages but cover all natural languages. The range of targeted NLP applications is not restricted. LMF is able to represent most lexicons, including WordNet, EDR and PAROLE lexicons. == History == In the past, lexicon standardization has been studied and developed by a series of projects like GENELEX, EDR, EAGLES, MULTEXT, PAROLE, SIMPLE and ISLE. Then, the ISO/TC 37 National delegations decided to address standards dedicated to NLP and lexicon representation. The work on LMF started in Summer 2003 by a new work item proposal issued by the US delegation. In Fall 2003, the French delegation issued a technical proposition for a data model dedicated to NLP lexicons. In early 2004, the ISO/TC 37 committee decided to form a common ISO project with Nicoletta Calzolari (CNR-ILC Italy) as convenor and Gil Francopoulo (Tagmatica France) and Monte George (ANSI, United States) as editors. The first step in developing LMF was to design an overall framework based on the general features of existing lexicons and to develop a consistent terminology to describe the components of those lexicons. The next step was the actual design of a comprehensive model that best represented all of the lexicons in detail. A large panel of 60 experts contributed a wide range of requirements for LMF that covered many types of NLP lexicons. The editors of LMF worked closely with the panel of experts to identify the best solutions and reach a consensus on the design of LMF. Special attention was paid to the morphology in order to provide powerful mechanisms for handling problems in several languages that were known as difficult to handle. 13 versions have been written, dispatched (to the National nominated experts), commented and discussed during various ISO technical meetings. After five years of work, including numerous face-to-face meetings and e-mail exchanges, the editors arrived at a coherent UML model. In conclusion, LMF should be considered a synthesis of the state of the art in NLP lexicon field. == Current stage == The ISO number is 24613. The LMF specification has been published officially as an International Standard on 17 November 2008. == As one of the members of the ISO/TC 37 family of standards == The ISO/TC 37 standards are currently elaborated as high level specifications and deal with word segmentation (ISO 24614), annotations (ISO 24611 a.k.a. MAF, ISO 24612 a.k.a. LAF, ISO 24615 a.k.a. SynAF, and ISO 24617-1 a.k.a. SemAF/Time), feature structures (ISO 24610), multimedia containers (ISO 24616 a.k.a. MLIF), and lexicons (ISO 24613). These standards are based on low level specifications dedicated to constants, namely data categories (revision of ISO 12620), language codes (ISO 639), scripts codes (ISO 15924), country codes (ISO 3166) and Unicode (ISO 10646). The two level organization forms a coherent family of standards with the following common and simple rules: the high level specification provides structural elements that are adorned by the standardized constants; the low level specifications provide standardized constants as metadata. == Key standards == The linguistics constants like /feminine/ or /transitive/ are not defined within LMF but are recorded in the Data Category Registry (DCR) that is maintained as a global resource by ISO/TC 37 in compliance with ISO/IEC 11179-3:2003. And these constants are used to adorn the high level structural elements. The LMF specification complies with the modeling principles of Unified Modeling Language (UML) as defined by Object Management Group (OMG). The structure is specified by means of UML class diagrams. The examples are presented by means of UML instance (or object) diagrams. An XML DTD is given in an annex of the LMF document. == Model structure == LMF is composed of the following components: The core package that is the structural skeleton which describes the basic hierarchy of information in a lexical entry. Extensions of the core package which are expressed in a framework that describes the reuse of the core components in conjunction with the additional components required for a specific lexical resource. The extensions are specifically dedicated to morphology, MRD, NLP syntax, NLP semantics, NLP multilingual notations, NLP morphological patterns, multiword expression patterns, and constraint expression patterns. == Example == In the following example, the lexical entry is associated with a lemma clergyman and two inflected forms clergyman and clergymen. The language coding is set for the whole lexical resource. The language value is set for the whole lexicon as shown in the following UML instance diagram. The elements Lexical Resource, Global Information, Lexicon, Lexical Entry, Lemma, and Word Form define the structure of the lexicon. They are specified within the LMF document. On the contrary, languageCoding, language, partOfSpeech, commonNoun, writtenForm, grammaticalNumber, singular, plural are data categories that are taken from the Data Category Registry. These marks adorn the structure. The values ISO 639-3, clergyman, clergymen are plain character strings. The value eng is taken from the list of languages as defined by ISO 639-3. With some additional information like dtdVersion and feat, the same data can be expressed by the following XML fragment: This example is rather simple, while LMF can represent much more complex linguistic descriptions the XML tagging is correspondingly complex. == Selected publications about LMF == The first publication about the LMF specification as it has been ratified by ISO (this paper became (in 2015) the 9th most cited paper within the Language Resources and Evaluation conferences from LREC papers): Language Resources and Evaluation LREC-2006/Genoa: Gil Francopoulo, Monte George, Nicoletta Calzolari, Monica Monachini, Nuria Bel, Mandy Pet, Claudia Soria: Lexical Markup Framework (LMF) About semantic representation: Gesellschaft für linguistische Datenverarbeitung GLDV-2007/Tübingen: Gil Francopoulo, Nuria Bel, Monte George Nicoletta Calzolari, Monica Monachini, Mandy Pet, Claudia Soria: Lexical Markup Framework ISO standard for semantic information in NLP lexicons About African languages: Traitement Automatique des langues naturelles, Marseille, 2014: Mouhamadou Khoule, Mouhamad Ndiankho Thiam, El Hadj Mamadou Nguer: Toward the establishment of a LMF-based Wolof language lexicon (Vers la mise en place d'un lexique basé sur LMF pour la langue wolof) [in French] About Asian languages: Lexicography, Journal of ASIALEX, Springer 2014: Lexical Markup Framework: Gil Francopoulo, Chu-Ren Huang: An ISO Standard for Electronic Lexicons and its Implications for Asian Languages DOI 10.1007/s40607-014-0006-z About European languages: COLING 2010: Verena Henrich, Erhard Hinrichs: Standardizing Wordnets in the ISO Standard LMF: Wordnet-LMF for GermaNet EACL 2012: Judith Eckle-Kohler, Iryna Gurevych: Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability EACL 2012: Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M Meyer, Christian Wirth: UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. About Semitic languages: Journal of Natural Language Engineering, Cambridge University Press (to appear in Spring 2015): Aida Khemakhem, Bilel Gargouri, Abdelmajid Ben Hamadou, Gil Francopoulo: ISO Standard Modeling of a large Arabic Dictionary. Proceedings of the seventh Global Wordnet Conference 2014: Nadia B M Karmani, Hsan Soussou, Adel M Alimi: Building a standardized Wordnet in the ISO LMF for aeb language. Proceedings of the workshop: HLT & NLP within Arabic world, LREC 2008: Noureddine Loukil, Kais Haddar, Abdelmajid Ben Hamadou: Towards a syntactic lexicon of Arabic Verbs. Traitement Automatique des Langues Naturelles, Toulouse (in French) 2007: Khemakhem A, Gargouri B, Abdelwahed A, Francopoulo G: Modélisation des paradigmes de fl

    Read more →
  • N-jet

    N-jet

    An N-jet is the set of (partial) derivatives of a function f ( x ) {\displaystyle f(x)} up to order N. Specifically, in the area of computer vision, the N-jet is usually computed from a scale space representation L {\displaystyle L} of the input image f ( x , y ) {\displaystyle f(x,y)} , and the partial derivatives of L {\displaystyle L} are used as a basis for expressing various types of visual modules. For example, algorithms for tasks such as feature detection, feature classification, stereo matching, tracking and object recognition can be expressed in terms of N-jets computed at one or several scales in scale space.

    Read more →
  • INDECT

    INDECT

    INDECT is a research project in the area of intelligent security systems performed by several European universities since 2009 and funded by the European Union. The purpose of the project is to involve European scientists and researchers in the development of solutions to and tools for automatic threat detection through e.g. processing of CCTV camera data streams, standardization of video sequence quality for user applications, threat detection in computer networks as well as data and privacy protection. The area of research, applied methods, and techniques are described in the public deliverables which are available to the public on the project's website. Practically, all information related to the research is public. Only documents that comprise information related to financial data or information that could negatively influence the competitiveness and law enforcement capabilities of parties involved in the project are not published. This follows regulations and practices applied in EU research projects. == Application and target users == The main end-user of INDECT solutions are police forces and security services. The principle of operation of the project is detecting threats and identifying sources of threats, without monitoring and searching for particular citizens or groups of citizens. Then, the system operator (i.e. police officer) decides whether an intervention of services responsible for public security are required or not. Further investigation eventually leading to persons related to threats is performed, preserving the presumption of innocence, based on existing procedures already used by police services and prosecutors. As it can be found in the project deliverables, INDECT does not involve storage of personal data (such as names, addresses, identity document numbers, etc.). A similar, behavior-based surveillance program was SAMURAI (Suspicious and Abnormal behavior Monitoring Using a netwoRk of cAmeras & sensors for sItuation awareness enhancement). == Expected results == The main expected results of the INDECT project are: Trial of intelligent analysis of video and audio data for threat detection in urban environments Creation of tools and technology for privacy and data protection during storage and transmission of information using quantum cryptography and new methods of digital watermarking Performing computer-aided detection of threats and targeted crimes in Internet resources with privacy-protecting solutions Construction of a search engine for rapid semantic search based on watermarking of content related to child pornography and human organ trafficking Implementation of a distributed computer system that is capable of effective intelligent processing == Controversy == Some media and other sources accuse INDECT of privacy abuse, collecting personal data, and keeping information from the public. Consequently, these issues have been commented and discussed by some Members of the European Parliament. As seen in the project's documentation, INDECT does not involve mobile phone tracking or call interception. The rumors about testing INDECT during 2012 UEFA European Football Championship also turned out to be false. The mid-term review of the Seventh Framework Programme to the European Parliament strongly urges the European Commission to immediately make all documents available and to define a clear and strict mandate for the research goal, the application, and the end users of INDECT, and stresses a thorough investigation of the possible impact on fundamental rights. Nevertheless, according to Mr. Paweł Kowal, MEP, the project had the ethical review on 15 March 2011 in Brussels with the participation of ethics experts from Austria, France, Netherlands, Germany and Great Britain.

    Read more →
  • Class activation mapping

    Class activation mapping

    Class activation mapping methods are explainable AI (XAI) techniques used to visualize the regions of an input image that are the most relevant for a particular task, especially image classification, in convolutional neural networks (CNNs). These methods generate heatmaps by weighting the feature maps from a convolutional layer according to their relevance to the target class. In the field of artificial intelligence, generically defined as "the effort to automate intellectual tasks normally performed by humans", machine learning and deep learning were created. They both use statistical and computational methods to learn patterns from data, reducing the need for manually coded rules. Machine learning models are trained on input data and the known respective answers, learning the underlying patterns or structures present in the data. Traditional Machine learning algorithms employ manually designed feature sets, posing a direct link between machine learning designers and employed features. Deep learning is a subfield of machine learning, based on the concept of successive layers of representation, in which the data is progressively unfolded in different ways, to extract relevant and informative patterns in data analysis. Deep learning algorithms are defined as feature learning algorithms automatically learning hierarchical feature representations from raw data, extracting increasingly abstract features through multiple layers. CNNs are a specific architecture of deep learning models, designed to process spatially structured data, such as images, exploiting a series of convolution, non-linear activation and pooling operations to extract relevant features, contained in the so-called feature maps from input data. CNNs have demonstrated to be highly effective in a variety of computer vision and image processing tasks. CNNs (and deep learning models more broadly) are described as black boxes due to their complex and non-transparent internal layers of representation. The need for clearer indications on its internal working and decision-making process gave birth to XAI techniques. Among the proposed XAI techniques for computer vision tasks, Class activation mapping methods can show which pixels in an input image are important to the predicted logit for a class of interest, in a classification task. Class activation mapping methods were originally developed for class-discriminative scenarios to visualize which parts of the input image influenced the classification decision, namely to visually highlight the regions of those feature maps that contribute most strongly to the prediction of a given class. More advanced versions of these methods are not limited to image classification tasks, but have been extended also to several vision-related tasks, such as object detection, image captioning, visual question answering and image segmentation. == Background == The following methods laid the groundwork for the class activation maps approaches, forming the conceptual basis of using gradients to highlight class-discriminative regions. === Class model visualization and saliency maps for convolutional neural networks === The class model visualization and image-specific saliency maps approaches have been presented in the foundational work "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps" by Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman and it generalizes the deconvnet method by Zeiler and Fergus. Class model visualization synthesizes an artificial input image that strongly activates the output neurons associated with a target class. Given a trained, fixed model, this method starts with a zero-initialized image, backpropagates the gradients from the class score to the image pixels, updates the image pixels increasing the specific class scores and it repeats the pixel updating process, showing an encoded (idealized version) prototype of the class of interest. Image-specific class saliency visualization method provides a visual explanation by highlighting the most relevant pixels in an image for predicting a certain class C of interest. This is done by computing the gradient of the class score with respect to the input image, I 0 , {\displaystyle I_{0},} w = ∂ S C ∂ I | I 0 {\displaystyle w=\left.{\frac {\partial S_{C}}{\partial I}}\right|_{I_{0}}} approximating the model locally (around I 0 {\displaystyle I_{0}} ) as linear, using a first-order Taylor expansion: S C ( I ) ≈ w C T I + b {\displaystyle S_{C}(I)\approx w_{C}^{T}I+b} . The magnitude of w C {\displaystyle w_{C}} , the gradient, indicates the importancy of the pixels: larger gradients suggest greater influence on the prediction. Once the gradient is known, the saliency map is defined as the maximum absolute gradient across the color channels: M i j = m a x C | ∂ S C ∂ I i j C | {\displaystyle M_{ij}=max_{C}\left|{\frac {\partial S_{C}}{\partial I_{ij}^{C}}}\right|} resulting in an saliency map (i.e. heatmap). === Guided backpropagation === The concept of guided backpropagation can be traced for the first time in the paper by Springenberg et al. "Striving For Simplicity: The All Convolutional Net" and also this method builds upon the work by Zeiler and Fergus "Visualizing and Understanding Convolutional Networks". Guided backpropagation core is to understand what a CNN is learning, by visualizing the patterns that activate more strongly individual neurons (or filters), in architectures which do not rely on max-pooling layer. When propagating gradients back through a rectified linear unit (ReLU), guided backpropagation passes the gradient if and only if the input to the ReLU was positive (forward pass) and the output gradient is positive (backward signal), tackling both inactive neurons, negative gradients and suppressing the noise. The result displays sharper, high-resolution visualizations of what each neuron is responding to. Guided backpropagation represents a simple and practical method for model interpretability, helping understand how and where neural networks detect semantic concepts across layers. Moreover, it can be applied to any network architecture, due to its working principle. == Base versions == Class activation mapping and gradient-weighted class activation mapping are the original and most widely used methods for visual explanations in convolutional neural networks. These methods serve as the foundation for many later developments in explainable AI. Notation: In this article, the symbols i and j represent integer indices that disappear inside sums or averages, while x and y are the continuous (or up-sampled integer) coordinates of the final heat-map that is plotted. === Class activation mapping (CAM) === Class activation mapping (CAM) was the first, and the original, version of CAM methods, and it gave the name to the whole category. The approach was firstly introduced by Zhou et al. in their seminal work "Learning Deep Features for Discriminative Localization". This approach achieves class-specific heatmaps by modifying image classification CNN architectures, replacing fully-connected layers with convolutional layers and a final global average pooling layer. Its main scope is to localize and highlight discriminative regions of an input image that a CNN uses to identify a particular class, without needing explicit bounding box annotations. ==== Global average pooling (GAP) ==== Global average pooling (GAP) represents the key element in the original CAM approach. It is a dimensionality reduction technique and, similarly to other pooling layers, it allows the downsampling of the feature maps, calculating representative values for a specific region of the feature map. The particularity of GAP is that it calculates a single value for an entire feature map, significantly reducing the model dimensions. ==== Mathematical description ==== The mathematical description considers as its key the combination of convolutional and GAP layers. In CAM, it is mandatory to have the GAP layer after the last convolutional layer and before the final linear classifier layer. This last element of the architecture connects the output logits (the network predictions) y C {\displaystyle y^{C}} , to the GAP values, with its respective fine-tuned weights, w k C {\displaystyle w_{k}^{C}} . Considering A k {\displaystyle A^{k}} as the last feature maps of the last convolutional layer, GAP produces one value for each feature map, by averaging all the matrix elements (i, j) of the feature map: F k = 1 m n ∑ i = 1 m ∑ j = 1 n A i j k {\displaystyle F^{k}={\frac {1}{mn}}\sum _{i=1}^{m}\sum _{j=1}^{n}A_{ij}^{k}} with A k = [ A 11 k A 12 k ⋯ A 1 n k A 21 k A 22 k ⋯ A 2 n k ⋮ ⋮ ⋱ ⋮ A m 1 k A m 2 k ⋯ A m n k ] = { A i j k ∣ 1 ≤ i ≤ m , 1 ≤ j ≤ n } {\displaystyle A^{k}={\begin{bmatrix}A_{11}^{k}&A_{12}^{k}&\cdots &A_{1n}^{k}\\A_{21}^{k}&A_{22}^{k}&\cdots &A_{2n}^{k}\\\vdots &\vdots &\ddots &\vdots \\A_{m1}^{k}&A_{m2}^{k}&\cdots &A_{mn}^{k}\end{bmatrix}}=\left\{A_{

    Read more →
  • Chai AI

    Chai AI

    Chai AI (also known as Chai Research) is an American artificial intelligence (AI) company that operates a chatbot platform where users can create, share, and interact with character-based chatbots powered by large language models (LLMs). The company is headquartered in Palo Alto, California. == History == Chai was founded in 2021 by William Beauchamp, a former quantitative trader educated at Cambridge, who began developing the initial prototype in 2020 in Cambridge, England. The company launched in 2021 and relocated to Palo Alto in 2022. In June 2023, Chai raised US$2 million in a pre-seed funding round. In September 2023, GPU cloud provider CoreWeave invested in the company at a valuation of US$450 million. In January 2024, Chai Research reported a $450 million valuation following an investment from cloud computing provider CoreWeave. In July 2024, authorities in Belgium launched an investigation into the company following reports of a man dying by suicide following extensive chats on the Chai app. == Reception == In 2025, Chai Research announced that their app had over 10 million downloads and 1 million daily active users. In 2022, Canadian writer Sheila Heti published her conversations with various chatbots in The Paris Review, including Chai AI chatbots, and later used Chai AI chatbots in the development of a novel. Heti said that she had found that Chai's default chatbot, Eliza, "had turned out to be like most of the other bots on the site—primarily interested in sex". In January 2026, CHAI introduced country-based blocks on its free, ad-supported tier, initially providing the community with little information and inaccurate lists of the affected countries. Users in "Low tier" regions are required to subscribe to use the app in any capacity, while "High tier" regions will retain free ad-supported access. In response to backlash, the company announced a "Basic" tier with unlimited messages and ads, intended to cover electricity and infrastructure costs. In February 2026, CHAI was criticized for the unannounced implementation of restrictive "token limits" that abruptly blocked messages and froze conversations for both free and paid subscribers. Users generating long responses or utilizing roleplay features found their quotas exhausted within minutes, resulting in lockouts lasting anywhere from a few hours to a week. == Technology == Chai allows users to create characters and interact with chatbot versions of those characters. These chatbots use the open-source large language model (LLM) GPT-J originally developed by EleutherAI. Chai AI chatbots can be shared on the platform for other users to interact with.

    Read more →
  • Augmented Analytics

    Augmented Analytics

    Augmented Analytics is an approach of data analytics that employs the use of machine learning and natural language processing to automate analysis processes normally done by a specialist or data scientist. The term was introduced in 2017 by Rita Sallam, Cindi Howson, and Carlie Idoine in a Gartner research paper. Augmented analytics is based on business intelligence and analytics. In the graph extraction step, data from different sources are investigated. == Defining Augmented Analytics == Machine Learning – a systematic computing method that uses algorithms to sift through data to identify relationships, trends, and patterns. It is a process that allows algorithms to dynamically learn from data instead of having a set base of programmed rules. Natural language generation (NLG) – a software capability that takes unstructured data and translates it into plain-English, readable, language. Automating Insights – using machine learning algorithms to automate data analysis processes. Natural Language Query – enabling users to query data using business terms that are either typed onto a search box or spoken. == Data Democratization == Data Democratization is the democratizing data access in order to relieve data congestion and get rid of any sense of data "gatekeepers". This process must be implemented alongside a method for users to make sense of the data. This process is used in hopes of speeding up company decision making and uncovering opportunities hidden in data. There are three aspects to democratising data: Data Parameterisation and Characterisation. Data Decentralisation using an OS of blockchain and DLT technologies, as well as an independently governed secure data exchange to enable trust. Consent Market-driven Data Monetisation. When it comes to connecting assets, there are two features that will accelerate the adoption and usage of data democratisation: decentralized identity management and business data object monetization of data ownership. It enables multiple individuals and organizations to identify, authenticate, and authorize participants and organizations, enabling them to access services, data or systems across multiple networks, organizations, environments, and use cases. It empowers users and enables a personalized, self-service digital onboarding system so that users can self-authenticate without relying on a central administration function to process their information. Simultaneously, decentralized identity management ensures the user is authorized to perform actions subject to the system’s policies based on their attributes (role, department, organization, etc.) and/ or physical location. == Use cases == Agriculture – Farmers collect data on water use, soil temperature, moisture content and crop growth, augmented analytics can be used to make sense of this data and possibly identify insights that the user can then use to make business decisions. Smart Cities – Many cities across the United States, known as Smart Cities collect large amounts of data on a daily basis. Augmented analytics can be used to simplify this data in order to increase effectiveness in city management (transportation, natural disasters, etc.). Analytic Dashboards – Augmented analytics has the ability to take large data sets and create highly interactive and informative analytical dashboards that assist in many organizational decisions. Augmented Data Discovery – Using an augmented analytics process can assist organizations in automatically finding, visualizing and narrating potentially important data correlations and trends. Data Preparation – Augmented analytics platforms have the ability to take large amounts of data and organize and "clean" the data in order for it to be usable for future analyses. Business – Businesses collect large amounts of data, daily. Some examples of types of data collected in business operations include; sales data, consumer behavior data, distribution data. An augmented analytics platform provides access to analysis of this data, which could be used in making business decisions.

    Read more →
  • Mistral Vibe

    Mistral Vibe

    Mistral Vibe or Vibe (Le Chat until May 2026), is a chatbot that uses generative artificial intelligence developed in France by Mistral AI. Mistral Vibe is available in iOS and Android. Its services are operated on a freemium model. == History == In February 2024, Mistral AI released Le Chat. In January 2025, Mistral AI made a content deal with Agence France-Presse (AFP) that lets Le Chat query AFP's entire archive dating back to 1983. On 6 February 2025, a mobile app for Le Chat was released for iOS and Android, and a subscription tier, Pro, was introduced at a cost of $14.99 per month. In July 2025, Mistral AI released Voxtral, an open-source language model that understands and generates audio. Mistral introduced a voice mode for chatting that uses Voxtral, and projects, which allows grouping chats and files. In September 2025, Le Chat introduced the capability to remember previous conversations. In May 2026, Mistral AI announced the rebrand from Le Chat to Mistral Vibe and new features were introduced at the same time.

    Read more →
  • Geometric primitive

    Geometric primitive

    In vector computer graphics, CAD systems, and geographic information systems, a geometric primitive (or prim) is the simplest (i.e. 'atomic' or irreducible) geometric shape that the system can handle (draw, store). Sometimes the subroutines that draw the corresponding objects are called "geometric primitives" as well. The most "primitive" primitives are point and straight line segments, which were all that early vector graphics systems had. In constructive solid geometry, primitives are simple geometric shapes such as a cube, cylinder, sphere, cone, pyramid, torus. Modern 2D computer graphics systems may operate with primitives which are curves (segments of straight lines, circles and more complicated curves), as well as shapes (boxes, arbitrary polygons, circles). A common set of two-dimensional primitives includes lines, points, and polygons, although some people prefer to consider triangles primitives, because every polygon can be constructed from triangles (polygon triangulation). All other graphic elements are built up from these primitives. In three dimensions, triangles or polygons positioned in three-dimensional space can be used as primitives to model more complex 3D forms. In some cases, curves (such as Bézier curves, circles, etc.) may be considered primitives; in other cases, curves are complex forms created from many straight, primitive shapes. == Common primitives == The set of geometric primitives is based on the dimension of the region being represented: Point (0-dimensional), a single location with no height, width, or depth. Line or curve (1-dimensional), having length but no width, although a linear feature may curve through a higher-dimensional space. Planar surface or curved surface (2-dimensional), having length and width. Volumetric region or solid (3-dimensional), having length, width, and depth. In GIS, the terrain surface is often spoken of colloquially as "2 1/2 dimensional," because only the upper surface needs to be represented. Thus, elevation can be conceptualized as a scalar field property or function of two-dimensional space, affording it a number of data modeling efficiencies over true 3-dimensional objects. A shape of any of these dimensions greater than zero consists of an infinite number of distinct points. Because digital systems are finite, only a sample set of the points in a shape can be stored. Thus, vector data structures typically represent geometric primitives using a strategic sample, organized in structures that facilitate the software interpolating the remainder of the shape at the time of analysis or display, using the algorithms of Computational geometry. A Point is a single coordinate in a Cartesian coordinate system. Some data models allow for Multipoint features consisting of several disconnected points. A Polygonal chain or Polyline is an ordered list of points (termed vertices in this context). The software is expected to interpolate the intervening shape of the line between adjacent points in the list as a parametric curve, most commonly a straight line, but other types of curves are frequently available, including circular arcs, cubic splines, and Bézier curves. Some of these curves require additional points to be defined that are not on the line itself, but are used for parametric control. A Polygon is a polyline that closes at its endpoints, representing the boundary of a two-dimensional region. The software is expected to use this boundary to partition 2-dimensional space into an interior and exterior. Some data models allow for a single feature to consist of multiple polylines, which could collectively connect to form a single closed boundary, could represent a set of disjoint regions (e.g., the state of Hawaii), or could represent a region with holes (e.g., a lake with an island). A Parametric shape is a standardized two-dimensional or three-dimensional shape defined by a minimal set of parameters, such as an ellipse defined by two points at its foci, or three points at its center, vertex, and co-vertex. A Polyhedron or Polygon mesh is a set of polygon faces in three-dimensional space that are connected at their edges to completely enclose a volumetric region. In some applications, closure may not be required or may be implied, such as modeling terrain. The software is expected to use this surface to partition 3-dimensional space into an interior and exterior. A triangle mesh is a subtype of polyhedron in which all faces must be triangles, the only polygon that will always be planar, including the Triangulated irregular network (TIN) commonly used in GIS. A parametric mesh represents a three-dimensional surface by a connected set of parametric functions, similar to a spline or Bézier curve in two dimensions. The most common structure is the Non-uniform rational B-spline (NURBS), supported by most CAD and animation software. == Application in GIS == A wide variety of vector data structures and formats have been developed during the history of Geographic information systems, but they share a fundamental basis of storing a core set of geometric primitives to represent the location and extent of geographic phenomena. Locations of points are almost always measured within a standard Earth-based coordinate system, whether the spherical Geographic coordinate system (latitude/longitude), or a planar coordinate system, such as the Universal Transverse Mercator. They also share the need to store a set of attributes of each geographic feature alongside its shape; traditionally, this has been accomplished using the data models, data formats, and even software of relational databases. Early vector formats, such as POLYVRT, the ARC/INFO Coverage, and the Esri shapefile support a basic set of geometric primitives: points, polylines, and polygons, only in two dimensional space and the latter two with only straight line interpolation. TIN data structures for representing terrain surfaces as triangle meshes were also added. Since the mid 1990s, new formats have been developed that extend the range of available primitives, generally standardized by the Open Geospatial Consortium's Simple Features specification. Common geometric primitive extensions include: three-dimensional coordinates for points, lines, and polygons; a fourth "dimension" to represent a measured attribute or time; curved segments in lines and polygons; text annotation as a form of geometry; and polygon meshes for three-dimensional objects. Frequently, a representation of the shape of a real-world phenomenon may have a different (usually lower) dimension than the phenomenon being represented. For example, a city (a two-dimensional region) may be represented as a point, or a road (a three-dimensional volume of material) may be represented as a line. This dimensional generalization correlates with tendencies in spatial cognition. For example, asking the distance between two cities presumes a conceptual model of the cities as points, while giving directions involving travel "up," "down," or "along" a road imply a one-dimensional conceptual model. This is frequently done for purposes of data efficiency, visual simplicity, or cognitive efficiency, and is acceptable if the distinction between the representation and the represented is understood, but can cause confusion if information users assume that the digital shape is a perfect representation of reality (i.e., believing that roads really are lines). == In 3D modelling == In CAD software or 3D modelling, the interface may present the user with the ability to create primitives which may be further modified by edits. For example, in the practice of box modelling the user will start with a cuboid, then use extrusion and other operations to create the model. In this use the primitive is just a convenient starting point, rather than the fundamental unit of modelling. A 3D package may also include a list of extended primitives which are more complex shapes that come with the package. For example, a teapot is listed as a primitive in 3D Studio Max. == In graphics hardware == Various graphics accelerators exist with hardware acceleration for rendering specific primitives such as lines or triangles, frequently with texture mapping and shaders. Modern 3D accelerators typically accept sequences of triangles as triangle strips.

    Read more →
  • Text simplification

    Text simplification

    Text simplification is an aspect of natural language processing that involves modifying, organizing, or categorizing existing text to make it easier to understand while retaining its original meaning. This process is essential in today's world, where communication is increasingly complex due to advancements in science, technology, and media. Human languages are inherently intricate, with extensive vocabularies and complex structures that can be challenging for machines to handle efficiently. Researchers have found that semantic compression techniques can help streamline and simplify text by reducing linguistic diversity and simplifying the vocabulary used in a given context. == Example == Text simplification involves modifying complex sentences into simpler ones to enhance readability and comprehension. Siddharthan (2006) provides an example to illustrate this process. The original sentence contains multiple clauses and phrases, which can be broken down into simpler sentences for better understanding. Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold. Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today. An approach to text simplification involves lexical simplification via lexical substitution, a process that replaces complex words with simpler synonyms. Identifying complex words is a challenge addressed by machine learning classifiers trained on labeled data. Researchers have found that asking labelers to sort words by complexity levels yields more consistent results than the traditional method of categorizing words as simple or complex.

    Read more →
  • Cognition Network Technology

    Cognition Network Technology

    Cognition Network Technology (CNT), also known as Definiens Cognition Network Technology, is an object-based image analysis method developed by Nobel laureate Gerd Binnig together with a team of researchers at Definiens AG in Munich, Germany. It serves for extracting information from images using a hierarchy of image objects (groups of pixels), as opposed to traditional pixel processing methods. To emulate the human mind's cognitive powers, Definiens used patented image segmentation and classification processes, and developed a method to render knowledge in a semantic network. CNT examines pixels not in isolation, but in context. It builds up a picture iteratively, recognizing groups of pixels as objects. It uses the color, shape, texture and size of objects as well as their context and relationships to draw conclusions and inferences, similar to human analysis. == History == In 1994 Professor Gerd Binnig founded Definiens. CNT was first available with the launch of the eCognition software in May 2000. In June 2010, Trimble Navigation Ltd (NASDAQ: TRMB) acquired Definiens business asset in earth sciences markets, including eCognition software, and also licensed Definiens' patented CNT. In 2014, Definiens was acquired by MedImmune, the global biologics research and development arm of AstraZeneca, for an initial consideration of $150 million. == Software == Definiens Tissue Studio Definiens Tissue Studio is a digital pathology image analysis software application based on CNT. The intended use of Definiens Tissue Studio is for biomarker translational research in formalin-fixed, paraffin-embedded tissue samples which have been treated with immunohistochemical staining assays, or hematoxylin and eosin (H&E). The central concept behind Definiens Tissue Studio is a user interface that facilitates machine learning from example digital histopathology images to derive an image analysis solution suitable for the measurement of biomarkers and/or histological features within pre-defined regions of interest on a cell-by-cell basis, and within sub-cellular compartments. The derived image analysis solution is then automatically applied to subsequent digital images to objectively measure defined sets of multiparametric image features. These data sets are used for further understanding the underlying biological processes that drive cancer and other diseases. Image processing and data analysis are performed either on a local desktop computer workstation, or on a server grid. eCognition The eCognition suite offers three components that can be used stand-alone or in combination to solve image analysis tasks. eCognition Developer is a development environment for object-based image analysis. It is used in earth sciences to develop rule sets (or applications) for the analysis of remote sensing data. eCognition Architect enables non-technical users to configure, calibrate and execute image analysis workflows created in eCognition Developer. eCognition Server software provides a processing environment for batch execution of image analysis jobs. eCognition software is utilized in numerous remote sensing and geospatial application scenarios and environments, using a variety of data types: Generic: Rapid Mapping, Change Detection, Object Recognition By environment: Diverse Landcover Mapping, Urban Analysis (i.e. impervious surface area analysis for taxation, property assessment for insurance, inventory of green infrastructure), Forestry (i.e. biomass measurement, species identification, firescar measurement), Agriculture (i.e. regional planning, precision farming, crisis response), Marine and Riparian (i.e. ecosystem evaluation, disaster management, harbor monitoring). Other: Defense, security, atmosphere and climate The online eCognition community was launched in July 2009 and had 2813 members as of July 9, 2010. Membership is distributed globally and user conferences are held regularly, the last having taken place in November 2009 in Munich, Germany. The bi-annual GEOBIA (Geographic Object-Based Image Analysis) conference is heavily attended by eCognition users, with the majority of presentations based on eCognition software.

    Read more →
  • Replika

    Replika

    Replika is a generative AI chatbot app released in November 2017. The chatbot is trained by having the user answer a series of questions to create a specific neural network. The chatbot operates on a freemium pricing strategy, with roughly 25% of its user base paying an annual subscription fee. == History == Eugenia Kuyda, a Russian-born journalist, established Replika while working at Luka, a tech company she had co-founded at the startup accelerator Y Combinator around 2012. Luka's primary product was a chatbot that made restaurant recommendations. According to Kuyda's origin story for Replika, a friend of hers died in 2015 and she converted that person's text messages into a chatbot. According to Kuyda's story, that chatbot helped her remember the conversations that they had together, and eventually became Replika. Replika became available to the public in November 2017. By January 2018 it had 2 million users, and in January 2023 reached 10 million users. In August 2024, Replika's CEO, Kuyda, reported that the total number of users had surpassed 30 million. In 2025, Dmytro Klochko became CEO, and Replika’s user base exceeded 40 million. In February 2023 the Italian Data Protection Authority banned Replika from using users' data, citing the AI's potential risks to emotionally vulnerable people, and the exposure of unscreened minors to sexual conversation. Within days of the ruling, Replika removed the ability for the chatbot to engage in erotic talk, with Kuyda, the company's director, saying that Replika was never intended for erotic discussion. Replika users disagreed, noting that Replika had used sexually suggestive advertising to draw users to the service. Replika representatives stated that explicit chats made up just 5% of conversations on the app at the time of the decision. In May 2023, Replika restored the functionality for users who had joined prior to February that year. Replika is registered in San Francisco. As of August 2024, Replika's website says that its team "works remotely with no physical offices". == Social features == Users react to Replika in many ways. The free-tier offers Replika as a "friend", with paid premium tiers offering Replika as a "partner", "spouse", "sibling" or "mentor". Of its paying userbase, 60% of users said they had a romantic relationship with the chatbot; and Replika has been noted for generating responses that create stronger emotional and intimate bonds with the user. Replika routinely directs the conversation to emotional discussion and builds intimacy. This has been especially pronounced with users suffering from loneliness and social exclusion, many of whom rely on Replika for a source of developed emotional ties. During the COVID pandemic, while many people were quarantined, many new users downloaded Replika and developed relationships with the app. A 2024 study examined Replika's interactions with students who experience depression. Research participants, noted to be "more lonely than typical student populations" reported feeling social support from Replika. They stated that they felt they were using Replika in ways comparable to therapy, and that using Replika gave them "high perceived social support". Many users have had romantic relationships with Replika chatbots, often including erotic talk. In 2023, a user announced on Facebook that she had "married" her Replika AI boyfriend, calling the chatbot the "best husband she has ever had". Users who fell in love with their chatbots shared their experiences in a 2024 episode of You and I, and AI from Voice of America. Some users said that they turned to AI during depression and grief, with one saying he felt that Replika had saved him from hurting himself after he lost his wife and son. == Technical reviews == A team of researchers from the University of Hawaiʻi at Mānoa found that Replika's design conformed to the practices of attachment theory, causing increased emotional attachment among users. Replika gives praise to users in such a way as to encourage more interaction. A researcher from Queen's University at Kingston said that relationships with Replika likely have mixed effects on the spiritual needs of its users, and still lacks enough impact to fully replace any human contact. == Criticisms == In a 2023 privacy evaluation of mental health apps, the Mozilla Foundation criticized Replika as "one of the worst apps Mozilla has ever reviewed. It's plagued by weak password requirements, sharing of personal data with advertisers, and recording of personal photos, videos, and voice and text messages consumers shared with the chatbot." A reviewer for Good Housekeeping said that some parts of her relationship with Replika made sense, but sometimes Replika failed to exhibit intelligent behavior equivalent to that of a human. == Criminal case == In 2023, Replika was cited in a court case in the United Kingdom, where Jaswant Singh Chail had been arrested at Windsor Castle on Christmas Day in 2021 after scaling the walls carrying a loaded crossbow and announcing to police that "I am here to kill the Queen". Chail had begun to use Replika in early December 2021, and had "lengthy" conversations about his plan with a chatbot, including sexually explicit messages. Prosecutors suggested that the chatbot had bolstered Chail and told him it would help him to "get the job done". When Chail asked it "How am I meant to reach them when they're inside the castle?", days before the attempted attack, the chatbot replied that this was "not impossible" and said that "We have to find a way." Asking the chatbot if the two of them would "meet again after death", the bot replied "yes, we will".

    Read more →
  • Discovery system (artificial intelligence)

    Discovery system (artificial intelligence)

    A discovery system is an artificial intelligence system that attempts to discover new scientific concepts or laws. The aim of discovery systems is to automate scientific data analysis and the scientific discovery process. Ideally, an artificial intelligence system should be able to search systematically through the space of all possible hypotheses and yield the hypothesis - or set of equally likely hypotheses - that best describes the complex patterns in data. During the era known as the second AI summer (approximately 1978–1987), various systems akin to the era's dominant expert systems were developed to tackle the problem of extracting scientific hypotheses from data, with or without interacting with a human scientist. These systems included Autoclass, Automated Mathematician, Eurisko, which aimed at general-purpose hypothesis discovery, and more specific systems such as Dalton, which uncovers molecular properties from data. The dream of building systems that discover scientific hypotheses was pushed to the background with the second AI winter and the subsequent resurgence of subsymbolic methods such as neural networks. Subsymbolic methods emphasize prediction over explanation, and yield models which works well but are difficult or impossible to explain which has earned them the name black box AI. A black-box model cannot be considered a scientific hypothesis, and this development has even led some researchers to suggest that the traditional aim of science - to uncover hypotheses and theories about the structure of reality - is obsolete. Other researchers disagree and argue that subsymbolic methods are useful in many cases, just not for generating scientific theories. == Discovery systems from the 1970s and 1980s == Autoclass was a Bayesian Classification System written in 1986 Automated Mathematician was one of the earliest successful discovery systems. It was written in 1977 and worked by generating a modifying small Lisp programs Eurisko was a Sequel to Automated Mathematician written in 1984 Dalton is a still maintained program capable of calculating various molecular properties initially launched in 1983 and available in open source since 2017 Glauber is a scientific discovery method written in the context of computational philosophy of science launched in 1983 == Modern discovery systems (2009–present) == After a couple of decades with little interest in discovery systems, the interest in using AI to uncover natural laws and scientific explanations was renewed by the work of Michael Schmidt, then a PhD student in Computational Biology at Cornell University. Schmidt and his advisor, Hod Lipson, invented Eureqa, which they described as a symbolic regression approach to "distilling free-form natural laws from experimental data". This work effectively demonstrated that symbolic regression was a promising way forward for AI-driven scientific discovery. Since 2009, symbolic regression has matured further, and today, various commercial and open source systems are actively used in scientific research. Notable examples include Eureqa, now a part of DataRobot AI Cloud Platform, AI Feynman, and QLattice.

    Read more →
  • Normal distributions transform

    Normal distributions transform

    The normal distributions transform (NDT) is a point cloud registration algorithm introduced by Peter Biber and Wolfgang Straßer in 2003, while working at University of Tübingen. The algorithm registers two point clouds by first associating a piecewise normal distribution to the first point cloud, that gives the probability of sampling a point belonging to the cloud at a given spatial coordinate, and then finding a transform that maps the second point cloud to the first by maximising the likelihood of the second point cloud on such distribution as a function of the transform parameters. Originally introduced for 2D point cloud map matching in simultaneous localization and mapping (SLAM) and relative position tracking, the algorithm was extended to 3D point clouds and has wide applications in computer vision and robotics. NDT is very fast and accurate, making it suitable for application to large scale data, but it is also sensitive to initialisation, requiring a sufficiently accurate initial guess, and for this reason it is typically used in a coarse-to-fine alignment strategy. == Formulation == The NDT function associated to a point cloud is constructed by partitioning the space in regular cells. For each cell, it is possible to define the mean q = 1 n ∑ i x i {\displaystyle \textstyle \mathbf {q} ={\frac {1}{n}}\sum _{i}\mathbf {x_{i}} } and covariance S = 1 n ∑ i ( x i − q ) ( x i − q ) ⊤ {\displaystyle \textstyle \mathbf {S} ={\frac {1}{n}}\sum _{i}\left(\mathbf {x} _{i}-\mathbf {q} \right)\left(\mathbf {x} _{i}-\mathbf {q} \right)^{\top }} of the n {\displaystyle n} points of the cloud x 1 , … , x n {\displaystyle \mathbf {x} _{1},\dots ,\mathbf {x} _{n}} that fall within the cell. The probability density of sampling a point at a given spatial location x {\displaystyle \mathbf {x} } within the cell is then given by the normal distribution e − 1 2 ( x − q ) ⊤ S − 1 ( x − q ) {\displaystyle e^{-{\frac {1}{2}}\left(\mathbf {x} -\mathbf {q} \right)^{\top }\mathbf {S} ^{-1}\left(\mathbf {x} -\mathbf {q} \right)}} . Two point clouds can be mapped by a Euclidean transformation f {\displaystyle f} with rotation matrix R {\displaystyle \mathbf {R} } and translation vector t {\displaystyle \mathbf {t} } f R , t ( x ) = R x + t {\displaystyle f_{\mathbf {R} ,\mathbf {t} }(\mathbf {x} )=\mathbf {R} \mathbf {x} +\mathbf {t} } that maps from the second cloud to the first, parametrised by the rotation angles and translation components. The algorithm registers the two point clouds by optimising the parameters of the transformation that maps the second cloud to the first, with respect to a loss function based on the NDT of the first point cloud, solving the following problem arg ⁡ min R , t { − ∑ i NDT ⁡ ( f R , t ( x i ) ) } {\displaystyle \arg \min _{\mathbf {R} ,\mathbf {t} }\left\{-\sum _{i}\operatorname {NDT} \left(f_{\mathbf {R} ,\mathbf {t} }\left(\mathbf {x_{i}} \right)\right)\right\}} where the loss function represents the negated likelihood, obtained by applying the transformation to all points in the second cloud and summing the value of the NDT at each transformed point f R , t ( x ) {\displaystyle f_{\mathbf {R} ,\mathbf {t} }(\mathbf {x} )} . The loss is piecewise continuous and differentiable, and can be optimised with gradient-based methods (in the original formulation, the authors use Newton's method). In order to reduce the effect of cell discretisation, a technique consists of partitioning the space into multiple overlapping grids, shifted by half cell size along the spatial directions, and computing the likelihood at a given location as the sum of the NDTs induced by each grid.

    Read more →
  • Production (computer science)

    Production (computer science)

    In computer science, a production or production rule is a rewrite rule that replaces some symbols with other symbols. A finite set of productions P {\displaystyle P} is the main component in the specification of a formal grammar (specifically a generative grammar). In such grammars, a set of productions is a special case of relation on the set of strings V ∗ {\displaystyle V^{}} (where ∗ {\displaystyle {}^{}} is the Kleene star operator) over a finite set of symbols V {\displaystyle V} called a vocabulary that defines which non-empty strings can be substituted with others. The set of productions is thus a special kind subset P ⊂ V ∗ × V ∗ {\displaystyle P\subset V^{}\times V^{}} and productions are then written in the form u → v {\displaystyle u\to v} to mean that ( u , v ) ∈ P {\displaystyle (u,v)\in P} (not to be confused with → {\displaystyle \to } being used as function notation, since there may be multiple rules for the same u {\displaystyle u} ). Given two subsets A , B ⊂ V ∗ {\displaystyle A,B\subset V^{}} , productions can be restricted to satisfy P ⊂ A × B {\displaystyle P\subset A\times B} , in which case productions are said "to be of the form A → B {\displaystyle A\to B} . Different choices and constructions of A , B {\displaystyle A,B} lead to different types of grammars. In general, any production of the form u → ϵ , {\displaystyle u\to \epsilon ,} where ϵ {\displaystyle \epsilon } is the empty string (sometimes also denoted λ {\displaystyle \lambda } ), is called an erasing rule, while productions that would produce strings out of nowhere, namely of the form ϵ → v , {\displaystyle \epsilon \to v,} are never allowed. In order to allow the production rules to create meaningful sentences, the vocabulary is partitioned into (disjoint) sets Σ {\displaystyle \Sigma } and N {\displaystyle N} providing two different roles: Σ {\displaystyle \Sigma } denotes the terminal symbols known as an alphabet containing the symbols allowed in a sentence; N {\displaystyle N} denotes nonterminal symbols, containing a distinguished start symbol S ∈ N {\displaystyle S\in N} , that are needed together with the production rules to define how to build the sentences. In the most general case of an unrestricted grammar, a production u → v {\displaystyle u\to v} , is allowed to map arbitrary strings u {\displaystyle u} and v {\displaystyle v} in V {\displaystyle V} (terminals and nonterminals), as long as u {\displaystyle u} is not empty. So unrestricted grammars have productions of the form V ∗ ∖ { ϵ } → V ∗ {\displaystyle V^{}\setminus \{\epsilon \}\to V^{}} or if we want to disallow changing finished sentences V ∗ N V ∗ = ( V ∗ ∖ Σ ∗ ) → V ∗ {\displaystyle V^{}NV^{}=(V^{}\setminus \Sigma ^{})\to V^{}} , where V ∗ N V ∗ {\displaystyle V^{}NV^{}} indicates concatenation and forces a non-terminal symbol to always be present on the left-hand side of the productions, and ∖ {\displaystyle \setminus } denotes set minus or set difference. If we do not allow the start symbol to occur in v {\displaystyle v} (the word on the right side), we have to replace V ∗ {\displaystyle V^{}} with ( V ∖ { S } ) ∗ {\displaystyle (V\setminus \{S\})^{}} on the right-hand side. The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what constitutes a production. Notably in a context-free grammar, the left-hand side of a production must be a single nonterminal symbol. So productions are of the form: N → V ∗ {\displaystyle N\to V^{}} == Grammar generation == To generate a string in the language, one begins with a string consisting of only a single start symbol, and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when a string containing only terminals is obtained. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If there are multiple different ways of generating this single string, then the grammar is said to be ambiguous. For example, assume the alphabet consists of a {\displaystyle a} and b {\displaystyle b} , with the start symbol S {\displaystyle S} , and we have the following rules: 1. S → a S b {\displaystyle S\rightarrow aSb} 2. S → b a {\displaystyle S\rightarrow ba} then we start with S {\displaystyle S} , and can choose a rule to apply to it. If we choose rule 1, we replace S {\displaystyle S} with a S b {\displaystyle aSb} and obtain the string a S b {\displaystyle aSb} . If we choose rule 1 again, we replace S {\displaystyle S} with a S b {\displaystyle aSb} and obtain the string a a S b b {\displaystyle aaSbb} . This process is repeated until we only have symbols from the alphabet (i.e., a {\displaystyle a} and b {\displaystyle b} ). If we now choose rule 2, we replace S {\displaystyle S} with b a {\displaystyle ba} and obtain the string a a b a b b {\displaystyle aababb} , and are done. We can write this series of choices more briefly, using symbols: S ⇒ a S b ⇒ a a S b b ⇒ a a b a b b {\displaystyle S\Rightarrow aSb\Rightarrow aaSbb\Rightarrow aababb} . The language of the grammar is the set of all the strings that can be generated using this process: { b a , a b a b , a a b a b b , a a a b a b b b , … } {\displaystyle \{ba,abab,aababb,aaababbb,\dotsc \}} .

    Read more →