AI Detector Xero

AI Detector Xero — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Anomaly Detection at Multiple Scales

    Anomaly Detection at Multiple Scales

    Anomaly Detection at Multiple Scales, or ADAMS was a $35 million DARPA project designed to identify patterns and anomalies in very large data sets. It is under DARPA's Information Innovation office and began in 2011 and ended in August 2014 The project was intended to detect and prevent insider threats such as "a soldier in good mental health becoming homicidal or suicidal", an "innocent insider becoming malicious", or "a government employee [who] abuses access privileges to share classified information". Specific cases mentioned are Nadal Malik Hasan and WikiLeaks source Chelsea Manning. Commercial applications may include finance. The intended recipients of the system output are operators in the counterintelligence agencies. A final report was published on May 11, 2015, detailing a system known as Anomaly Detection Engine for Networks, or ADEN, developed by the University of Maryland, College Park, whose goal was to "identify malicious users within a network." Using multiple datasets from Wikipedia, Slashdot, and others, researchers were able to identify vandals and malicious users on a website using both conventional algorithms and artificial intelligence. The Proactive Discovery of Insider Threats Using Graph Analysis and Learning was part of the ADAMS project. The Georgia Tech team includes noted high-performance computing researcher David Bader (computer scientist).

    Read more →
  • Image formation

    Image formation

    The study of image formation encompasses the radiometric and geometric processes by which 2D images of 3D objects are formed. In the case of digital images, the image formation process also includes analog to digital conversion and sampling. == Imaging == The imaging process is a mapping of an object to an image plane. Each point on the image corresponds to a point on the object. An illuminated object will scatter light toward a lens and the lens will collect and focus the light to create the image. The ratio of the height of the image to the height of the object is the magnification. The spatial extent of the image surface and the focal length of the lens determines the field of view of the lens. Image formation of mirror these have a center of curvature and its focal length of the mirror is half of the center of curvature. == Illumination == An object may be illuminated by the light from an emitting source such as the sun, a light bulb or a Light Emitting Diode. The light incident on the object is reflected in a manner dependent on the surface properties of the object. For rough surfaces, the reflected light is scattered in a manner described by the Bi-directional Reflectance Distribution Function (BRDF) of the surface. The BRDF of a surface is the ratio of the exiting power per square meter per steradian (radiance) to the incident power per square meter (irradiance). The BRDF typically varies with angle and may vary with wavelength, but a specific important case is a surface that has constant BRDF. This surface type is referred to as Lambertian and the magnitude of the BRDF is R/π, where R is the reflectivity of the surface. The portion of scattered light that propagates toward the lens is collected by the entrance pupil of the imaging lens over the field of view. == Field of view and imagery == The Field of view of a lens is limited by the size of the image plane and the focal length of the lens. The relationship between a location on the image and a location on the object is y = ftan(θ), where y is the max extent of the image plane, f is the focal length of the lens and θ is the field of view. If y is the max radial size of the image then θ is the field of view of the lens. While the image created by a lens is continuous, it can be modeled as a set of discrete field points, each representing a point on the object. The quality of the image is limited by the aberrations in the lens and the diffraction created by the finite aperture stop. == Pupils and stops == The aperture stop of a lens is a mechanical aperture which limits the light collection for each field point. The entrance pupil is the image of the aperture stop created by the optical elements on the object side of the lens. The light scattered by an object is collected by the entrance pupil and focused onto the image plane via a series of refractive elements. The cone of the focused light at the image plane is set by the size of the entrance pupil and the focal length of the lens. This is often referred to as the f-stop or f-number of the lens. f/# = f/D where D is the diameter of the entrance pupil. == Pixelation and color vs. monochrome == In typical digital imaging systems, a sensor is placed at the image plane. The light is focused on to the sensor and the continuous image is pixelated. The light incident on each pixel in the sensor will be integrated within the pixel and a proportional electronic signal will be generated. The angular geometric resolution of a pixel is given by atan(p/f), where p is the pitch of the pixel. This is also called the pixel field of view. The sensor may be monochrome or color. In the case of a monochrome sensor, the light incident on each pixel is integrated and the resulting image is a grayscale like picture. For color images, a mosaic color filter is typically placed over the pixels to create a color image. An example is a Bayer filter. The signal incident on each pixel is then digitized to a bit stream. == Image quality == The quality of an image is dependent upon both geometric and physical items. Geometrically, higher density of pixels across an image will give less blocky pixelation and thus a better geometric image quality. Lens aberrations also contribute to the quality of the image. Physically, diffraction due to the aperture stop will limit the resolvable spatial frequencies as a function of f-number. In the frequency domain, Modulation Transfer Function (MTF) is a measure of the quality of the imaging system. The MTF is a measure of the visibility of a sinusoidal variation in irradiance on the image plane as a function of the frequency of the sinusoid. It includes the effects of diffraction, aberrations and pixelation. For the lens, the MTF is the autocorrelation of the pupil function, so it accounts for the finite pupil extent and the lens aberrations. The sensor MTF is the Fourier Transform of the pixel geometry. For a square pixel, MTF(ξ) = sin(πξp)/πξp where p is the pixel width and ξ is the spatial frequency. The MTF of the combination of the lens and detector is the product of the two component MTFs. == Perception == Color images can be perceived via two means. In the case of computer vision the light incident on the sensor comprises the image. In the case of visual perception, the human eye has a color dependent response to light so this must be accounted for. This is important consideration when converting to grayscale. == Image formation in eye == The principal difference between the lens of the eye and an ordinary optical lens is that the former is flexible. The radius of the curvature of the anterior surface of the lens is greater than the radius of its posterior surface. The shape of the lens is controlled by tension in the fibers of the ciliary body. To focus on distant objects, the controlling muscles cause the lens to be relatively flattened. Similarly, these muscles allow the lens to become thicker in order to focus on objects near the eye. The distance between the center of the lens and the retina (focal length) varies from approximately 17 mm to about 14 mm, as the refractive power of the lens increases from its minimum to its maximum. When the eye focuses on an object farther away than about 3 m, the lens exhibits its lowest refractive power. When the eye focuses on a close object, the lens is most strongly refractive.

    Read more →
  • Attensity

    Attensity

    Attensity was an American company that provided social analytics and engagement applications for social customer relationship management (social CRM). Attensity's text analytics software applications extracted facts, relationships and sentiment from unstructured data. == History == Attensity was founded in 2000. An early investor in Attensity was In-Q-Tel, which funds technology to support the missions of the US Government and the broader DOD. InTTENSITY, an independent company that has combined Inxight with Attensity Software (the only joint development project that combines two InQTel funded software packages), was the exclusive distributor and outlet for Attensity in the Federal Market. In 2009, Attensity Corp., then based in Palo Alto, merged with Germany's Empolis and Living-e AG to form Attensity Group. In 2010, Attensity Group acquired Biz360, a provider of social media monitoring and market intelligence solutions. In early 2012, Attensity Group divested itself of the Empolis business unit via a management buyout; that unit currently conducts business under its pre-merger name. Attensity Group was a closely held private company. Its majority shareholder was Aeris Capital, a private Swiss investment office advising a high-net-worth individual and his charitable foundation. Foundation Capital, Granite Ventures, and Scale Venture Partners were among Biz360's investors and thus became shareholders in Attensity Group. In February 2016, Attensity's IP assets were acquired by InContact, and Attensity closed.

    Read more →
  • 1.58-bit large language model

    1.58-bit large language model

    A 1.58-bit large language model (also known as a ternary LLM) is a type of large language model (LLM) designed to be computationally efficient. It achieves this by using weights that are restricted to only three values: -1, 0, and +1. This restriction significantly reduces the model's memory footprint and allows for faster processing, as computationally expensive multiplication operations can be replaced with lower-cost additions. This contrasts with traditional models that use 16-bit floating-point numbers (FP16 or BF16) for their weights. Studies have shown that for models up to several billion parameters, the performance of 1.58-bit LLMs on various tasks is comparable to their full-precision counterparts. This approach could enable powerful AI to run on less specialized and lower-power hardware. The name "1.58-bit" comes from the fact that a system with three states contains log 2 ⁡ 3 ≈ 1.58 {\displaystyle \log _{2}3\approx 1.58} bits of information. These models are sometimes also referred to as 1-bit LLMs in research papers, although this term can also refer to true binary models (with weights of -1 and +1). == BitNet == In 2024, Ma et al., researchers at Microsoft, declared that their 1.58-bit model, BitNet b1.58 is comparable in performance to the 16-bit Llama 2 and opens the era of 1-bit LLM. BitNet creators did not use the post-training quantization of weights but instead relied on the new BitLinear transform that replaced the nn.Linear layer of the traditional transformer design. In 2025, Microsoft researchers had released an open-weights and open inference code model BitNet b1.58 2B4T demonstrating performance competitive with the full precision models at 2B parameters and 4T training tokens. == Post-training quantization == BitNet derives its performance from being trained natively in 1.58 bit instead of being quantized from a full-precision model after training. Still, training is an expensive process and it would be desirable to be able to somehow convert an existing model to 1.58 bits. In 2024, HuggingFace reported a way to gradually ramp up the 1.58-bit quantization in fine-tuning an existing model down to 1.58 bits. == Critique == Some researchers point out that the scaling laws of large language models favor the low-bit weights only in case of undertrained models. As the number of training tokens increases, the deficiencies of low-bit quantization surface.

    Read more →
  • Luxafor

    Luxafor

    Luxafor () is a brand of office productivity tools designed to improve efficiency and communication in workplaces. The brands main product is LED status indicators for use in office settings. Luxafor is a product line under the company SIA Greynut, based in Riga, Latvia. == History == Luxafor was developed by the technology company SIA Greynut. The brand first gained attention through a Kickstarter campaign in 2015, which aimed to fund its initial product, the Luxafor Flag. Although the campaign was unsuccessful in reaching its funding goal, the product was still brought to market. In 2017, Luxafor launched another Kickstarter campaign for the Luxafor Bluetooth, a wireless version of its LED status indicator. This campaign also did not meet its funding goal, but like its predecessor, the product was still developed and released. Despite initial setbacks, Luxafor Bluetooth has become one of the brand's leading products. == Products == Luxafors main product range is LED status indicators, including: === Luxafor Flag === A USB-powered LED indicator that shows different colors to signal the user's availability. === Luxafor Bluetooth === A wireless LED indicator controlled via Bluetooth, integrating with productivity tools like Slack and Microsoft Teams. === Luxafor Switch === An advanced status indicator designed to manage room and workspace availability. === Other === Other Luxafor products include CO2 Dongle, Smart Button, Mute Button, Pomodoro Timer and others. == Features == Luxafor products are known for their customizable indicators, integration capabilities with IFTTT, Zapier, and remote control features. They are compatible with various operating systems, including Windows and macOS, and can be integrated with numerous communication and productivity platforms, like Microsoft Teams and Cisco Jabber.

    Read more →
  • Image analysis

    Image analysis

    Image analysis or imagery analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar coded tags or as sophisticated as identifying a person from their face. Computers are indispensable for the analysis of large amounts of data, for tasks that require complex computation, or for the extraction of quantitative information. On the other hand, the human visual cortex is an excellent image analysis apparatus, especially for extracting higher-level information, and for many applications — including medicine, security, and remote sensing — human analysts still cannot be replaced by computers. For this reason, many important image analysis tools such as edge detectors and neural networks are inspired by human visual perception models. == Digital == Digital Image Analysis or Computer Image Analysis is when a computer or electrical device automatically studies an image to obtain useful information from it. Note that the device is often a computer but may also be an electrical circuit, a digital camera or a mobile phone. It involves the fields of computer or machine vision, and medical imaging, and makes heavy use of pattern recognition, digital geometry, and signal processing. This field of computer science developed in the 1950s at academic institutions such as the MIT A.I. Lab, originally as a branch of artificial intelligence and robotics. It is the quantitative or qualitative characterization of two-dimensional (2D) or three-dimensional (3D) digital images. 2D images are, for example, to be analyzed in computer vision, and 3D images in medical imaging. The field was established in the 1950s—1970s, for example with pioneering contributions by Azriel Rosenfeld, Herbert Freeman, Jack E. Bresenham, or King-Sun Fu. == Techniques == There are many different techniques used in automatically analysing images. Each technique may be useful for a small range of tasks, however there still aren't any known methods of image analysis that are generic enough for wide ranges of tasks, compared to the abilities of a human's image analysing capabilities. Examples of image analysis techniques in different fields include: 2D and 3D object recognition, image segmentation, motion detection e.g. Single particle tracking, video tracking, optical flow, medical scan analysis, 3D Pose Estimation. == Deep learning == Since the early 2010s, deep learning methods have substantially advanced the field of image analysis. In 2012, a deep convolutional neural network (CNN) known as AlexNet achieved a significant reduction in error rates on the ImageNet large-scale image classification benchmark, demonstrating the effectiveness of deep learning for visual recognition tasks. Subsequent architectures such as ResNet introduced residual connections that enabled training of much deeper networks, further improving accuracy across image analysis tasks. Real-time object detection became practical with frameworks such as YOLO (You Only Look Once), which unified detection and classification into a single network pass. In 2020, the Vision Transformer (ViT) demonstrated that transformer architectures, originally developed for natural language processing, could achieve competitive results on image classification when applied directly to sequences of image patches. More recently, foundation models trained on large-scale datasets have enabled zero-shot generalisation across image analysis tasks. The Segment Anything Model (SAM), trained on over one billion masks, can segment arbitrary objects in images without task-specific fine-tuning. These advances have made image analysis techniques increasingly accessible through browser-based tools and open-source implementations. == Applications == The applications of digital image analysis are continuously expanding through all areas of science and industry, including: anatomy, allows for precise measurements, visualization, and statistical analysis of anatomical structures. assay micro plate reading, such as detecting where a chemical was manufactured. astronomy, such as calculating the size of a planet. automated species identification (e.g. plant and animal species) defense error level analysis filtering machine vision, such as to automatically count items in a factory conveyor belt. materials science, such as determining if a metal weld has cracks. medicine, such as detecting cancer in a mammography scan. metallography, such as determining the mineral content of a rock sample. microscopy, such as counting the germs in a swab. automatic number plate recognition; optical character recognition, such as automatic license plate detection. remote sensing, such as detecting intruders in a house, and producing land cover/land use maps. robotics, such as to avoid steering into an obstacle. security, such as detecting a person's eye color or hair color. == Object-based == Object-based image analysis (OBIA) involves two typical processes, segmentation and classification. Segmentation helps to group pixels into homogeneous objects. The objects typically correspond to individual features of interest, although over-segmentation or under-segmentation is very likely. Classification then can be performed at object levels, using various statistics of the objects as features in the classifier. Statistics can include geometry, context and texture of image objects. Over-segmentation is often preferred over under-segmentation when classifying high-resolution images. Object-based image analysis has been applied in many fields, such as cell biology, medicine, earth sciences, and remote sensing. For example, it can detect changes of cellular shapes in the process of cell differentiation.; it has also been widely used in the mapping community to generate land cover. When applied to earth images, OBIA is known as geographic object-based image analysis (GEOBIA), defined as "a sub-discipline of geoinformation science devoted to (...) partitioning remote sensing (RS) imagery into meaningful image-objects, and assessing their characteristics through spatial, spectral and temporal scale". The international GEOBIA conference has been held biannually since 2006. OBIA techniques are implemented in software such as eCognition or the Orfeo toolbox.

    Read more →
  • Graph cut optimization

    Graph cut optimization

    Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function f {\displaystyle f} , if it is possible to construct a flow network with positive weights such that each cut C {\displaystyle C} of the network can be mapped to an assignment of variables x {\displaystyle \mathbf {x} } to f {\displaystyle f} (and vice versa), and the cost of C {\displaystyle C} equals f ( x ) {\displaystyle f(\mathbf {x} )} (up to an additive constant) then it is possible to find the global optimum of f {\displaystyle f} in polynomial time by computing a minimum cut of the graph. The mapping between cuts and variable assignments is done by representing each variable with one node in the graph and, given a cut, each variable will have a value of 0 if the corresponding node belongs to the component connected to the source, or 1 if it belong to the component connected to the sink. Not all pseudo-Boolean functions can be represented by a flow network, and in the general case the global optimization problem is NP-hard. There exist sufficient conditions to characterise families of functions that can be optimised through graph cuts, such as submodular quadratic functions. Graph cut optimization can be extended to functions of discrete variables with a finite number of values, that can be approached with iterative algorithms with strong optimality properties, computing one graph cut at each iteration. Graph cut optimization is an important tool for inference over graphical models such as Markov random fields or conditional random fields, and it has applications in computer vision problems such as image segmentation, denoising, registration and stereo matching. == Representability == A pseudo-Boolean function f : { 0 , 1 } n → R {\displaystyle f:\{0,1\}^{n}\to \mathbb {R} } is said to be representable if there exists a graph G = ( V , E ) {\displaystyle G=(V,E)} with non-negative weights and with source and sink nodes s {\displaystyle s} and t {\displaystyle t} respectively, and there exists a set of nodes V 0 = { v 1 , … , v n } ⊂ V − { s , t } {\displaystyle V_{0}=\{v_{1},\dots ,v_{n}\}\subset V-\{s,t\}} such that, for each tuple of values ( x 1 , … , x n ) ∈ { 0 , 1 } n {\displaystyle (x_{1},\dots ,x_{n})\in \{0,1\}^{n}} assigned to the variables, f ( x 1 , … , x n ) {\displaystyle f(x_{1},\dots ,x_{n})} equals (up to a constant) the value of the flow determined by a minimum cut C = ( S , T ) {\displaystyle C=(S,T)} of the graph G {\displaystyle G} such that v i ∈ S {\displaystyle v_{i}\in S} if x i = 0 {\displaystyle x_{i}=0} and v i ∈ T {\displaystyle v_{i}\in T} if x i = 1 {\displaystyle x_{i}=1} . It is possible to classify pseudo-Boolean functions according to their order, determined by the maximum number of variables contributing to each single term. All first order functions, where each term depends upon at most one variable, are always representable. Quadratic functions f ( x ) = w 0 + ∑ i w i ( x i ) + ∑ i < j w i j ( x i , x j ) . {\displaystyle f(\mathbf {x} )=w_{0}+\sum _{i}w_{i}(x_{i})+\sum _{i 0 {\displaystyle p>0} then w i j k ( x i , x j , x k ) = w i j k ( 0 , 0 , 0 ) + p 1 ( x i − 1 ) + p 2 ( x j − 1 ) + p 3 ( x k − 1 ) + p 23 ( x j − 1 ) x k + p 31 x i ( x k − 1 ) + p 12 ( x i − 1 ) x j − p x i x j x k {\displaystyle w_{ijk}(x_{i},x_{j},x_{k})=w_{ijk}(0,0,0)+p_{1}(x_{i}-1)+p_{2}(x_{j}-1)+p_{3}(x_{k}-1)+p_{23}(x_{j}-1)x_{k}+p_{31}x_{i}(x_{k}-1)+p_{12}(x_{i}-1)x_{j}-px_{i}x_{j}x_{k}} with p 1 = w i j k ( 1 , 0 , 1 ) − w i j k ( 0 , 0 , 1 ) p 2 = w i j k ( 1 , 1 , 0 ) − w i j k ( 1 , 0 , 1 ) p 3 = w i j k ( 0 , 1 , 1 ) − w i j k ( 0 , 1 , 0 ) p 23 = w i j k ( 0 , 0 , 1 ) + w i j k ( 0 , 1 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 0 , 1 , 1 ) p 31 = w i j k ( 0 , 0 , 1 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 0 , 1 ) p 12 = w i j k ( 0 , 1 , 0 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 1 , 0 ) . {\displaystyle {\begin{aligned}p_{1}&=w_{ijk}(1,0,1)-w_{ijk}(0,0,1)\\p_{2}&=w_{ijk}(1,1,0)-w_{ijk}(1,0,1)\\p_{3}&=w_{ijk}(0,1,1)-w_{ijk}(0,1,0)\\p_{23}&=w_{ijk}(0,0,1)+w_{ijk}(0,1,0)-w_{ijk}(0,0,0)-w_{ijk}(0,1,1)\\p_{31}&=w_{ijk}(0,0,1)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,0,1)\\p_{12}&=w_{ijk}(0,1,0)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,1

    Read more →
  • DBGallery

    DBGallery

    DBGallery, short for Database Gallery, is a cloud-based Software as a Service (SaaS) and on-prem webserver for teams of various sizes. DBGallery enables users to centrally store, manage, catalog, archive, and securely share image, video, and document files. It facilitates version control, detects duplicates, and offers an intuitive and advanced search functionality, making assets easily accessible to all users. It takes advantage of current AI technologies to automatically add significant metadata to images, facilitates custom-trained AI models, and offers bespoke AI features. Additionally, DBGallery provides team management tools, workflow management, an activity audit trail, and other collaborative features that foster a productive environment for both internal and external stakeholders. == History == DBGallery's first public release was December 2007. Since then each year has seen continuous enhancements. 2013 added support for additional non-English languages in its meta-data. 2014 added support for creating custom data fields for tagging and search. In 2015 included the ability to auto-tag images using Reverse Geocoding. 2018 added artificial intelligence (AI) image recognition as a further addition to auto-tagging. March 2020 added complete image collection management via the web (e.g. file and folder drag and drop), a new collection dashboard, custom data layouts, and an improved audit trail. 2021 saw user experience improvements provided by improved styling and performance enhancements. Version 12 was released in October 2021. It added the ability to upload unlimited file sizes and made significant performance improvements for very large collections. June 2022 saw the release of a global duplicate images search. In late 2022, DBGallery began offering significantly reduced cloud storage cost, at a third of its previous prices, which played into its recent high-volume/high-capacity capabilities and its clients' subsequent demand for additional storage. 2023 saw improvements in user and role management, introduced it's mobile app (PWA), and improved custom-trained object detection. Release 14.0 in the spring of 2024 had large sharing improvements and a new find related images feature. Winter 2025's v15 release introduced AI-generated image descriptions, image-to-text, and facial recognition.

    Read more →
  • Zero-shot learning

    Zero-shot learning

    Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception. == Background and history == The first paper on zero-shot learning in natural language processing appeared in a 2008 paper by Chang, Ratinov, Roth, and Srikumar, at the AAAI'08, but the name given to the learning paradigm there was dataless classification. The first paper on zero-shot learning in computer vision appeared at the same conference, under the name zero-data learning. The term zero-shot learning itself first appeared in the literature in a 2009 paper from Palatucci, Hinton, Pomerleau, and Mitchell at NIPS'09. This terminology was repeated later in another computer vision paper and the term zero-shot learning caught on, as a take-off on one-shot learning that was introduced in computer vision years earlier. In computer vision, zero-shot learning models learned parameters for seen classes along with their class representations and rely on representational similarity among class labels so that, during inference, instances can be classified into new classes. In natural language processing, the key technical direction developed builds on the ability to "understand the labels"—represent the labels in the same semantic space as that of the documents to be classified. This supports the classification of a single example without observing any annotated data, the purest form of zero-shot classification. The original paper made use of the Explicit Semantic Analysis (ESA) representation but later papers made use of other representations, including dense representations. This approach was also extended to multilingual domains, fine entity typing and other problems. Moreover, beyond relying solely on representations, the computational approach has been extended to depend on transfer from other tasks, such as textual entailment and question answering. The original paper also points out that, beyond the ability to classify a single example, when a collection of examples is given, with the assumption that they come from the same distribution, it is possible to bootstrap the performance in a semi-supervised like manner (or transductive learning). Unlike standard generalization in machine learning, where classifiers are expected to correctly classify new samples to classes they have already observed during training, in ZSL, no samples from the classes have been given during training the classifier. It can therefore be viewed as an extreme case of domain adaptation. == Prerequisite information for zero-shot classes == Naturally, some form of auxiliary information has to be given about these zero-shot classes, and this type of information can be of several types. Learning with attributes: classes are accompanied by pre-defined structured description. For example, for bird descriptions, this could include "red head", "long beak". These attributes are often organized in a structured compositional way, and taking that structure into account improves learning. While this approach was used mostly in computer vision, there are some examples for it also in natural language processing. Learning from textual description. As pointed out above, this has been the key direction pursued in natural language processing. Here class labels are taken to have a meaning and are often augmented with definitions or free-text natural-language description. This could include for example a wikipedia description of the class. Class-class similarity. Here, classes are embedded in a continuous space. A zero-shot classifier can predict that a sample corresponds to some position in that space, and the nearest embedded class is used as a predicted class, even if no such samples were observed during training. == Generalized zero-shot learning == The above ZSL setup assumes that at test time, only zero-shot samples are given, namely, samples from new unseen classes. In generalized zero-shot learning, samples from both new and known classes, may appear at test time. This poses new challenges for classifiers at test time, because it is very challenging to estimate if a given sample is new or known. Some approaches to handle this include: a gating module, which is first trained to decide if a given sample comes from a new class or from an old one, and then, at inference time, outputs either a hard decision, or a soft probabilistic decision a generative module, which is trained to generate feature representation of the unseen classes—a standard classifier can then be trained on samples from all classes, seen and unseen. == Domains of application == Zero shot learning has been applied to the following fields: image classification semantic segmentation image generation object detection natural language processing computational biology abstract reasoning

    Read more →
  • Live Transcribe

    Live Transcribe

    Live Transcribe is a mobile app for real-time captioning, developed by Google for the Android operating system. Development on the application began in partnership with Gallaudet University. It was publicly released as a free beta for Android 5.0+ on the Google Play Store on February 4, 2019. As of early 2023 it had been downloaded over 500 million times. == Development == Researchers Dimitri Kanevsky, Sagar Savla and Chet Gnegy at Google developed the app in collaboration with researchers at Gallaudet University, an American university for the education of the deaf and hard of hearing. The app uses machine learning to generate captions, similar to YouTube's auto-generated captions. In August 2019, Google made Live Transcribe an open-source project. == Features == The app uses speech recognition to generate live captions in over 80 languages with varying accuracy. The app, which requires connection to the Internet to function, is available to download on the Google Play Store. A later update to the app displayed information on sounds such as clapping, laughter, music, applause, and whistling. In May 2020, the app started supporting transcription in Albanian, Burmese, Estonian, Macedonian, Mongolian, Punjabi, and Uzbek, supporting 70 languages. In March 2022, the app was updated with support to transcribe offline, without Internet connection, so long as the appropriate language pack has been installed. The offline mode is only available for devices with 6GB of RAM and certain Google Pixel devices.

    Read more →
  • Zero-shot learning

    Zero-shot learning

    Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples. Zero-shot methods generally work by associating observed and non-observed classes through some form of auxiliary information, which encodes observable distinguishing properties of objects. For example, given a set of images of animals to be classified, along with auxiliary textual descriptions of what animals look like, an artificial intelligence model which has been trained to recognize horses, but has never been given a zebra, can still recognize a zebra when it also knows that zebras look like striped horses. This problem is widely studied in computer vision, natural language processing, and machine perception. == Background and history == The first paper on zero-shot learning in natural language processing appeared in a 2008 paper by Chang, Ratinov, Roth, and Srikumar, at the AAAI'08, but the name given to the learning paradigm there was dataless classification. The first paper on zero-shot learning in computer vision appeared at the same conference, under the name zero-data learning. The term zero-shot learning itself first appeared in the literature in a 2009 paper from Palatucci, Hinton, Pomerleau, and Mitchell at NIPS'09. This terminology was repeated later in another computer vision paper and the term zero-shot learning caught on, as a take-off on one-shot learning that was introduced in computer vision years earlier. In computer vision, zero-shot learning models learned parameters for seen classes along with their class representations and rely on representational similarity among class labels so that, during inference, instances can be classified into new classes. In natural language processing, the key technical direction developed builds on the ability to "understand the labels"—represent the labels in the same semantic space as that of the documents to be classified. This supports the classification of a single example without observing any annotated data, the purest form of zero-shot classification. The original paper made use of the Explicit Semantic Analysis (ESA) representation but later papers made use of other representations, including dense representations. This approach was also extended to multilingual domains, fine entity typing and other problems. Moreover, beyond relying solely on representations, the computational approach has been extended to depend on transfer from other tasks, such as textual entailment and question answering. The original paper also points out that, beyond the ability to classify a single example, when a collection of examples is given, with the assumption that they come from the same distribution, it is possible to bootstrap the performance in a semi-supervised like manner (or transductive learning). Unlike standard generalization in machine learning, where classifiers are expected to correctly classify new samples to classes they have already observed during training, in ZSL, no samples from the classes have been given during training the classifier. It can therefore be viewed as an extreme case of domain adaptation. == Prerequisite information for zero-shot classes == Naturally, some form of auxiliary information has to be given about these zero-shot classes, and this type of information can be of several types. Learning with attributes: classes are accompanied by pre-defined structured description. For example, for bird descriptions, this could include "red head", "long beak". These attributes are often organized in a structured compositional way, and taking that structure into account improves learning. While this approach was used mostly in computer vision, there are some examples for it also in natural language processing. Learning from textual description. As pointed out above, this has been the key direction pursued in natural language processing. Here class labels are taken to have a meaning and are often augmented with definitions or free-text natural-language description. This could include for example a wikipedia description of the class. Class-class similarity. Here, classes are embedded in a continuous space. A zero-shot classifier can predict that a sample corresponds to some position in that space, and the nearest embedded class is used as a predicted class, even if no such samples were observed during training. == Generalized zero-shot learning == The above ZSL setup assumes that at test time, only zero-shot samples are given, namely, samples from new unseen classes. In generalized zero-shot learning, samples from both new and known classes, may appear at test time. This poses new challenges for classifiers at test time, because it is very challenging to estimate if a given sample is new or known. Some approaches to handle this include: a gating module, which is first trained to decide if a given sample comes from a new class or from an old one, and then, at inference time, outputs either a hard decision, or a soft probabilistic decision a generative module, which is trained to generate feature representation of the unseen classes—a standard classifier can then be trained on samples from all classes, seen and unseen. == Domains of application == Zero shot learning has been applied to the following fields: image classification semantic segmentation image generation object detection natural language processing computational biology abstract reasoning

    Read more →
  • Referring expression generation

    Referring expression generation

    Referring expression generation (REG) is the subtask of natural language generation (NLG) that received most scholarly attention. While NLG is concerned with the conversion of non-linguistic information into natural language, REG focuses only on the creation of referring expressions (noun phrases) that identify specific entities called targets. This task can be split into two sections. The content selection part determines which set of properties distinguish the intended target and the linguistic realization part defines how these properties are translated into natural language. A variety of algorithms have been developed in the NLG community to generate different types of referring expressions. == Types of referring expressions == A referring expression (RE), in linguistics, is any noun phrase, or surrogate for a noun phrase, whose function in discourse is to identify some individual object (thing, being, event...) The technical terminology for identify differs a great deal from one school of linguistics to another. The most widespread term is probably refer, and a thing identified is a referent, as for example in the work of John Lyons. In linguistics, the study of reference relations belongs to pragmatics, the study of language use, though it is also a matter of great interest to philosophers, especially those wishing to understand the nature of knowledge, perception and cognition more generally. Various devices can be used for reference: determiners, pronouns, proper names... Reference relations can be of different kinds; referents can be in a "real" or imaginary world, in discourse itself, and they may be singular, plural, or collective. === Pronouns === The simplest type of referring expressions are pronoun such as he and it. The linguistics and natural language processing communities have developed various models for predicting anaphor referents, such as centering theory, and ideally referring-expression generation would be based on such models. However most NLG systems use much simpler algorithms, for example using a pronoun if the referent was mentioned in the previous sentence (or sentential clause), and no other entity of the same gender was mentioned in this sentence. === Definite noun phrases === There has been a considerable amount of research on generating definite noun phrases, such as the big red book. Much of this builds on the model proposed by Dale and Reiter. This has been extended in various ways, for example Krahmer et al. present a graph-theoretic model of definite NP generation with many nice properties. In recent years a shared-task event has compared different algorithms for definite NP generation, using the TUNA corpus. === Spatial and temporal reference === Recently there has been more research on generating referring expressions for time and space. Such references tend to be imprecise (what is the exact meaning of tonight?), and also to be interpreted in different ways by different people. Hence it may be necessary to explicitly reason about false positive vs false negative tradeoffs, and even calculate the utility of different possible referring expressions in a particular task context. === Criteria for good expressions === Ideally, a good referring expression should satisfy a number of criteria: Referential success: It should unambiguously identify the referent to the reader. Ease of comprehension: The reader should be able to quickly read and understand it. Computational complexity: The generation algorithm should be fast No false inferences: The expression should not confuse or mislead the reader by suggesting false implicatures or other pragmatic inferences. For example, a reader may be confused if he is told Sit by the brown wooden table in a context where there is only one table. == History == === Pre-2000 era === REG goes back to the early days of NLG. One of the first approaches was done by Winograd in 1972 who developed an "incremental" REG algorithm for his SHRDLU program. Afterwards researchers started to model the human abilities to create referring expressions in the 1980s. This new approach to the topic was influenced by the researchers Appelt and Kronfeld who created the programs KAMP and BERTRAND and considered referring expressions as parts of bigger speech acts. Some of their most interesting findings were the fact that referring expressions can be used to add information beyond the identification of the referent as well as the influence of communicative context and the Gricean maxims on referring expressions. Furthermore, its skepticism concerning the naturalness of minimal descriptions made Appelt and Kronfeld's research a foundation of later work on REG. The search for simple, well-defined problems changed the direction of research in the early 1990s. This new approach was led by Dale and Reiter who stressed the identification of the referent as the central goal. Like Appelt they discuss the connection between the Gricean maxims and referring expressions in their culminant paper in which they also propose a formal problem definition. Furthermore, Reiter and Dale discuss the Full Brevity and Greedy Heuristics algorithms as well as their Incremental Algorithm(IA) which became one of the most important algorithms in REG. === Later developments === After 2000 the research began to lift some of the simplifying assumptions, that had been made in early REG research in order to create more simple algorithms. Different research groups concentrated on different limitations creating several expanded algorithms. Often these extend the IA in a single perspective for example in relation to: Reference to Sets like "the t-shirt wearers" or "the green apples and the banana on the left" Relational Descriptions like "the cup on the table" or "the woman who has three children" Context Dependency, Vagueness and Gradeability include statements like "the older man" or "the car on the left" which are often unclear without a context Salience and Generation of Pronouns are highly discourse dependent making for example "she" a reference to "the (most salient) female person" Many simplifying assumptions are still in place or have just begun to be worked on. Also a combination of the different extensions has yet to be done and is called a "non-trivial enterprise" by Krahmer and van Deemter. Another important change after 2000 was the increasing use of empirical studies in order to evaluate algorithms. This development took place due to the emergence of transparent corpora. Although there are still discussions about what the best evaluation metrics are, the use of experimental evaluation has already led to a better comparability of algorithms, a discussion about the goals of REG and more task-oriented research. Furthermore, research has extended its range to related topics such as the choice of Knowledge Representation(KR) Frameworks. In this area the main question, which KR framework is most suitable for the use in REG remains open. The answer to this question depends on how well descriptions can be expressed or found. A lot of the potential of KR frameworks has been left unused so far. Some of the different approaches are the usage of: Graph search which treats relations between targets in the same way as properties. Constraint Satisfaction which allows for a separation between problem specification and the implementation. Modern Knowledge Representation which offers logical inference in for example Description Logic or Conceptual Graphs. == Problem definition == Dale and Reiter (1995) think about referring expressions as distinguishing descriptions. They define: The referent as the entity that should be described The context set as set of salient entities The contrast set or potential distractors as all elements of the context set except the referent A property as a reference to a single attribute–value pair Each entity in the domain can be characterised as a set of attribute–value pairs for example ⟨ {\displaystyle \langle } type, dog ⟩ {\displaystyle \rangle } , ⟨ {\displaystyle \langle } gender, female ⟩ {\displaystyle \rangle } or ⟨ {\displaystyle \langle } age, 10 years ⟩ {\displaystyle \rangle } . The problem then is defined as follows: Let r {\displaystyle r} be the intended referent, and C {\displaystyle C} be the contrast set. Then, a set L {\displaystyle L} of attribute–value pairs will represent a distinguishing description if the following two conditions hold: Every attribute–value pair in L {\displaystyle L} applies to r {\displaystyle r} : that is, every element of L {\displaystyle L} specifies an attribute–value that r {\displaystyle r} possesses. For every member c {\displaystyle c} of C {\displaystyle C} , there is at least one element l {\displaystyle l} of L {\displaystyle L} that does not apply to c {\displaystyle c} : that is, there is an l {\displaystyle l} in L {\displaystyle L} that specifies an attribute–value that c {\displaystyle c} does not possess. l {\displaystyle l} is said

    Read more →
  • Zoho Office Suite

    Zoho Office Suite

    Zoho Office Suite is an online office suite developed by Zoho Corporation. == History == Zoho Office Suite was launched in 2005 with a web-based word processor. Additional products, such as those for spreadsheets and presentations, were incorporated later into the suite. The applications are distributed as software as a service (SaaS). == Products == Zoho uses an open API for its Writer, Sheet, Show, Creator, Meeting, and Planner products. It also has plugins into Microsoft Word and Excel, an OpenOffice.org plugin, and a plugin for Firefox. Zoho Office Suite is free for individuals but offers a plan for teams, which includes Zoho WorkDrive, Zoho Workplace and other Zoho apps. In October 2009, Zoho integrated some of their applications with the Google Apps online suite.

    Read more →
  • Open information extraction

    Open information extraction

    In natural language processing, open information extraction (OIE) is the task of generating a structured, machine-readable representation of the information in text, usually in the form of triples or n-ary propositions. == Overview == A proposition can be understood as truth-bearer, a textual expression of a potential fact (e.g., "Dante wrote the Divine Comedy"), represented in an amenable structure for computers [e.g., ("Dante", "wrote", "Divine Comedy")]. An OIE extraction normally consists of a relation and a set of arguments. For instance, ("Dante", "passed away in" "Ravenna") is a proposition formed by the relation "passed away in" and the arguments "Dante" and "Ravenna". The first argument is usually referred as the subject while the second is considered to be the object. The extraction is said to be a textual representation of a potential fact because its elements are not linked to a knowledge base. Furthermore, the factual nature of the proposition has not yet been established. In the above example, transforming the extraction into a full fledged fact would first require linking, if possible, the relation and the arguments to a knowledge base. Second, the truth of the extraction would need to be determined. In computer science transforming OIE extractions into ontological facts is known as relation extraction. In fact, OIE can be seen as the first step to a wide range of deeper text understanding tasks such as relation extraction, knowledge-base construction, question answering, semantic role labeling. The extracted propositions can also be directly used for end-user applications such as structured search (e.g., retrieve all propositions with "Dante" as subject). OIE was first introduced by TextRunner developed at the University of Washington Turing Center headed by Oren Etzioni. Other methods introduced later such as Reverb, OLLIE, ClausIE or CSD helped to shape the OIE task by characterizing some of its aspects. At a high level, all of these approaches make use of a set of patterns to generate the extractions. Depending on the particular approach, these patterns are either hand-crafted or learned. == OIE systems and contributions == Reverb suggested the necessity to produce meaningful relations to more accurately capture the information in the input text. For instance, given the sentence "Faust made a pact with the devil", it would be erroneous to just produce the extraction ("Faust", "made", "a pact") since it would not be adequately informative. A more precise extraction would be ("Faust", "made a pact with", "the devil"). Reverb also argued against the generation of overspecific relations. OLLIE stressed two important aspects for OIE. First, it pointed to the lack of factuality of the propositions. For instance, in a sentence like "If John studies hard, he will pass the exam", it would be inaccurate to consider ("John", "will pass", "the exam") as a fact. Additionally, the authors indicated that an OIE system should be able to extract non-verb mediated relations, which account for significant portion of the information expressed in natural language text. For instance, in the sentence "Obama, the former US president, was born in Hawaii", an OIE system should be able to recognize a proposition ("Obama", "is", "former US president"). ClausIE introduced the connection between grammatical clauses, propositions, and OIE extractions. The authors stated that as each grammatical clause expresses a proposition, each verb mediated proposition can be identified by solely recognizing the set of clauses expressed in each sentence. This implies that to correctly recognize the set of propositions in an input sentence, it is necessary to understand its grammatical structure. The authors studied the case in the English language that only admits seven clause types, meaning that the identification of each proposition only requires defining seven grammatical patterns. The finding also established a separation between the recognition of the propositions and its materialization. In a first step, the proposition can be identified without any consideration of its final form, in a domain-independent and unsupervised way, mostly based on linguistic principles. In a second step, the information can be represented according to the requirements of the underlying application, without conditioning the identification phase. Consider the sentence "Albert Einstein was born in Ulm and died in Princeton". The first step will recognize the two propositions ("Albert Einstein", "was born", "in Ulm") and ("Albert Einstein", "died", "in Princeton"). Once the information has been correctly identified, the propositions can take the particular form required by the underlying application [e.g., ("Albert Einstein", "was born in", "Ulm") and ("Albert Einstein", "died in", "Princeton")]. CSD introduced the idea of minimality in OIE. It considers that computers can make better use of the extractions if they are expressed in a compact way. This is especially important in sentences with subordinate clauses. In these cases, CSD suggests the generation of nested extractions. For example, consider the sentence "The Embassy said that 6,700 Americans were in Pakistan". CSD generates two extractions [i] ("6,700 Americans", "were", "in Pakistan") and [ii] ("The Embassy", "said", "that [i]"). This is usually known as reification.

    Read more →
  • Superquadrics

    Superquadrics

    In mathematics, the superquadrics or super-quadrics (also superquadratics) are a family of geometric shapes defined by formulas that resemble those of ellipsoids and other quadrics, except that the squaring operations are replaced by arbitrary powers. They can be seen as the three-dimensional relatives of the superellipses. The term may refer to the solid object or to its surface, depending on the context. The equations below specify the surface; the solid is specified by replacing the equality signs by less-than-or-equal signs. The superquadrics include many shapes that resemble cubes, octahedra, cylinders, lozenges and spindles, with rounded or sharp corners. Because of their flexibility and relative simplicity, they are popular geometric modeling tools, especially in computer graphics. It becomes an important geometric primitive widely used in computer vision, robotics, and physical simulation. Some authors, such as Alan Barr, define "superquadrics" as including both the superellipsoids and the supertoroids. In modern computer vision literatures, superquadrics and superellipsoids are used interchangeably, since superellipsoids are the most representative and widely utilized shape among all the superquadrics. Comprehensive coverage of geometrical properties of superquadrics and methods of their recovery from range images and point clouds are covered in several computer vision literatures. == Formulas == === Implicit equation === The surface of the basic superquadric is given by | x | r + | y | s + | z | t = 1 {\displaystyle \left|x\right|^{r}+\left|y\right|^{s}+\left|z\right|^{t}=1} where r, s, and t are positive real numbers that determine the main features of the superquadric. Namely: less than 1: a pointy octahedron modified to have concave faces and sharp edges. exactly 1: a regular octahedron. between 1 and 2: an octahedron modified to have convex faces, blunt edges and blunt corners. exactly 2: a sphere greater than 2: a cube modified to have rounded edges and corners. infinite (in the limit): a cube Each exponent can be varied independently to obtain combined shapes. For example, if r=s=2, and t=4, one obtains a solid of revolution which resembles an ellipsoid with round cross-section but flattened ends. This formula is a special case of the superellipsoid's formula if (and only if) r = s. If any exponent is allowed to be negative, the shape extends to infinity. Such shapes are sometimes called super-hyperboloids. The basic shape above spans from -1 to +1 along each coordinate axis. The general superquadric is the result of scaling this basic shape by different amounts A, B, C along each axis. Its general equation is | x A | r + | y B | s + | z C | t = 1. {\displaystyle \left|{\frac {x}{A}}\right|^{r}+\left|{\frac {y}{B}}\right|^{s}+\left|{\frac {z}{C}}\right|^{t}=1.} === Parametric description === Parametric equations in terms of surface parameters u and v (equivalent to longitude and latitude if m equals 2) are x ( u , v ) = A g ( v , 2 r ) g ( u , 2 r ) y ( u , v ) = B g ( v , 2 s ) f ( u , 2 s ) z ( u , v ) = C f ( v , 2 t ) − π 2 ≤ v ≤ π 2 , − π ≤ u < π , {\displaystyle {\begin{aligned}x(u,v)&{}=Ag\left(v,{\frac {2}{r}}\right)g\left(u,{\frac {2}{r}}\right)\\y(u,v)&{}=Bg\left(v,{\frac {2}{s}}\right)f\left(u,{\frac {2}{s}}\right)\\z(u,v)&{}=Cf\left(v,{\frac {2}{t}}\right)\\&-{\frac {\pi }{2}}\leq v\leq {\frac {\pi }{2}},\quad -\pi \leq u<\pi ,\end{aligned}}} where the auxiliary functions are f ( ω , m ) = sgn ⁡ ( sin ⁡ ω ) | sin ⁡ ω | m g ( ω , m ) = sgn ⁡ ( cos ⁡ ω ) | cos ⁡ ω | m {\displaystyle {\begin{aligned}f(\omega ,m)&{}=\operatorname {sgn}(\sin \omega )\left|\sin \omega \right|^{m}\\g(\omega ,m)&{}=\operatorname {sgn}(\cos \omega )\left|\cos \omega \right|^{m}\end{aligned}}} and the sign function sgn(x) is sgn ⁡ ( x ) = { − 1 , x < 0 0 , x = 0 + 1 , x > 0. {\displaystyle \operatorname {sgn}(x)={\begin{cases}-1,&x<0\\0,&x=0\\+1,&x>0.\end{cases}}} === Spherical product === Barr introduces the spherical product which given two plane curves produces a 3D surface. If f ( μ ) = ( f 1 ( μ ) f 2 ( μ ) ) , g ( ν ) = ( g 1 ( ν ) g 2 ( ν ) ) {\displaystyle f(\mu )={\begin{pmatrix}f_{1}(\mu )\\f_{2}(\mu )\end{pmatrix}},\quad g(\nu )={\begin{pmatrix}g_{1}(\nu )\\g_{2}(\nu )\end{pmatrix}}} are two plane curves then the spherical product is h ( μ , ν ) = f ( μ ) ⊗ g ( ν ) = ( f 1 ( μ ) g 1 ( ν ) f 1 ( μ ) g 2 ( ν ) f 2 ( μ ) ) {\displaystyle h(\mu ,\nu )=f(\mu )\otimes g(\nu )={\begin{pmatrix}f_{1}(\mu )\ g_{1}(\nu )\\f_{1}(\mu )\ g_{2}(\nu )\\f_{2}(\mu )\end{pmatrix}}} This is similar to the typical parametric equation of a sphere: x = x 0 + r sin ⁡ θ cos ⁡ φ y = y 0 + r sin ⁡ θ sin ⁡ φ ( 0 ≤ θ ≤ π , 0 ≤ φ < 2 π ) z = z 0 + r cos ⁡ θ {\displaystyle {\begin{aligned}x&=x_{0}+r\sin \theta \;\cos \varphi \\y&=y_{0}+r\sin \theta \;\sin \varphi \qquad (0\leq \theta \leq \pi ,\;0\leq \varphi <2\pi )\\z&=z_{0}+r\cos \theta \end{aligned}}} which give rise to the name spherical product. Barr uses the spherical product to define quadric surfaces, like ellipsoids, and hyperboloids as well as the torus, superellipsoid, superquadric hyperboloids of one and two sheets, and supertoroids. == Plotting code == The following GNU Octave code generates a mesh approximation of a superquadric:

    Read more →