AI Grammar Remover

AI Grammar Remover — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Figure AI

    Figure AI

    Figure AI, Inc. is an American robotics company developing humanoid robots that operate via artificial intelligence. The company was founded in 2022 by Brett Adcock. As of late 2025, the company has a $39 billion valuation. Three generations of humanoid robots (Figure 01–03) have been developed, as well as two iterations of a vision-language-action model (Helix 01–02), which can control up to two robots at once. By 2026, the robots demonstrated the potential ability to perform household work and the company gained publicity when a Figure 03 appeared at a White House event. == History == Figure AI was founded in 2022 by Brett Adcock, also known for founding Archer Aviation and Vettery. That year, the company introduced its prototype, Figure 01, a bipedal robot designed for manual labor, initially targeting the logistics and warehousing sectors. The initial model utilized external cabling for easier maintenance. In May 2023, Figure AI raised $70 million from investors including Adcock, who invested $20 million, and Parkway Venture Capital. In January 2024, Figure AI announced a partnership with BMW to deploy humanoid robots in automotive manufacturing facilities. In February 2024, Figure AI secured $675 million in venture capital funding from a consortium that includes Jeff Bezos, Microsoft, Nvidia, Intel, and the startup-funding divisions of Amazon and OpenAI; the company was then valued at $2.6 billion. Figure AI also announced a partnership with OpenAI, which would build specialized artificial intelligence (AI) models for Figure AI's humanoid robots, enabling its robots to process language; the collaboration ended after a year, with Adcock stating that large language models had become a smaller problem compared to those allowing for "high rate robot control". In August 2024, the company introduced Figure 02, describing it as the next step toward deploying humanoids for industrial use. The machine has 35 degrees of freedom (DOF), while the five-fingered hands have 16 DOF and the ability to carry up to 25 kilograms (55 lb). The model is equipped with cabling integrated into the limbs, a torso-placed battery, six RGB cameras, and an onboard vision-language-action (VLA) model. It has three times the computing power (including inference AI) of the previous model, including two graphics processing units, supported by Nvidia. Microphones, speakers, and custom AI models (developed with OpenAI) enable communication with humans. In early 2025, Figure AI announced BotQ, a manufacturing facility aiming to produce 12,000 humanoids per year with the help of its own humanoid robots, and Helix, a VLA model that can control up to two robots at once. Helix enables a robot to interact with the world without extensive manual training, according to the company allowing it to pick up nearly any small household object. By April, the company issued cease-and-desist letters to at least two secondary brokers promoting its private stock without authorization. In September, a third round of financing exceeded $1 billion, raising the company's total valuation to $39 billion. Investors included Brookfield Asset Management, Intel, Macquarie Capital, Nvidia, Parkway Venture Capital, Qualcomm, Salesforce, and T-Mobile. In October 2025, Figure 03 was introduced. According to the company, its hardware and software redesign aims to create a general-purpose robot able to learn directly from humans. An upgraded camera system delivers twice the frame rate, a quarter the latency, and a 60% wider field of view, in addition to a camera in each hand. Tactile sensors in the fingertips can detect forces as little as 3 grams (0.1 oz). It incorporates soft materials and a protected battery for safety, and removable, washable textiles. It supports wireless inductive charging. In November 2025, the former head of product safety sued the company on the basis of being fired for raising the concern that the company's robots were strong enough to fracture a human skull. By early 2026, Figure 02 had been used in demonstrations showing that it could load a washing machine, sort packages, and fold laundry. That January, Helix 02 was released, expanding the AI model to the entire body to allow for functional autonomy. A Helix 02–powered Figure 02 was shown to be capable of loading and unloading a dishwasher, based on hours of motion-capture data and simulation-based machine learning. In March, U.S. First Lady Melania Trump appeared at the White House with a Figure 03, promoting the presumptive eventual ability of AI to teach children. In May 2026, Figure AI livestreamed a group of their robots processing packages nonstop for almost a week, inspiring a 10-hour competition between their robot and a human, in which the robot performed 98.5% as well as the human.

    Read more →
  • Normal distributions transform

    Normal distributions transform

    The normal distributions transform (NDT) is a point cloud registration algorithm introduced by Peter Biber and Wolfgang Straßer in 2003, while working at University of Tübingen. The algorithm registers two point clouds by first associating a piecewise normal distribution to the first point cloud, that gives the probability of sampling a point belonging to the cloud at a given spatial coordinate, and then finding a transform that maps the second point cloud to the first by maximising the likelihood of the second point cloud on such distribution as a function of the transform parameters. Originally introduced for 2D point cloud map matching in simultaneous localization and mapping (SLAM) and relative position tracking, the algorithm was extended to 3D point clouds and has wide applications in computer vision and robotics. NDT is very fast and accurate, making it suitable for application to large scale data, but it is also sensitive to initialisation, requiring a sufficiently accurate initial guess, and for this reason it is typically used in a coarse-to-fine alignment strategy. == Formulation == The NDT function associated to a point cloud is constructed by partitioning the space in regular cells. For each cell, it is possible to define the mean q = 1 n ∑ i x i {\displaystyle \textstyle \mathbf {q} ={\frac {1}{n}}\sum _{i}\mathbf {x_{i}} } and covariance S = 1 n ∑ i ( x i − q ) ( x i − q ) ⊤ {\displaystyle \textstyle \mathbf {S} ={\frac {1}{n}}\sum _{i}\left(\mathbf {x} _{i}-\mathbf {q} \right)\left(\mathbf {x} _{i}-\mathbf {q} \right)^{\top }} of the n {\displaystyle n} points of the cloud x 1 , … , x n {\displaystyle \mathbf {x} _{1},\dots ,\mathbf {x} _{n}} that fall within the cell. The probability density of sampling a point at a given spatial location x {\displaystyle \mathbf {x} } within the cell is then given by the normal distribution e − 1 2 ( x − q ) ⊤ S − 1 ( x − q ) {\displaystyle e^{-{\frac {1}{2}}\left(\mathbf {x} -\mathbf {q} \right)^{\top }\mathbf {S} ^{-1}\left(\mathbf {x} -\mathbf {q} \right)}} . Two point clouds can be mapped by a Euclidean transformation f {\displaystyle f} with rotation matrix R {\displaystyle \mathbf {R} } and translation vector t {\displaystyle \mathbf {t} } f R , t ( x ) = R x + t {\displaystyle f_{\mathbf {R} ,\mathbf {t} }(\mathbf {x} )=\mathbf {R} \mathbf {x} +\mathbf {t} } that maps from the second cloud to the first, parametrised by the rotation angles and translation components. The algorithm registers the two point clouds by optimising the parameters of the transformation that maps the second cloud to the first, with respect to a loss function based on the NDT of the first point cloud, solving the following problem arg ⁡ min R , t { − ∑ i NDT ⁡ ( f R , t ( x i ) ) } {\displaystyle \arg \min _{\mathbf {R} ,\mathbf {t} }\left\{-\sum _{i}\operatorname {NDT} \left(f_{\mathbf {R} ,\mathbf {t} }\left(\mathbf {x_{i}} \right)\right)\right\}} where the loss function represents the negated likelihood, obtained by applying the transformation to all points in the second cloud and summing the value of the NDT at each transformed point f R , t ( x ) {\displaystyle f_{\mathbf {R} ,\mathbf {t} }(\mathbf {x} )} . The loss is piecewise continuous and differentiable, and can be optimised with gradient-based methods (in the original formulation, the authors use Newton's method). In order to reduce the effect of cell discretisation, a technique consists of partitioning the space into multiple overlapping grids, shifted by half cell size along the spatial directions, and computing the likelihood at a given location as the sum of the NDTs induced by each grid.

    Read more →
  • Predictive text

    Predictive text

    Predictive text is an input technology used where one key or button represents many letters, such as on the physical numeric keypads of mobile phones and in accessibility technologies. Each key press results in a prediction rather than repeatedly sequencing through the same group of "letters" it represents, in the same, invariable order. Predictive text could allow for an entire word to be input by a single keypress. Predictive text makes efficient use of fewer device keys to input writing into a text message, an e-mail, an address book, a calendar, and the like. The most widely used, general, predictive text systems are T9, iTap, eZiText, and LetterWise/WordWise. There are many ways to build a device that predicts text, but all predictive text systems have initial linguistic settings that offer predictions that are re-prioritized to adapt to each user. This learning adapts, by way of the device memory, to a user's disambiguating feedback that results in corrective key presses, such as pressing a "next" key to get to the intention. Most predictive text systems have a user database to facilitate this process. Theoretically the number of keystrokes required per desired character in the finished writing is, on average, comparable to using a keyboard. This is approximately true provided that all words used are in its database, punctuation is ignored, and no input mistakes are made when typing or spelling. The theoretical keystrokes per character, KSPC, of a keyboard is KSPC=1.00, and of multi-tap is KSPC=2.03. Eatoni's LetterWise is a predictive multi-tap hybrid, which when operating on a standard telephone keypad achieves KSPC=1.15 for English. The choice of which predictive text system is the best to use involves matching the user's preferred interface style, the user's level of learned ability to operate predictive text software, and the user's efficiency goal. There are various levels of risk in predictive text systems, versus multi-tap systems, because the predicted text that is automatically written provides the speed and mechanical efficiency benefit, which, if the user is not careful to review, results in transmitting misinformation. Predictive text systems take time to learn to use well, and so generally, a device's system has user options to set up the choice of multi-tap or any one of several schools of predictive text methods. == Background == Short message service (SMS) permits a mobile phone user to send text messages (also called messages, SMSes, texts, and txts) as a short message. The most common system of SMS text input is referred to as "multi-tap". Using multi-tap, a key is pressed multiple times to access the list of letters on that key. For instance, pressing the "2" key once displays an "a", twice displays a "b" and three times displays a "c". To enter two successive letters that are on the same key, the user must either pause or hit a "next" button. A user can type by pressing an alphanumeric keypad without looking at the electronic equipment display. Thus, multi-tap is easy to understand and can be used without any visual feedback. However, multi-tap is not very efficient, requiring potentially many keystrokes to enter a single letter. In ideal predictive text entry, all words used are in the dictionary, punctuation is ignored, no spelling mistakes are made, and no typing mistakes are made. The ideal dictionary would include all slang, proper nouns, abbreviations, URLs, foreign-language words and other user-unique words. This ideal circumstance gives predictive text software a reduction in the number of key strokes a user is required to enter a word. The user presses the number corresponding to each letter. As long as the word exists in the predictive text dictionary or is correctly disambiguated by non-dictionary systems, it will appear. For instance, pressing "4663" will typically be interpreted as the word good, provided that a linguistic database in English is currently in use, though alternatives such as home, hood and hoof are also valid interpretations of the sequence of key strokes. The most widely used systems of predictive text are Tegic's T9, Motorola's iTap, and the Eatoni Ergonomics' LetterWise and WordWise. T9 and iTap use dictionaries, but Eatoni Ergonomics' products use a disambiguation process, a set of statistical rules to recreate words from keystroke sequences. All predictive text systems require a linguistic database for every supported input language. == Dictionary vs. non-dictionary systems == Traditional disambiguation works by referencing a dictionary of commonly used words, though Eatoni offers a dictionaryless disambiguation system. In dictionary-based systems, as the user presses the number buttons, an algorithm searches the dictionary for a list of possible words that match the keypress combination and offers up the most probable choice. The user can then confirm the selection and move on, or use a key to cycle through the possible combinations. A non-dictionary system constructs words and other sequences of letters from the statistics of word parts. To attempt predictions of the intended result of keystrokes not yet entered, disambiguation may be combined with a word completion facility. Either system (disambiguation or predictive) may include a user database, which can be further classified as a "learning" system when words or phrases are entered into the user database without direct user intervention. The user database is for storing words or phrases that are not well disambiguated by the pre-supplied database. Some disambiguation systems further attempt to correct spelling, format text or perform other automatic rewrites, with the risky effect of either enhancing or frustrating user efforts to enter text. == History == The predictive text and autocomplete technology was invented out of necessities by Chinese scientists and linguists in the 1950s to solve the input inefficiency of the Chinese typewriter, as the typing process involved finding and selecting thousands of logographic characters on a tray, drastically slowing down the word processing speed. The actuating keys of the Chinese typewriter created by Lin Yutang in the 1940s included suggestions for the characters following the one selected. In 1951, the Chinese typesetter Zhang Jiying arranged Chinese characters in associative clusters, a precursor of modern predictive text entry, and broke speed records by doing so. Predictive entry of text from a telephone keypad has been known at least since the 1970s (Smith and Goodwin, 1971). Predictive text was mainly used to look up names in directories over the phone until mobile phone text messaging came into widespread use. == Example == On a typical phone keypad, if users wished to type the in a "multi-tap" keypad entry system, they would need to: Press 8 (tuv) once to select t. Press 4 (ghi) twice to select h. Press 3 (def) twice to select e. Meanwhile, in a phone with predictive text, they need only: Press 8 once to select the (tuv) group for the first character. Press 4 once to select the (ghi) group for the second character. Press 3 once to select the (def) group for the third character. The system updates the display as each keypress is entered, to show the most probable entry. In this example, prediction reduced the number of button presses from five to three. The effect is even greater with longer words and those composed of letters later in each key's sequence. A dictionary-based predictive system is based on the hope that the desired word is in the dictionary. That hope may be misplaced if the word differs in any way from common usage—in particular, if the word is not spelled or typed correctly, is slang, or is a proper noun. In these cases, some other mechanism must be used to enter the word. Furthermore, the simple dictionary approach fails with agglutinative languages, where a single word does not necessarily represent a single semantic entity. == Companies and products == Predictive text is developed and marketed in a variety of competing products, such as Nuance Communications's T9. Other products include Motorola's iTap; Eatoni Ergonomic's LetterWise (character, rather than word-based prediction); WordWise (word-based prediction without a dictionary); EQ3 (a QWERTY-like layout compatible with regular telephone keypads); Prevalent Devices's Phraze-It; Xrgomics' TenGO (a six-key reduced QWERTY keyboard system); Adaptxt (considers language, context, grammar and semantics); Lightkey (a predictive typing software for Windows); Clevertexting (statistical nature of the language, dictionaryless, dynamic key allocation); and Oizea Type (temporal ambiguity); Intelab's Tauto; WordLogic's Intelligent Input Platform™ (patented, layer-based advanced text prediction, includes multi-language dictionary, spell-check, built-in Web search); Google's Gboard. == Textonyms == Words produced by the same combination of keypresses have been called "textonyms"; also "txtonyms"; or "T9o

    Read more →
  • Noisy text analytics

    Noisy text analytics

    Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While Text analytics is a growing and mature field that has great value because of the huge amounts of data being produced, processing of noisy text is gaining in importance because a lot of common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Also, text produced by processing spontaneous speech using automatic speech recognition and printed or handwritten text using optical character recognition contains processing noise. Text produced under such circumstances is typically highly noisy containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuations, missing letter case information, pause filling words such as “um” and “uh” and other texting and speech disfluencies. Such text can be seen in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents with historical language can also be considered noisy with respect to today's knowledge about the language. Such text contains important historical, religious, ancient medical knowledge that is useful. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques. == Techniques for noisy text analysis == Missing punctuation and the use of non-standard words can often hinder standard natural language processing tools such as part-of-speech tagging and parsing. Techniques to both learn from the noisy data and then to be able to process the noisy data are only now being developed. == Possible source of noisy text == World Wide Web: Poorly written text is found in web pages, online chat, blogs, wikis, discussion forums, newsgroups. Most of these data are unstructured and the style of writing is very different from, say, well-written news articles. Analysis for the web data is important because they are sources for market buzz analysis, market review, trend estimation, etc. Also, because of the large amount of data, it is necessary to find efficient methods of information extraction, classification, automatic summarization and analysis of these data. Contact centers: This is a general term for help desks, information lines and customer service centers operating in domains ranging from computer sales and support to mobile phones to apparels. On an average a person in the developed world interacts at least once a week with a contact center agent. A typical contact center agent handles over a hundred calls per day. They operate in various modes such as voice, online chat and E-mail. The contact center industry produces gigabytes of data in the form of E-mails, chat logs, voice conversation transcriptions, customer feedback, etc. A bulk of the contact center data is voice conversations. Transcription of these using state of the art automatic speech recognition results in text with 30-40% word error rate. Further, even written modes of communication like online chat between customers and agents and even the interactions over email tend to be noisy. Analysis of contact center data is essential for customer relationship management, customer satisfaction analysis, call modeling, customer profiling, agent profiling, etc., and it requires sophisticated techniques to handle poorly written text. Printed Documents: Many libraries, government organizations and national defence organizations have vast repositories of hard copy documents. To retrieve and process the content from such documents, they need to be processed using Optical Character Recognition. In addition to printed text, these documents may also contain handwritten annotations. OCRed text can be highly noisy depending on the font size, quality of the print etc. It can range from 2-3% word error rates to as high as 50-60% word error rates. Handwritten annotations can be particularly hard to decipher, and error rates can be quite high in their presence. Short Messaging Service (SMS): Language usage over computer mediated discourses, like chats, emails and SMS texts, significantly differs from the standard form of the language. An urge towards shorter message length facilitating faster typing and the need for semantic clarity, shape the structure of this non-standard form known as the texting language.

    Read more →
  • NNDB

    NNDB

    The Notable Names Database (NNDB) is an online database of biographical details of over 40,000 people. Soylent Communications, a sole proprietorship that also hosted the later defunct Rotten.com, describes NNDB as an "intelligence aggregator" of noteworthy persons, highlighting their interpersonal connections. The Rotten.com domain was registered in 1996 by former Apple and Netscape software engineer Thomas E. Dell, who was also known by his internet alias, "Soylent". == Entries == Each entry has an executive summary followed by a brief narrative about their life. It also lists date and cause of death if deceased. Businesspeople and government officials are listed with chronologies of their posts, positions, and board memberships. As of 2022, the site is no longer updated. == NNDB Mapper == The NNDB Mapper, a visual tool for exploring connections between people, was made available in May 2008. It required Adobe Flash 7.

    Read more →
  • Database application

    Database application

    A database application is a computer program whose primary purpose is retrieving information from a computerized database. From here, information can be inserted, modified or deleted which is subsequently conveyed back into the database. Early examples of database applications were accounting systems and airline reservations systems, such as SABRE, developed starting in 1957. A characteristic of modern database applications is that they facilitate simultaneous updates and queries from multiple users. Systems in the 1970s might have accomplished this by having each user in front of a 3270 terminal to a mainframe computer. By the mid-1980s it was becoming more common to give each user a personal computer and have a program running on that PC that is connected to a database server. Information would be pulled from the database, transmitted over a network, and then arranged, graphed, or otherwise formatted by the program running on the PC. Starting in the mid-1990s it became more common to build database applications with a Web interface. Rather than develop custom software to run on a user's PC, the user would use the same Web browser program for every application. A database application with a Web interface had the advantage that it could be used on devices of different sizes, with different hardware, and with different operating systems. Examples of early database applications with Web interfaces include amazon.com, which used the Oracle relational database management system, the photo.net online community, whose implementation on top of Oracle was described in the book Database-Backed Web Sites (Ziff-Davis Press; May 1997), and eBay, also running Oracle. Electronic medical records are referred to on emrexperts.com, in December 2010, as "a software database application". A 2005 O'Reilly book uses the term in its title: Database Applications and the Web. Some of the most complex database applications remain accounting systems, such as SAP, which may contain thousands of tables in only a single module. Many of today's most widely used computer systems are database applications, for example, Facebook, which was built on top of MySQL. The etymology of the phrase "database application" comes from the practice of dividing computer software into systems programs, such as the operating system, compilers, the file system, and tools such as the database management system, and application programs, such as a payroll check processor. On a standard PC running Microsoft Windows, for example, the Windows operating system contains all of the systems programs while games, word processors, spreadsheet programs, photo editing programs, etc. would be application programs. As "application" is short for "application program", "database application" is short for "database application program". Not every program that uses a database would typically be considered a "database application". For example, many physics experiments, e.g., the Large Hadron Collider, generate massive data sets that programs subsequently analyze. The data sets constitute a "database", though they are not typically managed with a standard relational database management system. The computer programs that analyze the data are primarily developed to answer hypotheses, not to put information back into the database and therefore the overall program would not be called a "database application". == Examples of database applications == Amazon Student Data CNN eBay Facebook Fandango Filemaker (Mac OS) LibreOffice Base Microsoft Access Oracle relational database SAP (Systems, Applications & Products in Data Processing) Ticketmaster Wikipedia Yelp YouTube Google MySQL

    Read more →
  • Cognition Network Technology

    Cognition Network Technology

    Cognition Network Technology (CNT), also known as Definiens Cognition Network Technology, is an object-based image analysis method developed by Nobel laureate Gerd Binnig together with a team of researchers at Definiens AG in Munich, Germany. It serves for extracting information from images using a hierarchy of image objects (groups of pixels), as opposed to traditional pixel processing methods. To emulate the human mind's cognitive powers, Definiens used patented image segmentation and classification processes, and developed a method to render knowledge in a semantic network. CNT examines pixels not in isolation, but in context. It builds up a picture iteratively, recognizing groups of pixels as objects. It uses the color, shape, texture and size of objects as well as their context and relationships to draw conclusions and inferences, similar to human analysis. == History == In 1994 Professor Gerd Binnig founded Definiens. CNT was first available with the launch of the eCognition software in May 2000. In June 2010, Trimble Navigation Ltd (NASDAQ: TRMB) acquired Definiens business asset in earth sciences markets, including eCognition software, and also licensed Definiens' patented CNT. In 2014, Definiens was acquired by MedImmune, the global biologics research and development arm of AstraZeneca, for an initial consideration of $150 million. == Software == Definiens Tissue Studio Definiens Tissue Studio is a digital pathology image analysis software application based on CNT. The intended use of Definiens Tissue Studio is for biomarker translational research in formalin-fixed, paraffin-embedded tissue samples which have been treated with immunohistochemical staining assays, or hematoxylin and eosin (H&E). The central concept behind Definiens Tissue Studio is a user interface that facilitates machine learning from example digital histopathology images to derive an image analysis solution suitable for the measurement of biomarkers and/or histological features within pre-defined regions of interest on a cell-by-cell basis, and within sub-cellular compartments. The derived image analysis solution is then automatically applied to subsequent digital images to objectively measure defined sets of multiparametric image features. These data sets are used for further understanding the underlying biological processes that drive cancer and other diseases. Image processing and data analysis are performed either on a local desktop computer workstation, or on a server grid. eCognition The eCognition suite offers three components that can be used stand-alone or in combination to solve image analysis tasks. eCognition Developer is a development environment for object-based image analysis. It is used in earth sciences to develop rule sets (or applications) for the analysis of remote sensing data. eCognition Architect enables non-technical users to configure, calibrate and execute image analysis workflows created in eCognition Developer. eCognition Server software provides a processing environment for batch execution of image analysis jobs. eCognition software is utilized in numerous remote sensing and geospatial application scenarios and environments, using a variety of data types: Generic: Rapid Mapping, Change Detection, Object Recognition By environment: Diverse Landcover Mapping, Urban Analysis (i.e. impervious surface area analysis for taxation, property assessment for insurance, inventory of green infrastructure), Forestry (i.e. biomass measurement, species identification, firescar measurement), Agriculture (i.e. regional planning, precision farming, crisis response), Marine and Riparian (i.e. ecosystem evaluation, disaster management, harbor monitoring). Other: Defense, security, atmosphere and climate The online eCognition community was launched in July 2009 and had 2813 members as of July 9, 2010. Membership is distributed globally and user conferences are held regularly, the last having taken place in November 2009 in Munich, Germany. The bi-annual GEOBIA (Geographic Object-Based Image Analysis) conference is heavily attended by eCognition users, with the majority of presentations based on eCognition software.

    Read more →
  • Gradient vector flow

    Gradient vector flow

    Gradient vector flow (GVF), a computer vision framework introduced by Chenyang Xu and Jerry L. Prince, is the vector field that is produced by a process that smooths and diffuses an input vector field. It is usually used to create a vector field from images that points to object edges from a distance. It is widely used in image analysis and computer vision applications for object tracking, shape recognition, segmentation, and edge detection. In particular, it is commonly used in conjunction with active contour model. == Background == Finding objects or homogeneous regions in images is a process known as image segmentation. In many applications, the locations of object edges can be estimated using local operators that yield a new image called an edge map. The edge map can then be used to guide a deformable model, sometimes called an active contour or a snake, so that it passes through the edge map in a smooth way, therefore defining the object itself. A common way to encourage a deformable model to move toward the edge map is to take the spatial gradient of the edge map, yielding a vector field. Since the edge map has its highest intensities directly on the edge and drops to zero away from the edge, these gradient vectors provide directions for the active contour to move. When the gradient vectors are zero, the active contour will not move, and this is the correct behavior when the contour rests on the peak of the edge map itself. However, because the edge itself is defined by local operators, these gradient vectors will also be zero far away from the edge and therefore the active contour will not move toward the edge when initialized far away from the edge. Gradient vector flow (GVF) is the process that spatially extends the edge map gradient vectors, yielding a new vector field that contains information about the location of object edges throughout the entire image domain. GVF is defined as a diffusion process operating on the components of the input vector field. It is designed to balance the fidelity of the original vector field, so it is not changed too much, with a regularization that is intended to produce a smooth field on its output. Although GVF was designed originally for the purpose of segmenting objects using active contours attracted to edges, it has been since adapted and used for many alternative purposes. Some newer purposes including defining a continuous medial axis representation, regularizing image anisotropic diffusion algorithms, finding the centers of ribbon-like objects, constructing graphs for optimal surface segmentations, creating a shape prior, and much more. == Theory == The theory of GVF was originally described by Xu and Prince. Let f ( x , y ) {\displaystyle \textstyle f(x,y)} be an edge map defined on the image domain. For uniformity of results, it is important to restrict the edge map intensities to lie between 0 and 1, and by convention f ( x , y ) {\displaystyle \textstyle f(x,y)} takes on larger values (close to 1) on the object edges. The gradient vector flow (GVF) field is given by the vector field v ( x , y ) = [ u ( x , y ) , v ( x , y ) ] {\displaystyle \textstyle \mathbf {v} (x,y)=[u(x,y),v(x,y)]} that minimizes the energy functional In this equation, subscripts denote partial derivatives and the gradient of the edge map is given by the vector field ∇ f = ( f x , f y ) {\displaystyle \textstyle \nabla f=(f_{x},f_{y})} . Figure 1 shows an edge map, the gradient of the (slightly blurred) edge map, and the GVF field generated by minimizing E {\displaystyle \textstyle {\mathcal {E}}} . Equation 1 is a variational formulation that has both a data term and a regularization term. The first term in the integrand is the data term. It encourages the solution v {\displaystyle \textstyle \mathbf {v} } to closely agree with the gradients of the edge map since that will make v − ∇ f {\displaystyle \textstyle \mathbf {v} -\nabla f} small. However, this only needs to happen when the edge map gradients are large since v − ∇ f {\displaystyle \textstyle \mathbf {v} -\nabla f} is multiplied by the square of the length of these gradients. The second term in the integrand is a regularization term. It encourages the spatial variations in the components of the solution to be small by penalizing the sum of all the partial derivatives of v {\displaystyle \textstyle \mathbf {v} } . As is customary in these types of variational formulations, there is a regularization parameter μ > 0 {\displaystyle \textstyle \mu >0} that must be specified by the user in order to trade off the influence of each of the two terms. If μ {\displaystyle \textstyle \mu } is large, for example, then the resulting field will be very smooth and may not agree as well with the underlying edge gradients. Theoretical Solution. Finding v ( x , y ) {\displaystyle \textstyle \mathbf {v} (x,y)} to minimize Equation 1 requires the use of calculus of variations since v ( x , y ) {\displaystyle \textstyle \mathbf {v} (x,y)} is a function, not a variable. Accordingly, the Euler equations, which provide the necessary conditions for v {\displaystyle \textstyle \mathbf {v} } to be a solution can be found by calculus of variations, yielding where ∇ 2 {\displaystyle \textstyle \nabla ^{2}} is the Laplacian operator. It is instructive to examine the form of the equations in (2). Each is a partial differential equation that the components u {\displaystyle u} and v {\displaystyle v} of v {\displaystyle \mathbf {v} } must satisfy. If the magnitude of the edge gradient is small, then the solution of each equation is guided entirely by Laplace's equation, for example ∇ 2 u = 0 {\displaystyle \textstyle \nabla ^{2}u=0} , which will produce a smooth scalar field entirely dependent on its boundary conditions. The boundary conditions are effectively provided by the locations in the image where the magnitude of the edge gradient is large, where the solution is driven to agree more with the edge gradients. Computational Solutions. There are two fundamental ways to compute GVF. First, the energy function E {\displaystyle {\mathcal {E}}} itself (1) can be directly discretized and minimized, for example, by gradient descent. Second, the partial differential equations in (2) can be discretized and solved iteratively. The original GVF paper used an iterative approach, while later papers introduced considerably faster implementations such as an octree-based method, a multi-grid method, and an augmented Lagrangian method. In addition, very fast GPU implementations have been developed in Extensions and Advances. GVF is easily extended to higher dimensions. The energy function is readily written in a vector form as which can be solved by gradient descent or by finding and solving its Euler equation. Figure 2 shows an illustration of a three-dimensional GVF field on the edge map of a simple object (see ). The data and regularization terms in the integrand of the GVF functional can also be modified. A modification described in , called generalized gradient vector flow (GGVF) defines two scalar functions and reformulates the energy as While the choices g ( ∇ f | ) = μ {\displaystyle \textstyle g(\nabla f|)=\mu } and h ( | ∇ f | ) = | ∇ f | 2 {\displaystyle \textstyle h(|\nabla f|)=|\nabla f|^{2}} reduce GGVF to GVF, the alternative choices g ( | ∇ f | ) = exp ⁡ { − | ∇ f | / K } {\displaystyle \textstyle g(|\nabla f|)=\exp\{-|\nabla f|/K\}} and h ( ∇ f | ) = 1 − g ( | ∇ f | ) {\displaystyle \textstyle h(\nabla f|)=1-g(|\nabla f|)} , for K {\displaystyle K} a user-selected constant, can improve the tradeoff between the data term and its regularization in some applications. The GVF formulation has been further extended to vector-valued images in where a weighted structure tensor of a vector-valued image is used. A learning based probabilistic weighted GVF extension was proposed in to further improve the segmentation for images with severely cluttered textures or high levels of noise. The variational formulation of GVF has also been modified in motion GVF (MGVF) to incorporate object motion in an image sequence. Whereas the diffusion of GVF vectors from a conventional edge map acts in an isotropic manner, the formulation of MGVF incorporates the expected object motion between image frames. An alternative to GVF called vector field convolution (VFC) provides many of the advantages of GVF, has superior noise robustness, and can be computed very fast. The VFC field v V F C {\displaystyle \textstyle \mathbf {v} _{\mathrm {VFC} }} is defined as the convolution of the edge map f {\displaystyle f} with a vector field kernel k {\displaystyle \mathbf {k} } where The vector field kernel k {\displaystyle \textstyle \mathbf {k} } has vectors that always point toward the origin but their magnitudes, determined in detail by the function m {\displaystyle m} , decrease to zero with increasing distance from the origin. The beauty of VFC is that it can be computed very rapidly using a fast Fourier tra

    Read more →
  • Ibotta

    Ibotta

    Ibotta, Inc. is an American mobile technology company headquartered in Denver, Colorado. Founded in 2011, the company offers cash back rewards on various purchases through its Ibotta Performance Network and direct to consumer app. Ibotta partners with CPG (consumer packaged goods) brands and network publishers to provide these rewards. As of 2024, the company operates solely in the United States. The company's rewards-as-a-service offering, the Ibotta Performance Network, went live in 2022. In August 2019, Ibotta received a $1 billion valuation after its Series D funding, and in 2023, the company surpassed $1.5 billion cash rewards paid to over 50 million consumers since the company's founding. Ibotta became a publicly traded company in April 2024 with a listing on the New York Stock Exchange. As of September 2025, Ibotta is trading at approximately $27.13 per share, marking a 69% decline from its initial public offering price of $88 per share on April 18, 2024. == History == === Founding through early 2019 === Ibotta was founded by current CEO Bryan Leach. The company was incorporated in 2011 and the app launched to both the App Store and Google Play stores in 2012. Early investors included entrepreneur and computer scientist Jim Clark and Tom “TJ” Jermoluk, Chairman of @Home Network. In 2015, Ibotta expanded beyond item level grocery, adding the ability to get cash back on in-store retail purchases. In 2016, in-app mobile commerce began, allowing users to navigate from the Ibotta app to its partners' apps to earn cash back on purchases. In 2016 with a Series C investment, Ibotta had raised over $73 million in funding. In March of that year, Ibotta partnered with Anheuser-Busch to offer cash back for adults who purchased its products. In May, the company partnered with LiveRamp so that companies could use their CRM data to create segmented, personalized campaigns. At the time, the company had around 200 full- and part-time employees and moved from offices in Lower Downtown Denver (LoDo) to a 40,000-square-foot office in the central Denver business district. A year later, the company had to expand to a second floor as it added almost another 100 employees. In 2017, Ibotta added cash back for Uber to its app as well as cash back rewards for online and mobile purchases. In 2018, Ibotta was listed on the Inc. 5,000 list as one of the fastest growing private companies in the U.S. A year later, in January 2019, the Ibotta app had been downloaded more than 30 million times with users receiving a reported $500 million in cash back rewards. That year, Ibotta was the largest mobile company in Colorado with six million monthly active users. === August 2019 to present === In August 2019, Ibotta was valued at $1 billion, following a Series D round of funding. The round was led by Koch Disruptive Technologies, a subsidiary of Koch Industries. 2019 was also the year the company introduced Pay with Ibotta, which allowed users to complete purchases at key retailers on the Ibotta app and earn instant cash back in the process. With that new service, users were able to enter their purchase total and use a QR code to checkout and receive immediate cash back. In 2020, the company partnered with Trees for the Future to plant up to 1 million trees as part of an Earth Month campaign to raise awareness about the waste of unused paper coupons. In response to the COVID-19 pandemic, Ibotta partnered with CPG brands in their “Here to Help” campaign and together committed over $10 million in cash back to American consumers. The company added the ability to earn cash back from online grocery pick-up and delivery orders. Later that year, Ibotta started its free Thanksgiving program, providing users with 100% cash back on select groceries needed for a Thanksgiving meal. By 2022, the company had provided approximately 10 million Thanksgiving meals. In 2021, Ibotta acquired the company OctoShop (originally InStok), a shopping browser extension company. The OctoShop app enables users to compare prices across stores and set restock and price-drop alerts. In April 2022, the Ibotta Performance Network (IPN) was launched. The IPN allows brands to deliver digital offers to consumers through third party publishers. Retailers including Walmart, Dollar General and Family Dollar, food delivery services including Instacart, and convenience stores including Shell are all part of the Ibotta Performance Network. This pay-per-sales or success-based performance network reaches over 200 million consumers. On April 18, 2024, Ibotta had its initial public offering (IPO), trading on the New York Stock Exchange (NYSE) under the ticker symbol IBTA. It was the largest technology IPO in Colorado history. In October 2025, Ibotta announced a partnership with technology and analytics company Circana, integrating Circana's Household Lift measurement into Ibotta campaigns to give CPG brands an increased understanding of the impact of their promotional campaigns. On November 3, 2025, Ibotta launched LiveLift, a tool for companies to measure the return on investment of digital promotions, in order to optimize performance marketing goals. === Athletic partnerships === Ibotta became the official jersey patch partner of the New Orleans Pelicans, a professional men's basketball team in the National Basketball Association (NBA), for the 2020–2021 and 2023–2024 seasons. Ibotta became the official jersey patch partner of the 2023 NBA champion Denver Nuggets baskeetball team beginning in the 2023–2024 season. In March 2023, F1 driver Logan Sargeant, the first U.S. racer to compete in F1 since 2015, partnered with Ibotta. The Ibotta logo was displayed on Sargeant's racing helmet throughout his F1 career. In June 2023, UConn Huskies women's basketball player Paige Bueckers entered into a "name, image, and likeness" (NIL) promotional agreement with Ibotta. According to a press release by Ibotta, the company has agreements with The Brandr Group, which finds NIL opportunities for women college athletes, and the Pearpop social media marketing platform to promote Ibotta. == Legal issues == In April 2025, shareholders filed a class action lawsuit—Fortune v. Ibotta, Inc., in the U.S. District Court for the District of Colorado (Case No. 25-cv-01213)—alleging that the registration statement in connection with Ibotta’s April 2024 initial public offering omitted material information. The complaint claims that, although Ibotta disclosed detailed terms for its contract with Walmart Inc., it failed to warn investors that its agreement with The Kroger Co., its second-largest client, was terminable at will and thus could be canceled without warning, creating a misleading impression of stability.

    Read more →
  • Inception (deep learning architecture)

    Inception (deep learning architecture)

    Inception is a family of convolutional neural network (CNN) for computer vision, introduced by researchers at Google in 2014 as GoogLeNet (later renamed Inception v1). The series was historically important as an early CNN that separates the stem (data ingest), body (data processing), and head (prediction), an architectural design that persists in all modern CNN. == Version history == === Inception v1 === In 2014, a team at Google developed the GoogLeNet architecture, an instance of which won the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The name came from the LeNet of 1998, since both LeNet and GoogLeNet are CNNs. They also called it "Inception" after a "we need to go deeper" internet meme, a phrase from Inception (2010) the film. Because later, more versions were released, the original Inception architecture was renamed again as "Inception v1". The models and the code were released under Apache 2.0 license on GitHub. The Inception v1 architecture is a deep CNN composed of 22 layers. Most of these layers were "Inception modules". The original paper stated that Inception modules are a "logical culmination" of Network in Network and (Arora et al, 2014). Since Inception v1 is deep, it suffered from the vanishing gradient problem. The team solved it by using two "auxiliary classifiers", which are linear-softmax classifiers inserted at 1/3-deep and 2/3-deep within the network, and the loss function is a weighted sum of all three: L = 0.3 L a u x , 1 + 0.3 L a u x , 2 + L r e a l {\displaystyle L=0.3L_{aux,1}+0.3L_{aux,2}+L_{real}} These were removed after training was complete. This was later solved by the ResNet architecture. The architecture consists of three parts stacked on top of one another: The stem (data ingestion): The first few convolutional layers perform data preprocessing to downscale images to a smaller size. The body (data processing): The next many Inception modules perform the bulk of data processing. The head (prediction): The final fully-connected layer and softmax produces a probability distribution for image classification. This structure is used in most modern CNN architectures. === Inception v2 === Inception v2 was released in 2015, in a paper that is more famous for proposing batch normalization. It had 13.6 million parameters. It improves on Inception v1 by adding batch normalization, and removing dropout and local response normalization which they found became unnecessary when batch normalization is used. === Inception v3 === Inception v3 was released in 2016. It improves on Inception v2 by using factorized convolutions. As an example, a single 5×5 convolution can be factored into 3×3 stacked on top of another 3×3. Both has a receptive field of size 5×5. The 5×5 convolution kernel has 25 parameters, compared to just 18 in the factorized version. Thus, the 5×5 convolution is strictly more powerful than the factorized version. However, this power is not necessarily needed. Empirically, the research team found that factorized convolutions help. It also uses a form of dimension-reduction by concatenating the output from a convolutional layer and a pooling layer. As an example, a tensor of size 35 × 35 × 320 {\displaystyle 35\times 35\times 320} can be downscaled by a convolution with stride 2 to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} , and by maxpooling with pool size 2 × 2 {\displaystyle 2\times 2} to 17 × 17 × 320 {\displaystyle 17\times 17\times 320} . These are then concatenated to 17 × 17 × 640 {\displaystyle 17\times 17\times 640} . Other than this, it also removed the lowest auxiliary classifier during training. They found that the auxiliary head worked as a form of regularization. They also proposed label-smoothing regularization in classification. For an image with label c {\displaystyle c} , instead of making the model to predict the probability distribution δ c = ( 0 , 0 , … , 0 , 1 ⏟ c -th entry , 0 , … , 0 ) {\displaystyle \delta _{c}=(0,0,\dots ,0,\underbrace {1} _{c{\text{-th entry}}},0,\dots ,0)} , they made the model predict the smoothed distribution ( 1 − ϵ ) δ c + ϵ / K {\displaystyle (1-\epsilon )\delta _{c}+\epsilon /K} where K {\displaystyle K} is the total number of classes. === Inception v4 === In 2017, the team released Inception v4, Inception ResNet v1, and Inception ResNet v2. Inception v4 is an incremental update with even more factorized convolutions, and other complications that were empirically found to improve benchmarks. Inception ResNet v1 and v2 are both modifications of Inception v4, where residual connections are added to each Inception module, inspired by the ResNet architecture. === Xception === Xception ("Extreme Inception") was published in 2017. It is a linear stack of depthwise separable convolution layers with residual connections. The design was proposed on the hypothesis that in a CNN, the cross-channels correlations and spatial correlations in the feature maps can be entirely decoupled. Training each network took 3 days on 60 K80 GPUs, or approximately 0.5 petaFLOP-days.

    Read more →
  • Semantic neural network

    Semantic neural network

    Semantic neural network (SNN) is based on John von Neumann's neural network [von Neumann, 1966] and Nikolai Amosov M-Network. There are limitations to a link topology for the von Neumann’s network but SNN accept a case without these limitations. Only logical values can be processed, but SNN accept that fuzzy values can be processed too. All neurons into the von Neumann network are synchronized by tacts. For further use of self-synchronizing circuit technique SNN accepts neurons can be self-running or synchronized. In contrast to the von Neumann network there are no limitations for topology of neurons for semantic networks. It leads to the impossibility of relative addressing of neurons as it was done by von Neumann. In this case an absolute readdressing should be used. Every neuron should have a unique identifier that would provide a direct access to another neuron. Of course, neurons interacting by axons-dendrites should have each other's identifiers. An absolute readdressing can be modulated by using neuron specificity as it was realized for biological neural networks. There’s no description for self-reflectiveness and self-modification abilities into the initial description of semantic networks [Dudar Z.V., Shuklin D.E., 2000]. But in [Shuklin D.E. 2004] a conclusion had been drawn about the necessity of introspection and self-modification abilities in the system. For maintenance of these abilities a concept of pointer to neuron is provided. Pointers represent virtual connections between neurons. In this model, bodies and signals transferring through the neurons connections represent a physical body, and virtual connections between neurons are representing an astral body. It is proposed to create models of artificial neuron networks on the basis of virtual machine supporting the opportunity for paranormal effects. SNN is generally used for natural language processing. == Related models == Computational creativity Semantic hashing Semantic Pointer Architecture Sparse distributed memory

    Read more →
  • Aikuma

    Aikuma

    Aikuma is an Android app for collecting speech recordings with time-aligned translations. The app includes a text-free interface for consecutive interpretation, designed for users who are not literate. The Aikuma won Grand Prize in the Open Source Software World Challenge (2013). == Name == Aikuma means "meeting place" in Usarufa, a Papuan language where this software was first used in 2012. == History == Aikuma was developed with sponsorship from the National Science Foundation, including a $101,501 (US) project, "to use mobile telephones to collect larger amounts of data on undocumented endangered languages than would never be possible through usual fieldwork." Aikuma and its modified version (Lig-Aikuma) have been used for collecting substantial quantities of audio in remote indigenous villages. A modified version of the app, called Lig-Aikuma, has been developed at the Université Grenoble Alpes (LIG laboratory) and implements new features such as elicitation of speech from text, images and videos. == Similar Software == Lingua Libre is an online collaborative project and tool by the Wikimedia France association, which can be used as a tool for Language Preservation. Lingua Libre enables to record words, phrases, or sentences of any language, oral (audio recording) or signed (video recording). It is a highly efficient method to record endangered languages since up to 1000 words can be recorded per hour. All the content is under Free License, and speakers of minority languages are encouraged to record their own dialects.

    Read more →
  • Concept drift

    Concept drift

    In predictive analytics, data science, machine learning and related fields, concept drift or drift is an evolution of data that invalidates the data model. It happens when the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes. Drift detection and drift adaptation are of paramount importance in the fields that involve dynamically changing data and data models. == Predictive model decay == In machine learning and predictive analytics this drift phenomenon is called concept drift. In machine learning, a common element of a data model are the statistical properties, such as probability distribution of the actual data. If they deviate from the statistical properties of the training data set, then the learned predictions may become invalid, if the drift is not addressed. == Data configuration decay == Another important area is software engineering, where three types of data drift affecting data fidelity may be recognized. Changes in the software environment ("infrastructure drift") may invalidate software infrastructure configuration. "Structural drift" happens when the data schema changes, which may invalidate databases. "Semantic drift" is changes in the meaning of data while the structure does not change. In many cases this may happen in complicated applications when many independent developers introduce changes without proper awareness of the effects of their changes in other areas of the software system. For many application systems, the nature of data on which they operate are subject to changes for various reasons, e.g., due to changes in business model, system updates, or switching the platform on which the system operates. In the case of cloud computing, infrastructure drift that may affect the applications running on cloud may be caused by the updates of cloud software. There are several types of detrimental effects of data drift on data fidelity. Data corrosion is passing the drifted data into the system undetected. Data loss happens when valid data are ignored due to non-conformance with the applied schema. Squandering is the phenomenon when new data fields are introduced upstream in the data processing pipeline, but somewhere downstream these data fields are absent. == Inconsistent data == "Data drift" may refer to the phenomenon when database records fail to match the real-world data due to the changes in the latter over time. This is a common problem with databases involving people, such as customers, employees, citizens, residents, etc. Human data drift may be caused by unrecorded changes in personal data, such as place of residence or name, as well as due to errors during data input. "Data drift" may also refer to inconsistency of data elements between several replicas of a database. The reasons can be difficult to identify. A simple drift detection is to run checksum regularly. However the remedy may be not so easy. == Examples == The behavior of the customers in an online shop may change over time. For example, if weekly merchandise sales are to be predicted, and a predictive model has been developed that works satisfactorily. The model may use inputs such as the amount of money spent on advertising, promotions being run, and other metrics that may affect sales. The model is likely to become less and less accurate over time – this is concept drift. In the merchandise sales application, one reason for concept drift may be seasonality, which means that shopping behavior changes seasonally. Perhaps there will be higher sales in the winter holiday season than during the summer, for example. Concept drift generally occurs when the covariates that comprise the data set begin to explain the variation of your target set less accurately — there may be some confounding variables that have emerged, and that one simply cannot account for, which renders the model accuracy to progressively decrease with time. Generally, it is advised to perform health checks as part of the post-production analysis and to re-train the model with new assumptions upon signs of concept drift. == Possible remedies == To prevent deterioration in prediction accuracy because of concept drift, reactive and tracking solutions can be adopted. Reactive solutions retrain the model in reaction to a triggering mechanism, such as a change-detection test or control charts from statistical process control, to explicitly detect concept drift as a change in the statistics of the data-generating process. When concept drift is detected, the current model is no longer up-to-date and must be replaced by a new one to restore prediction accuracy. A shortcoming of reactive approaches is that performance may decay until the change is detected. Tracking solutions seek to track the changes in the concept by continually updating the model. Methods for achieving this include online machine learning, frequent retraining on the most recently observed samples, and maintaining an ensemble of classifiers where one new classifier is trained on the most recent batch of examples and replaces the oldest classifier in the ensemble. Contextual information, when available, can be used to better explain the causes of the concept drift: for instance, in the sales prediction application, concept drift might be compensated by adding information about the season to the model. By providing information about the time of the year, the rate of deterioration of your model is likely to decrease, but concept drift is unlikely to be eliminated altogether. This is because actual shopping behavior does not follow any static, finite model. New factors may arise at any time that influence shopping behavior, the influence of the known factors or their interactions may change. Concept drift cannot be avoided for complex phenomena that are not governed by fixed laws of nature. All processes that arise from human activity, such as socioeconomic processes, and biological processes are likely to experience concept drift. Therefore, periodic retraining, also known as refreshing, of any model is necessary. === Remedy methods === DDM (Drift Detection Method): detects drift by monitoring the model's error rate over time. When the error rate passes a set threshold, it enters a warning phase, and if it passes another threshold, it enters a drift phase. EDDM (Early Drift Detection Method): improves DDM's detection rate by tracking the average distance between two errors instead of only the error rate. ADWIN (Adaptive Windowing): dynamically stores a window of recent data and warns the user if it detects a significant change between the statistics of the window's earlier data compared to more recent data. KSWIN (Kolmogorov–Smirnov Windowing): detects drift based on the Kolmogorov-Smirnov statistical test. DDM and EDDM: Concept Drift Detection online supervised methods that rely on sequential error monitoring to estimate the evolving error rate. ADWIN and KSWIN: Windowing maintain a "window", a subset of the most recent data, of the data stream, which it checks for statistical differences across the window. == Applications in security == Concept drift is a recurring issue in security analytics, especially in malware and intrusion detection. In these systems, models are often trained on past logs, binaries or network traces, but the behaviour of attackers changes over time as new malware families, obfuscation techniques and campaigns appear. When the data no longer resemble the training set, the decision boundaries learned by classifiers or anomaly detectors can become misaligned with the current threat landscape and detection performance can drop unless the models are updated or replaced. Several studies on Windows malware model detection as an evolving data stream and track how performance changes as time passes. They show that classifiers trained on a fixed time window can perform well on nearby data but deteriorate quickly when evaluated on samples collected months or years later, even when large amounts of training data are available. In order to keep up with this, security systems often use sliding or adaptive windows, which restrict training to the most recent portion of the data so that older, less relevant examples are gradually discarded. They also employ drift detectors such as ADWIN and KSWIN that monitor error rates or changes in the distribution of recent observations and signal when the statistics of the incoming stream differ significantly from the past, prompting retraining or model replacement. Related problems appear in spam filtering, fraud detection and intrusion detection, where adversaries change content, patterns of activity or network behavior to evade models trained on historical data. In these settings drift can be gradual, as new types of spam or fraud emerge, or abrupt, after a sudden shift in attack techniques. Common strategies to remain eff

    Read more →
  • N-jet

    N-jet

    An N-jet is the set of (partial) derivatives of a function f ( x ) {\displaystyle f(x)} up to order N. Specifically, in the area of computer vision, the N-jet is usually computed from a scale space representation L {\displaystyle L} of the input image f ( x , y ) {\displaystyle f(x,y)} , and the partial derivatives of L {\displaystyle L} are used as a basis for expressing various types of visual modules. For example, algorithms for tasks such as feature detection, feature classification, stereo matching, tracking and object recognition can be expressed in terms of N-jets computed at one or several scales in scale space.

    Read more →
  • Visual descriptor

    Visual descriptor

    In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents in images, videos, or algorithms or applications that produce such descriptions. They describe elementary characteristics such as the shape, the color, the texture or the motion, among others. == Introduction == As a result of the new communication technologies and the massive use of Internet in our society, the amount of audio-visual information available in digital format is increasing considerably. Therefore, it has been necessary to design some systems that allow us to describe the content of several types of multimedia information in order to search and classify them. The audio-visual descriptors are in charge of the contents description. These descriptors have a good knowledge of the objects and events found in a video, image or audio and they allow the quick and efficient searches of the audio-visual content. This system can be compared to the search engines for textual contents. Although it is relatively easy to find text with a computer, it is much more difficult to find concrete audio and video parts. For instance, imagine somebody searching a scene of a happy person. The happiness is a feeling and it is not evident its shape, color and texture description in images. The description of the audio-visual content is not a superficial task and it is essential for the effective use of this type of archives. The standardization system that deals with audio-visual descriptors is the MPEG-7 (Motion Picture Expert Group - 7). == Types == Descriptors are the first step to find out the connection between pixels contained in a digital image and what humans recall after having observed an image or a group of images after some minutes. Visual descriptors are divided in two main groups: General information descriptors: contain low level descriptors which give a description about color, shape, regions, textures and motion. Specific domain information descriptors: give information about objects and events in the scene. A concrete example would be face recognition. === General information descriptors === General information descriptors consist of a set of descriptors that covers different basic and elementary features like: color, texture, shape, motion, location and others. This description is automatically generated by means of signal processing. ==== Color ==== It's the most basic quality of visual content. Five tools are defined to describe color. The three first tools represent the color distribution and the last ones describe the color relation between sequences or group of images: Dominant color descriptor (DCD) Scalable color descriptor (SCD) Color structure descriptor (CSD) Color layout descriptor (CLD) Group of frame (GoF) or group-of-pictures (GoP) ==== Texture ==== It's an important quality in order to describe an image. The texture descriptors characterize image textures or regions. They observe the region homogeneity and the histograms of these region borders. The set of descriptors is formed by: Homogeneous texture descriptor (HTD) Texture browsing descriptor (TBD) Edge histogram descriptor (EHD) ==== Shape ==== It contains important semantic information due to human's ability to recognize objects through their shape. However, this information can only be extracted by means of a segmentation similar to the one that the human visual system implements. Nowadays, such a segmentation system is not available yet, however there exists a serial of algorithms which are considered to be a good approximation. These descriptors describe regions, contours and shapes for 2D images and for 3D volumes. The shape descriptors are the following ones: Region-based shape descriptor (RSD) Contour-based shape descriptor (CSD) 3-D shape descriptor (3-D SD) ==== Motion ==== It's defined by four different descriptors which describe motion in video sequence. Motion is related to the objects motion in the sequence and to the camera motion. This last information is provided by the capture device, whereas the rest is implemented by means of image processing. The descriptor set is the following one: Motion activity descriptor (MAD) Camera motion descriptor (CMD) Motion trajectory descriptor (MTD) Warping and parametric motion descriptor (WMD and PMD) ==== Location ==== Elements location in the image is used to describe elements in the spatial domain. In addition, elements can also be located in the temporal domain: Region locator descriptor (RLD) Spatio temporal locator descriptor (STLD) === Specific domain information descriptors === These descriptors, which give information about objects and events in the scene, are not easily extractable, even more when the extraction is to be automatically done. Nevertheless, they can be manually processed. As mentioned before, face recognition is a concrete example of an application that tries to automatically obtain this information. == Descriptors applications == Among all applications, the most important ones are: Multimedia documents search engines and classifiers. Digital library: visual descriptors allow a very detailed and concrete search of any video or image by means of different search parameters. For instance, the search of films where a known actor appears, the search of videos containing the Everest mountain, etc. Personalized electronic news service. Possibility of an automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area. Control and filtering of concrete audiovisual content, like violent or pornographic material. Also, authorization for some multimedia content.

    Read more →