AI Art Creator Free

AI Art Creator Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apache Hama

    Apache Hama

    Apache Hama is a distributed computing framework based on bulk synchronous parallel computing techniques for massive scientific computations e.g., matrix, graph and network algorithms. Originally a sub-project of Hadoop, it became an Apache Software Foundation top level project in 2012. It was created by Edward J. Yoon, who named it (short for "Hadoop Matrix Algebra"), and Hama also means hippopotamus in Yoon's native Korean language (하마), following the trend of naming Apache projects after animals and zoology (such as Apache Pig). Hama was inspired by Google's Pregel large-scale graph computing framework described in 2010. When executing graph algorithms, Hama showed a fifty-fold performance increase relative to Hadoop. Retired in April 2020, project resources are made available as part of the Apache Attic. Yoon cited issues of installation, scalability, and a difficult programming model for its lack of adoption. == Architecture == Hama consists of three major components: BSPMaster, GroomServers and Zookeeper. === BSPMaster === BSPMaster is responsible for: Maintaining groom server status Controlling super steps in a cluster Maintaining job progress information Scheduling jobs and assigning tasks to groom servers Disseminating execution class across groom servers Controlling fault Providing users with the cluster control interface. A BSP Master and multiple grooms are started by the script. Then, the bsp master starts up with a RPC server for groom servers. Groom servers starts up with a BSPPeer instance and a RPC proxy to contact the bsp master. After started, each groom periodically sends a heartbeat message that encloses its groom server status, including maximum task capacity, unused memory, and so on. Each time the BSP master receives a heartbeat message, it brings the groom server status up-to-date. The bsp master makes use of groom servers' status in order to assign tasks to idle groom servers - and returns a heartbeat response containing assigned tasks and others actions for a groom server to do. Currently BSP master has a FIFO job scheduler and simple task assignment algorithms. === GroomServer === A groom server (shortly referred to as groom) is a process that performs BSP tasks assigned by BSPMaster. Each groom contacts the BSPMaster, and it takes assigned tasks and reports its status by means of periodical piggybacks with BSPMaster. Each groom is designed to run with HDFS or other distributed storages. Basically, a groom server and a data node should be run on one physical node. === Zookeeper === A Zookeeper is used to manage the efficient barrier synchronisation of the BSPPeers.

    Read more →
  • GasBuddy

    GasBuddy

    GasBuddy is a technology company headquartered in Dallas, United States, that offers mobile applications and websites for tracking crowd-sourced locations and prices of gas stations and convenience stores in the United States and Canada. Their platforms offer information sourced from users, gas station operators, and partner companies. They also provide business-to-business services to gas stations and convenience store owners. == History == GasBuddy was founded in Minneapolis in 2000 by Dustin Coupal, Jason Toews as a community website for sharing gas prices. In 2004, they filed as a for-profit corporation in Minnesota under the name GasBuddy Organization Inc. In 2009, GasBuddy launched OpenStore, a platform that allows convenience stores to build and manage their own mobile apps. In 2010, the company launched its own mobile apps that allowed users to input gas prices from their smartphones. In 2013, Oil Price Information Service (OPIS), a subsidiary of UCG, acquired GasBuddy. OPIS is a provider of petroleum pricing and news for businesses. In 2016, IHS acquired OPIS, separating from GasBuddy, which remained with UCG as a subsidiary company. Initially only available in the United States and Canada, GasBuddy launched in Australia in March 2016. Also in that year, GasBuddy released a completely redesigned app, its first major redesign since its release in 2010. GasBuddy also unveiled a new logo and launched GasBuddy Business Pages. GasBuddy shut down the Australian version of their app in 2022. In 2017, GasBuddy launched a gas savings program titled "Pay with GasBuddy" intended to let consumers save at gas stations in the United States. In the same year, GasBuddy was involved in a lawsuit with Reveal Mobile, a location-based marketing company, over the sale of user location data. It was revealed that GasBuddy sold information on more than 4.5 million users to Reveal each month for $9.50 per 1000 users. According to CNET, that information included "users' latitude, longitude, IP address, and time stamps on the data collected," which sparked concern in the media and between its users. In 2021, the GasBuddy app rose to the most popular app on both Android and iPhone platforms in the wake of the Colonial Pipeline ransomware attack PDI acquired GasBuddy in 2021.

    Read more →
  • Text Retrieval Conference

    Text Retrieval Conference

    The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or tracks. It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity (part of the office of the Director of National Intelligence), and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies and to increase the speed of lab-to-product transfer of technology. TREC's evaluation protocols have improved many search technologies. A 2010 study estimated that "without TREC, U.S. Internet users would have spent up to 3.15 billion additional hours using web search engines between 1999 and 2009." Hal Varian the Chief Economist at Google wrote that "The TREC data revitalized research on information retrieval. Having a standard, widely available, and carefully constructed set of data laid the groundwork for further innovation in this field." Each track has a challenge wherein NIST provides participating groups with data sets and test problems. Depending on track, test problems might be questions, topics, or target extractable features. Uniform scoring is performed so the systems can be fairly evaluated. After evaluation of the results, a workshop provides a place for participants to collect together thoughts and ideas and present current and future research work.Text Retrieval Conference started in 1992, funded by DARPA (US Defense Advanced Research Project) and run by NIST. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. == Goals == Encourage retrieval search based on large text collections Increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas Speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements retrieval methodologies on real world problems To increase the availability of appropriate evaluation techniques for use by industry and academia including development of new evaluation techniques more applicable to current systems TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provide a set of documents and questions. Participants run their own retrieval system on the data and return to NIST a list of retrieved top-ranked documents. NIST pools the individual result judges the retrieved documents for correctness and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences. == Relevance judgments in TREC == TREC defines relevance as: "If you were writing a report on the subject of the topic and would use the information contained in the document in the report, then the document is relevant." Most TREC retrieval tasks use binary relevance: a document is either relevant or not relevant. Some TREC tasks use graded relevance, capturing multiple degrees of relevance. Most TREC collections are too large to perform complete relevance assessment; for these collections it is impossible to calculate the absolute recall for each query. To decide which documents to assess, TREC usually uses a method call pooling. In this method, the top-ranked n documents from each contributing run are aggregated, and the resulting document set is judged completely. == Various TRECs == In 1992 TREC-1 was held at NIST. The first conference attracted 28 groups of researchers from academia and industry. It demonstrated a wide range of different approaches to the retrieval of text from large document collections .Finally TREC1 revealed the facts that automatic construction of queries from natural language query statements seems to work. Techniques based on natural language processing were no better no worse than those based on vector or probabilistic approach. TREC2 Took place in August 1993. 31 group of researchers participated in this. Two types of retrieval were examined. Retrieval using an ‘ad hoc’ query and retrieval using a ‘routing' query In TREC-3 a small group experiments worked with Spanish language collection and others dealt with interactive query formulation in multiple databases TREC-4 they made even shorter to investigate the problems with very short user statements TREC-5 includes both short and long versions of the topics with the goal of carrying out deeper investigation into which types of techniques work well on various lengths of topics In TREC-6 Three new tracks speech, cross language, high precision information retrieval were introduced. The goal of cross language information retrieval is to facilitate research on system that are able to retrieve relevant document regardless of language of the source document TREC-7 contained seven tracks out of which two were new Query track and very large corpus track. The goal of the query track was to create a large query collection TREC-8 contain seven tracks out of which two –question answering and web tracks were new. The objective of QA query is to explore the possibilities of providing answers to specific natural language queries TREC-9 Includes seven tracks In TREC-10 Video tracks introduced Video tracks design to promote research in content based retrieval from digital video In TREC-11 Novelty tracks introduced. The goal of novelty track is to investigate systems abilities to locate relevant and new information within the ranked set of documents returned by a traditional document retrieval system TREC-12 held in 2003 added three new tracks; Genome track, robust retrieval track, HARD (Highly Accurate Retrieval from Documents) == Tracks == === Current tracks === New tracks are added as new research needs are identified, this list is current for TREC 2018. CENTRE Track – Goal: run in parallel CLEF 2018, NTCIR-14, TREC 2018 to develop and tune an IR reproducibility evaluation protocol (new track for 2018). Common Core Track – Goal: an ad hoc search task over news documents. Complex Answer Retrieval (CAR) – Goal: to develop systems capable of answering complex information needs by collating information from an entire corpus. Incident Streams Track – Goal: to research technologies to automatically process social media streams during emergency situations (new track for TREC 2018). The News Track – Goal: partnership with The Washington Post to develop test collections in news environment (new for 2018). Precision Medicine Track – Goal: a specialization of the Clinical Decision Support track to focus on linking oncology patient data to clinical trials. Real-Time Summarization Track (RTS) – Goal: to explore techniques for real-time update summaries from social media streams. === Past tracks === Chemical Track – Goal: to develop and evaluate technology for large scale search in chemistry-related documents, including academic papers and patents, to better meet the needs of professional searchers, and specifically patent searchers and chemists. Clinical Decision Support Track – Goal: to investigate techniques for linking medical cases to information relevant for patient care Contextual Suggestion Track – Goal: to investigate search techniques for complex information needs that are highly dependent on context and user interests. Crowdsourcing Track – Goal: to provide a collaborative venue for exploring crowdsourcing methods both for evaluating search and for performing search tasks. Genomics Track – Goal: to study the retrieval of genomic data, not just gene sequences but also supporting documentation such as research papers, lab reports, etc. Last ran on TREC 2007. Dynamic Domain Track – Goal: to investigate domain-specific search algorithms that adapt to the dynamic information needs of professional users as they explore in complex domains. Enterprise Track – Goal: to study search over the data of an organization to complete some task. Last ran on TREC 2008. Entity Track – Goal: to perform entity-related search on Web data. These search tasks (such as finding entities and properties of entities) address common information needs that are not that well modeled as ad hoc document search. Cross-Language Track – Goal: to investigate the ability of retrieval systems to find documents topically regardless of source language. After 1999, this track spun off into CLEF. FedWeb Track – Goal: to select best resources to forward a query to, and merge the results so that most relevant are on the top. Federated Web Search Track – Goal: to investigate techniques for the selection and combination of search results from a large number of real on-line web search services. Filtering Track – Goal: to binarily decide retrieval of new

    Read more →
  • ELIZA

    ELIZA

    ELIZA is an early natural language processing computer program developed from 1964 to 1967 at MIT by Joseph Weizenbaum. Created to explore communication between humans and machines, ELIZA simulated conversation by using a pattern matching and substitution methodology that gave users an illusion of understanding on the part of the program, but gave no response that could be considered really understanding what was being said by either party. Whereas the ELIZA program itself was written (originally) in MAD-SLIP, the pattern matching directives that contained most of its language capability were provided in separate "scripts", represented in a Lisp-like expression. The most famous script, DOCTOR, simulated a psychotherapist of the Rogerian school (in which the therapist often reflects back the patient's words to the patient), and used rules, dictated in the script, to respond with non-directional questions to user inputs. As such, ELIZA was one of the first chatbots (originally "chatterbots") and one of the first programs capable of attempting the Turing test. Weizenbaum intended the program as a method to explore communication between humans and machines. He was surprised that some people, including his secretary, attributed human-like feelings to the computer program, a phenomenon that came to be called the ELIZA effect. Many academics believed that the program would be able to positively influence the lives of many people, particularly those with psychological issues, and that it could aid doctors working on such patients' treatment. While ELIZA was capable of engaging in discourse, it could not converse with true understanding. However, many early users were convinced of ELIZA's intelligence and understanding, despite Weizenbaum's insistence to the contrary. The original ELIZA source code had been missing since its creation in the 1960s, as it was not common to publish articles that included source code at that time. However, more recently the MAD-SLIP source code was discovered in the MIT archives and published on various platforms, such as the Internet Archive. The source code is of high historical interest since it demonstrates not only the specificity of programming languages and techniques at that time, but also the beginning of software layering and abstraction as a means of achieving sophisticated software programming. == Overview == Joseph Weizenbaum's ELIZA, running the DOCTOR script, created a conversational interaction somewhat similar to what might take place in the office of "a [non-directive] psychotherapist in an initial psychiatric interview" and to "demonstrate that the communication between man and machine was superficial". While ELIZA is best known for acting in the manner of a psychotherapist, the speech patterns are due to the data and instructions supplied by the DOCTOR script. ELIZA itself examined the text for keywords, applied values to said keywords, and transformed the input into an output; the script that ELIZA ran determined the keywords, set the values of keywords, and set the rules of transformation for the output. Weizenbaum chose to make the DOCTOR script in the context of psychotherapy to "sidestep the problem of giving the program a data base of real-world knowledge", allowing it to reflect back the patient's statements to carry the conversation forward. The result was a somewhat intelligent-seeming response that reportedly deceived some early users of the program. Weizenbaum named his program ELIZA after Eliza Doolittle, a working-class character in George Bernard Shaw's Pygmalion (also appearing in the musical My Fair Lady, which was based on the play and was hugely popular at the time). According to Weizenbaum, ELIZA's ability to be "incrementally improved" by various users made it similar to Eliza Doolittle, since Eliza Doolittle was taught to speak with an upper-class accent in Shaw's play. However, unlike the human character in Shaw's play, ELIZA is incapable of learning new patterns of speech or new words through interaction alone. Edits must be made directly to ELIZA's active script in order to change the manner by which the program operates. Weizenbaum first implemented ELIZA in his own SLIP list-processing language, where, depending upon the initial entries by the user, the illusion of human intelligence could appear, or be dispelled through several interchanges. Some of ELIZA's responses were so convincing that Weizenbaum and several others have anecdotes of users becoming emotionally attached to the program, occasionally forgetting that they were conversing with a computer. Weizenbaum's own secretary reportedly asked Weizenbaum to leave the room so that she and ELIZA could have a real conversation. Weizenbaum was surprised by this, later writing: "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." In 1966, interactive computing (via a teletype) was new. It was 11 years before the personal computer became familiar to the general public, and three decades before most people encountered attempts at natural language processing in Internet services like Ask.com or PC help systems such as Microsoft Office Clippit. Although those programs included years of research and work, ELIZA remains a milestone because it was the first time a programmer had attempted such a human-machine interaction with the goal of creating the illusion (however brief) of human–human interaction. At the ICCC 1972, ELIZA was brought together with another early artificial-intelligence program named PARRY for a computer-only conversation. While ELIZA was built to speak as a doctor, PARRY was intended to simulate a patient with schizophrenia. == Design and implementation == Weizenbaum originally wrote ELIZA in MAD-SLIP for CTSS on an IBM 7094 as a program to make natural-language conversation possible with a computer. To accomplish this, Weizenbaum identified five "fundamental technical problems" for ELIZA to overcome: the identification of key words, the discovery of a minimal context, the choice of appropriate transformations, the generation of responses in the absence of key words, and the provision of an editing capability for ELIZA scripts. Weizenbaum solved these problems and made ELIZA such that it had no built-in contextual framework or universe of discourse. However, this required ELIZA to have a script of instructions on how to respond to inputs from users. ELIZA starts its process of responding to an input by a user by first examining the text input for a "keyword". A "keyword" is a word designated as important by the acting ELIZA script, which assigns to each keyword a precedence number, or a RANK, designed by the programmer. If such words are found, they are put into a "keystack", with the keyword of the highest RANK at the top. The input sentence is then manipulated and transformed as the rule associated with the keyword of the highest RANK directs. For example, when the DOCTOR script encounters words such as "alike" or "same", it would output a message pertaining to similarity, in this case "In what way?", as these words had high precedence number. This also demonstrates how certain words, as dictated by the script, can be manipulated regardless of contextual considerations, such as switching first-person pronouns and second-person pronouns and vice versa, as these too had high precedence numbers. Such words with high precedence numbers are deemed superior to conversational patterns and are treated independently of contextual patterns. Following the first examination, the next step of the process is to apply an appropriate transformation rule, which includes two parts: the "decomposition rule" and the "reassembly rule". First, the input is reviewed for syntactical patterns in order to establish the minimal context necessary to respond. Using the keywords and other nearby words from the input, different disassembly rules are tested until an appropriate pattern is found. Using the script's rules, the sentence is then "dismantled" and arranged into sections of the component parts as the "decomposition rule for the highest-ranking keyword" dictates. The example that Weizenbaum gives is the input "You are very helpful", which is transformed to "I are very helpful". This is then broken into (1) empty (2) "I" (3) "are" (4) "very helpful". The decomposition rule has broken the phrase into four small segments that contain both the keywords and the information in the sentence. The decomposition rule then designates a particular reassembly rule, or set of reassembly rules, to follow when reconstructing the sentence. The reassembly rule takes the fragments of the input that the decomposition rule had created, rearranges them, and adds in programmed words to create a response. Using Weizenbaum's example previously stated, such a reassembly rule would take the fragments and apply them to the phrase "What makes

    Read more →
  • Tuber (app)

    Tuber (app)

    Tuber (Chinese: Tuber浏览器) was a web browser mobile app developed by Shanghai Fengxuan Information Technology that allowed users within mainland China to view filtered versions of certain websites normally blocked by the Great Firewall. Filtered versions of websites such as Google, Facebook, Instagram, YouTube, Twitter, Netflix, IMDb, and Wikipedia could be viewed. The app was backed by cybersecurity company Qihoo 360 which served as the parent company. The app required phone number registration. Sensitive keywords were blocked by the app. On October 9, 2020, Global Times editor Rita Bai Yunyi tweeted that the move represented "a great step for China's opening up". The app was removed from China domestic app stores and operations ceased as of October 10, 2020. On October 12, when questioned by a Bloomberg News reporter on the topic, Foreign Ministry spokesperson Zhao Lijian replied, "This is not a diplomatic issue, and I do not have the relevant information you mentioned. China has always managed the Internet in accordance with the law. I suggest you ask the competent department for the specific situation."

    Read more →
  • Spleak

    Spleak

    Spleak was an IM platform where users could publish and rate content. It existed in the form of six bots covering as many subject areas: CelebSpleak, SportSpleak, VoteSpleak, TVSpleak, GameSpleak, and StyleSpleak. == Overview == Users can add a "multi-Spleak" (which contains all of the different Spleak bots in one) or add the separate bots to their IM buddy lists on MSN and AIM. Users are also allowed access to Spleak online by using a CelebSpleak, SportSpleak, or VoteSpleak widget, or through the CelebSpleak and SportSpleak applications with Facebook. Spleak was an alternate reality game and is moving to its own company, Spleak Media Network. "Celebrate Spleak" was introduced throughout 2007, launched in 2008, and was forced to retire in 2009. == Key people == Spleak was co-founded by Morten Lund and Nicolaj Reffstrup. The company's chief executive officer is Morrie Eisenburg; Josh Scott is Vice President in Product and Tyler Wells is Vice President in Engineering.

    Read more →
  • Query understanding

    Query understanding

    Query understanding is the process of inferring the intent of a search engine user by extracting semantic meaning from the searcher’s keywords. Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to natural language processing but specifically focused on the understanding of search queries. == Methods == === Stemming and lemmatization === Many languages inflect words to reflect their role in the utterance they appear in. The variation between various forms of a word is likely to be of little importance for the relatively coarse-grained model of meaning involved in a retrieval system, and for this reason the task of conflating the various forms of a word is a potentially useful technique to increase recall of a retrieval system. Stemming algorithms, also known as stemmers, typically use a collection of simple rules to remove suffixes intended to model the language’s inflection rules. For some languages, there are simple lemmatisation methods to reduce a word in query to its lemma or root form or its stem; for others, this operation involves non-trivial string processing and may require recognizing the word's part of speech or referencing a lexical database. The effectiveness of stemming and lemmatization varies across languages. === Query Segmentation === Query segmentation is a key component of query understanding, aiming to divide a query into meaningful segments. Traditional approaches, such as the bag-of-words model, treat individual words as independent units, which can limit interpretative accuracy. For languages like Chinese, where words are not separated by spaces, segmentation is essential, as individual characters often lack standalone meaning. Even in English, the BOW model may not capture the full meaning, as certain phrases—such as "New York"—carry significance as a whole rather than as isolated terms. By identifying phrases or entities within queries, query segmentation enhances interpretation, enabling search engines to apply proximity and ordering constraints, ultimately improving search accuracy and user satisfaction. === Entity recognition === Entity recognition is the process of locating and classifying entities within a text string. Named-entity recognition specifically focuses on named entities, such as names of people, places, and organizations. In addition, entity recognition includes identifying concepts in queries that may be represented by multi-word phrases. Entity recognition systems typically use grammar-based linguistic techniques or statistical machine learning models. === Query rewriting === Query rewriting is the process of automatically reformulating a search query to more accurately capture its intent. Query expansion adds additional query terms, such as synonyms, in order to retrieve more documents and thereby increase recall. Query relaxation removes query terms to reduce the requirements for a document to match the query, thereby also increasing recall. Other forms of query rewriting, such as automatically converting consecutive query terms into phrases and restricting query terms to specific fields, aim to increase precision. === Spelling Correction === Automatic spelling correction is a critical feature of modern search engines, designed to address common spelling errors in user queries. Such errors are especially frequent as users often search for unfamiliar topics. By correcting misspelled queries, search engines enhance their understanding of user intent, thereby improving the relevance and quality of search results and overall user experience.

    Read more →
  • Neural network Gaussian process

    Neural network Gaussian process

    A Neural Network Gaussian Process (NNGP) is a Gaussian process (GP) obtained as the limit of a certain type of sequence of neural networks. Specifically, a wide variety of network architectures converges to a GP in the infinitely wide limit, in the sense of distribution. The concept constitutes an intensional definition, i.e., a NNGP is just a GP, but distinguished by how it is obtained. == Motivation == Bayesian networks are a modeling tool for assigning probabilities to events, and thereby characterizing the uncertainty in a model's predictions. Deep learning and artificial neural networks are approaches used in machine learning to build computational models which learn from training examples. Bayesian neural networks merge these fields. They are a type of neural network whose parameters and predictions are both probabilistic. While standard neural networks often assign high confidence even to incorrect predictions, Bayesian neural networks can more accurately evaluate how likely their predictions are to be correct. Computation in artificial neural networks is usually organized into sequential layers of artificial neurons. The number of neurons in a layer is called the layer width. When we consider a sequence of Bayesian neural networks with increasingly wide layers (see figure), they converge in distribution to a NNGP. This large width limit is of practical interest, since the networks often improve as layers get wider. And the process may give a closed form way to evaluate networks. NNGPs also appears in several other contexts: It describes the distribution over predictions made by wide non-Bayesian artificial neural networks after random initialization of their parameters, but before training; it appears as a term in neural tangent kernel prediction equations; it is used in deep information propagation to characterize whether hyperparameters and architectures will be trainable. It is related to other large width limits of neural networks. === Scope === The first correspondence result had been established in the 1995 PhD thesis of Radford M. Neal, then supervised by Geoffrey Hinton at University of Toronto. Neal cites David J. C. MacKay as inspiration, who worked in Bayesian learning. Today the correspondence is proven for: Single hidden layer Bayesian neural networks; deep fully connected networks as the number of units per layer is taken to infinity; convolutional neural networks as the number of channels is taken to infinity; transformer networks as the number of attention heads is taken to infinity; recurrent networks as the number of units is taken to infinity. In fact, this NNGP correspondence holds for almost any architecture: Generally, if an architecture can be expressed solely via matrix multiplication and coordinatewise nonlinearities (i.e., a tensor program), then it has an infinite-width GP. This in particular includes all feedforward or recurrent neural networks composed of multilayer perceptron, recurrent neural networks (e.g., LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. === Illustration === Every setting of a neural network's parameters θ {\displaystyle \theta } corresponds to a specific function computed by the neural network. A prior distribution p ( θ ) {\displaystyle p(\theta )} over neural network parameters therefore corresponds to a prior distribution over functions computed by the network. As neural networks are made infinitely wide, this distribution over functions converges to a Gaussian process for many architectures. The notation used in this section is the same as the notation used below to derive the correspondence between NNGPs and fully connected networks, and more details can be found there. The figure to the right plots the one-dimensional outputs z L ( ⋅ ; θ ) {\displaystyle z^{L}(\cdot ;\theta )} of a neural network for two inputs x {\displaystyle x} and x ∗ {\displaystyle x^{}} against each other. The black dots show the function computed by the neural network on these inputs for random draws of the parameters from p ( θ ) {\displaystyle p(\theta )} . The red lines are iso-probability contours for the joint distribution over network outputs z L ( x ; θ ) {\displaystyle z^{L}(x;\theta )} and z L ( x ∗ ; θ ) {\displaystyle z^{L}(x^{};\theta )} induced by p ( θ ) {\displaystyle p(\theta )} . This is the distribution in function space corresponding to the distribution p ( θ ) {\displaystyle p(\theta )} in parameter space, and the black dots are samples from this distribution. For infinitely wide neural networks, since the distribution over functions computed by the neural network is a Gaussian process, the joint distribution over network outputs is a multivariate Gaussian for any finite set of network inputs. == Discussion == === Infinitely wide fully connected network === This section expands on the correspondence between infinitely wide neural networks and Gaussian processes for the specific case of a fully connected architecture. It provides a proof sketch outlining why the correspondence holds, and introduces the specific functional form of the NNGP for fully connected networks. The proof sketch closely follows the approach by Novak and coauthors. ==== Network architecture specification ==== Consider a fully connected artificial neural network with inputs x {\displaystyle x} , parameters θ {\displaystyle \theta } consisting of weights W l {\displaystyle W^{l}} and biases b l {\displaystyle b^{l}} for each layer l {\displaystyle l} in the network, pre-activations (pre-nonlinearity) z l {\displaystyle z^{l}} , activations (post-nonlinearity) y l {\displaystyle y^{l}} , pointwise nonlinearity ϕ ( ⋅ ) {\displaystyle \phi (\cdot )} , and layer widths n l {\displaystyle n^{l}} . For simplicity, the width n L + 1 {\displaystyle n^{L+1}} of the readout vector z L {\displaystyle z^{L}} is taken to be 1. The parameters of this network have a prior distribution p ( θ ) {\displaystyle p(\theta )} , which consists of an isotropic Gaussian for each weight and bias, with the variance of the weights scaled inversely with layer width. This network is illustrated in the figure to the right, and described by the following set of equations: x ≡ input y l ( x ) = { x l = 0 ϕ ( z l − 1 ( x ) ) l > 0 z i l ( x ) = ∑ j W i j l y j l ( x ) + b i l W i j l ∼ N ( 0 , σ w 2 n l ) b i l ∼ N ( 0 , σ b 2 ) ϕ ( ⋅ ) ≡ nonlinearity y l ( x ) , z l − 1 ( x ) ∈ R n l × 1 n L + 1 = 1 θ = { W 0 , b 0 , … , W L , b L } {\displaystyle {\begin{aligned}x&\equiv {\text{input}}\\y^{l}(x)&=\left\{{\begin{array}{lcl}x&&l=0\\\phi \left(z^{l-1}(x)\right)&&l>0\end{array}}\right.\\z_{i}^{l}(x)&=\sum _{j}W_{ij}^{l}y_{j}^{l}(x)+b_{i}^{l}\\W_{ij}^{l}&\sim {\mathcal {N}}\left(0,{\frac {\sigma _{w}^{2}}{n^{l}}}\right)\\b_{i}^{l}&\sim {\mathcal {N}}\left(0,\sigma _{b}^{2}\right)\\\phi (\cdot )&\equiv {\text{nonlinearity}}\\y^{l}(x),z^{l-1}(x)&\in \mathbb {R} ^{n^{l}\times 1}\\n^{L+1}&=1\\\theta &=\left\{W^{0},b^{0},\dots ,W^{L},b^{L}\right\}\end{aligned}}} ==== ==== z l | y l {\displaystyle z^{l}|y^{l}} is a Gaussian process We first observe that the pre-activations z l {\displaystyle z^{l}} are described by a Gaussian process conditioned on the preceding activations y l {\displaystyle y^{l}} . This result holds even at finite width. Each pre-activation z i l {\displaystyle z_{i}^{l}} is a weighted sum of Gaussian random variables, corresponding to the weights W i j l {\displaystyle W_{ij}^{l}} and biases b i l {\displaystyle b_{i}^{l}} , where the coefficients for each of those Gaussian variables are the preceding activations y j l {\displaystyle y_{j}^{l}} . Because they are a weighted sum of zero-mean Gaussians, the z i l {\displaystyle z_{i}^{l}} are themselves zero-mean Gaussians (conditioned on the coefficients y j l {\displaystyle y_{j}^{l}} ). Since the z l {\displaystyle z^{l}} are jointly Gaussian for any set of y l {\displaystyle y^{l}} , they are described by a Gaussian process conditioned on the preceding activations y l {\displaystyle y^{l}} . The covariance or kernel of this Gaussian process depends on the weight and bias variances σ w 2 {\displaystyle \sigma _{w}^{2}} and σ b 2 {\displaystyle \sigma _{b}^{2}} , as well as the second moment matrix K l {\displaystyle K^{l}} of the preceding activations y l {\displaystyle y^{l}} , z i l ∣ y l ∼ G P ( 0 , σ w 2 K l + σ b 2 ) K l ( x , x ′ ) = 1 n l ∑ i y i l ( x ) y i l ( x ′ ) {\displaystyle {\begin{aligned}z_{i}^{l}\mid y^{l}&\sim {\mathcal {GP}}\left(0,\sigma _{w}^{2}K^{l}+\sigma _{b}^{2}\right)\\K^{l}(x,x')&={\frac {1}{n^{l}}}\sum _{i}y_{i}^{l}(x)y_{i}^{l}(x')\end{aligned}}} The effect of the weight scale σ w 2 {\displaystyle \sigma _{w}^{2}} is to rescale the contribution to the covariance matrix from K l {\displaystyle K^{l}} , while the bias is shared for all inputs, and so σ b 2 {\displaystyle \sigma _{b}^{2}} makes the z i l {\displaystyle z_{i}^{l}} for different datapoints more similar and

    Read more →
  • Iubenda

    Iubenda

    iubenda (stylized in lowercase; Italian pronunciation: [juˈbɛnda]) is an Italian software company that develops tools intended to support website and application compliance with data protection and privacy regulations, including consent management platforms. The company was founded in 2011 in Milan by Andrea Giannangelo. In February 2022, the company was acquired by team.blue. == History == iubenda was founded in 2011 in Milan, Italy, initially focusing on automated privacy policy generation. In 2015, the company expanded its services to include cookie compliance tools following the implementation of ePrivacy regulations in Italy. In 2018, following the introduction of the General Data Protection Regulation (GDPR) in the European Union, iubenda expanded its products to include consent management and compliance documentation services. In February 2022, iubenda was acquired by team.blue, which obtained a majority stake in the company. Italian media described the acquisition as one of the largest Italian technology startup exits in recent years. In October 2022, iubenda acquired consentmanager, a Sweden-based consent management provider. In 2025, the company acquired CookieFirst, a Netherlands-based consent management platform. In 2025, iubenda partnered with AccessiWay, a digital accessibility company owned by team.blue. == Activities == iubenda develops software tools intended to support compliance with data protection and privacy regulations. Its products include generators for privacy policies, cookie banners, terms and conditions documents, and consent management platforms. The company’s consent management platform integrates with frameworks used for online advertising and privacy compliance, including Google's Consent Mode. The platform is designed to support compliance with regulatory frameworks including the GDPR in the European Union, the UK GDPR, Brazil’s LGPD, Switzerland’s FADP and privacy laws in the United States. Its tools can be integrated with content management systems, web applications, and other digital platforms, including WordPress. The company operates internationally, with a customer base of more than 150,000 organisations, primarily in Europe and the Americas.

    Read more →
  • Microsoft Teams

    Microsoft Teams

    Microsoft Teams is a team collaboration platform developed by Microsoft as part of the Microsoft 365 suite. It offers features such as workspace chat, video conferencing, file storage, and integration with both Microsoft and third-party applications and services. Teams gradually replaced earlier Microsoft messaging and collaboration platforms, including Skype for Business, Skype, Flip, and Microsoft Classroom. The platform saw significant growth during the COVID-19 pandemic, alongside competitors such as Zoom, Slack, and Google Meet, as organizations shifted to remote work and virtual meetings. As of January 2023, Microsoft reported approximately 280 million monthly active users. == History == On August 29, 2007, Microsoft acquired Parlano, the developer of the persistent group chat tool MindAlign. Years later, on March 4, 2016, Microsoft considered acquiring Slack for $8 billion. However, the proposal was reportedly opposed by Bill Gates, who advocated for focusing on enhancing Skype for Business instead. Lu Qi, then executive vice president of Applications and Services, had led the initiative to pursue the Slack acquisition. Following Lu's departure later that year, Microsoft announced Microsoft Teams on November 2, 2016, at an event in New York City, positioning it as a direct competitor to Slack. Teams launched worldwide on March 14, 2017. The service was initially led by corporate vice president Brian MacDonald. In response to the launch, Slack published a full-page advertisement in The New York Times welcoming the competition and outlining its product philosophy. Although Slack was used by 28 companies in the Fortune 100, The Verge wrote that executives would question paying for the service if Teams provides a similar function in their company's existing Office 365 subscription. However, ZDNET noted that the platforms initially served different markets, as Teams did not support external users, making it less appealing to small businesses and freelancers, a limitation Microsoft later addressed. In response to Teams' announcement, Slack deepened in-product integration with Google services. In May 2017, Microsoft announced that Teams would replace Microsoft Classroom in Office 365 Education. A free version of Teams was released on July 12, 2018, offering most core features at no cost, albeit with limits on users and storage. In January 2019, Microsoft introduced updates targeting "Firstline Workers" to improve Teams’ performance across shared or limited-access devices. In September 2019, Microsoft announced the retirement of Skype for Business in favor of Teams, which took effect on July 31, 2021. In early 2020, Microsoft introduced a push-to-talk "Walkie Talkie" feature aimed at firstline workers using smartphones and tablets over Wi-Fi or cellular networks. The COVID-19 pandemic significantly boosted usage of Teams. On March 19, 2020, Microsoft reported 44 million daily active users. In April, the platform logged 4.1 billion meeting minutes in a single day. A public preview of Microsoft Teams for Linux was released in December 2019, but the Linux client was discontinued in 2022. In July 2020, Microsoft shut down its video game livestreaming platform Mixer, and announced that some of its technologies would be repurposed for use in Teams. On February 28, 2025, Microsoft announced that Skype would be fully retired on May 5, 2025, with users given options to export their data or transition to Microsoft Teams. In October 2025, together with other Microsoft 365 suite apps, Teams had its logo updated. == Usage == == Underlying software == Microsoft Teams, as part of the Microsoft 365 suite, utilizes SharePoint and Exchange Online. Each Team, Shared Channel, and Private Channel has its own Microsoft 365 Group and SharePoint Site used for file storage. Messages are stored in Cosmos DB and are journaled to Exchange Online mailboxes. Private messages, including messages in Private Channels, are journaled to the sender and recipients' mailboxes. Public Channel messages are journaled to their corresponding Team's group mailbox, whereas, messages from Shared Channels are journaled to their own mailboxes. Contacts and voicemail are stored in Exchange Online. Microsoft Teams client is a web-based desktop app, originally developed on top of the Electron framework which combines the Chromium rendering engine and the Node.js JavaScript platform. Version 2.0 client was rebuilt using the Evergreen version of Microsoft Edge WebView2 in place of Electron. == Features == === Chats === Teams allows users to communicate in two-way persistent chats with one or multiple participants. Participants can message using text, emojis, stickers and gifs, as well as sharing links and files. In August 2022, the chat feature was updated for "chat with yourself"; allowing for the organization of files, notes, comments, images, and videos within a private chat tab. === Teams === Teams allows communities, groups, or teams to contribute in a shared workspace where messages and digital content on a specific topic are shared. Team members can join through an invitation sent by a team administrator or owner or sharing of a specific URL. Teams for Education allows admins and teachers to set up groups for classes, professional learning communities (PLCs), staff members, and everyone. === Channels === Channels allow team members to communicate without the use of email or group SMS (texting). Users can reply to posts with text, images, GIFs, and image macros. Direct messages send private messages to designated users rather than the entire channel. Connectors can be used within a channel to submit information contacted through a third-party service. Connectors include Mailchimp, Facebook Pages, Twitter, Power BI and Bing News. === Group conversations === Ad-hoc groups can be created to share instant messaging, audio calls (VoIP), and video calls inside the client software. === Telephone replacement === A feature on one of the higher cost licencing tiers allows connectivity to the public switched telephone network (PSTN) telephone system. This allows users to use Teams as if it were a telephone, making and receiving calls over the PSTN, including the ability to host "conference calls" with multiple participants. === Meeting === Meetings can be scheduled with multiple participants able to share audio, video, chat and presented content with all participants. Multiple users can connect via a meeting link. Automated minutes are possible using the recording and transcript features. Teams has a plugin for Microsoft Outlook to schedule a Teams Meeting in Outlook for a specific date and time and invite others to attend. If a meeting is scheduled within a channel, users visiting the channel are able to see if a meeting is in progress. ==== Teams Live Events ==== Teams Live Events replaces Skype Meeting Broadcast for users to broadcast to 10,000 participants on Teams, Yammer, or Microsoft Stream. ==== Breakout Rooms ==== Breakout rooms split a meeting into small groups. This is often utilized for collaboration during trainings or any environment where having all participants speak at once could be disruptive or unfeasible. Breakout rooms can be set by the hosts to a certain length of time, after which all participants will automatically rejoin the main meeting room. ==== Front Row ==== Front Row adjusts the layout of the viewer's screen, placing the speaker or content in the center of the gallery with other meeting participant's video feeds reduced in size and located below the speaker. === Education === Microsoft Teams for Education allows teachers to distribute, provide feedback, and grade student assignments turned in via Teams using the Assignments tab through Office 365 for Education subscribers. Quizzes can also be assigned to students through an integration with Office Forms. === Protocols === Microsoft Teams is based on a number of Microsoft-specific protocols. Video conferences are realized over the protocol MNP24, known from the Skype consumer version. VoIP and video conference clients based on SIP and H.323 need special gateways to connect to Microsoft Teams servers. With the help of Interactive Connectivity Establishment (ICE), clients behind Network address translation routers and restrictive firewalls are also able to connect, if peer-to-peer is not possible. === Integrations === Microsoft Teams has integrations through Microsoft AppSource, its integration marketplace. In 2020, Microsoft partnered with KUDO, a cloud-based solution with language interpretation, to allow integrated language meeting controls. In June 2022, an update was released using AI to improve call audio through the elimination of background feedback loops and cancelling non-vocal audio. == Anti-trust controversy == In July 2023, the European Commission opened an anti-trust investigation into the possibility that Microsoft unfairly used its office suite market power to increase sales of Teams and hurt

    Read more →
  • Structured-light 3D scanner

    Structured-light 3D scanner

    A structured-light 3D scanner is a device used to capture the three-dimensional shape of an object by projecting light patterns, such as grids or stripes, onto its surface. The deformation of these patterns is recorded by cameras and processed using specialized algorithms to generate a detailed 3D model. Structured-light 3D scanning is widely employed in fields such as industrial design, quality control, cultural heritage preservation, augmented reality gaming, and medical imaging. Compared to laser-based 3D scanning, structured-light scanners use non-coherent light sources, such as LEDs or projectors, which enable faster data acquisition and eliminate potential safety concerns associated with lasers. However, the accuracy of structured-light scanning can be influenced by external factors, including ambient lighting conditions and the reflective properties of the scanned object. == Principle == Projecting a narrow band of light onto a three-dimensional surface creates a line of illumination that appears distorted when viewed from perspectives other than that of the projector. This distortion can be analyzed to reconstruct the geometry of the surface, a technique known as light sectioning. Projecting patterns composed of multiple stripes or arbitrary fringes simultaneously enables the acquisition of numerous data points at once, improving scanning speed. While various structured light projection techniques exist, parallel stripe patterns are among the most commonly used. By analyzing the displacement of these stripes, the three-dimensional coordinates of surface details can be accurately determined. === Generation of light patterns === Two major methods of stripe pattern generation have been established: Laser interference and projection. The laser interference method works with two wide planar laser beam fronts. Their interference results in regular, equidistant line patterns. Different pattern sizes can be obtained by changing the angle between these beams. The method allows for the exact and easy generation of very fine patterns with unlimited depth of field. Disadvantages are high cost of implementation, difficulties providing the ideal beam geometry, and laser typical effects like speckle noise and the possible self interference with beam parts reflected from objects. Typically, there is no means of modulating individual stripes, such as with Gray codes. The projection method uses incoherent light and basically works like a video projector. Patterns are usually generated by passing light through a digital spatial light modulator, typically based on one of the three currently most widespread digital projection technologies, transmissive liquid crystal, reflective liquid crystal on silicon (LCOS) or digital light processing (DLP; moving micro mirror) modulators, which have various comparative advantages and disadvantages for this application. Other methods of projection could be and have been used, however. Patterns generated by digital display projectors have small discontinuities due to the pixel boundaries in the displays. Sufficiently small boundaries however can practically be neglected as they are evened out by the slightest defocus. A typical measuring assembly consists of one projector and at least one camera. For many applications, two cameras on opposite sides of the projector have been established as useful. Invisible (or imperceptible) structured light uses structured light without interfering with other computer vision tasks for which the projected pattern will be confusing. Example methods include the use of infrared light or of extremely high framerates alternating between two exact opposite patterns. === Calibration === Geometric distortions by optics and perspective must be compensated by a calibration of the measuring equipment, using special calibration patterns and surfaces. A mathematical model is used for describing the imaging properties of projector and cameras. Essentially based on the simple geometric properties of a pinhole camera, the model also has to take into account the geometric distortions and optical aberration of projector and camera lenses. The parameters of the camera as well as its orientation in space can be determined by a series of calibration measurements, using photogrammetric bundle adjustment. === Analysis of stripe patterns === There are several depth cues contained in the observed stripe patterns. The displacement of any single stripe can directly be converted into 3D coordinates. For this purpose, the individual stripe has to be identified, which can for example be accomplished by tracing or counting stripes (pattern recognition method). Another common method projects alternating stripe patterns, resulting in binary Gray code sequences identifying the number of each individual stripe hitting the object. An important depth cue also results from the varying stripe widths along the object surface. Stripe width is a function of the steepness of a surface part, i.e. the first derivative of the elevation. Stripe frequency and phase deliver similar cues and can be analyzed by a Fourier transform. Finally, the wavelet transform has recently been discussed for the same purpose. In many practical implementations, series of measurements combining pattern recognition, Gray codes and Fourier transform are obtained for a complete and unambiguous reconstruction of shapes. Another method also belonging to the area of fringe projection has been demonstrated, utilizing the depth of field of the camera. It is also possible to use projected patterns primarily as a means of structure insertion into scenes, for an essentially photogrammetric acquisition. === Precision and range === The optical resolution of fringe projection methods depends on the width of the stripes used and their optical quality. It is also limited by the wavelength of light. An extreme reduction of stripe width proves inefficient due to limitations in depth of field, camera resolution and display resolution. Therefore, the phase shift method has been widely established: A number of at least 3, typically about 10 exposures are taken with slightly shifted stripes. The first theoretical deductions of this method relied on stripes with a sine wave shaped intensity modulation, but the methods work with "rectangular" modulated stripes, as delivered from LCD or DLP displays as well. By phase shifting, surface detail of e.g. 1/10 the stripe pitch can be resolved. Current optical stripe pattern profilometry hence allows for detail resolutions down to the wavelength of light, below 1 micrometer in practice or, with larger stripe patterns, to approx. 1/10 of the stripe width. Concerning level accuracy, interpolating over several pixels of the acquired camera image can yield a reliable height resolution and also accuracy, down to 1/50 pixel. Arbitrarily large objects can be measured with accordingly large stripe patterns and setups. Practical applications are documented involving objects several meters in size. Typical accuracy figures are: Planarity of a 2-foot (0.61 m) wide surface, to 10 micrometres (0.00039 in). Shape of a motor combustion chamber to 2 micrometres (7.9×10−5 in) (elevation), yielding a volume accuracy 10 times better than with volumetric dosing. Shape of an object 2 inches (51 mm) large, to about 1 micrometre (3.9×10−5 in) Radius of a blade edge of e.g. 10 micrometres (0.00039 in), to ±0.4 μm === Navigation === As the method can measure shapes from only one perspective at a time, complete 3D shapes have to be combined from different measurements in different angles. This can be accomplished by attaching marker points to the object and combining perspectives afterwards by matching these markers. The process can be automated, by mounting the object on a motorized turntable on robotic inspection cell, or CNC positioning device. Markers can as well be applied on a positioning device instead of the object itself. The 3D data gathered can be used to retrieve CAD (computer aided design) data and models from existing components (reverse engineering), hand formed samples or sculptures, natural objects or artifacts. === Challenges === As with all optical methods, reflective or transparent surfaces raise difficulties. Reflections cause light to be reflected either away from the camera or right into its optics. In both cases, the dynamic range of the camera can be exceeded. Transparent or semi-transparent surfaces also cause major difficulties. In these cases, coating the surfaces with a thin opaque lacquer just for measuring purposes is a common practice. A recent method handles highly reflective and specular objects by inserting a 1-dimensional diffuser between the light source (e.g., projector) and the object to be scanned. Alternative optical techniques have been proposed for handling perfectly transparent and specular objects. Double reflections and inter-reflections can cause the stripe pattern to be overlaid with unwanted ligh

    Read more →
  • Retrieval-augmented generation

    Retrieval-augmented generation

    Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources. RAG improves LLMs by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have caused chatbots to describe policies that don't exist, or recommend nonexistent legal cases to lawyers that are looking for citations to support their arguments. RAG also reduces the need to retrain LLMs with new data, saving on computational and financial costs. Beyond efficiency gains, RAG also allows LLMs to include sources in their responses, so users can verify the cited sources. This provides greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance. The term retrieval-augmented generation (RAG) was introduced in a 2020 paper that described combining a parametric language model with a non-parametric external memory accessed through retrieval at inference time. == RAG and LLM limitations == LLMs can provide incorrect information. For example, when Google first demonstrated its LLM tool "Google Bard" (later re-branded to Gemini), the LLM provided incorrect information about the James Webb Space Telescope. This error contributed to a $100 billion decline in Google's stock value. RAG is used to prevent these errors, but it does not solve all the problems. For example, LLMs can generate misinformation even when pulling from factually correct sources if they misinterpret the context. MIT Technology Review gives the example of an AI-generated response stating, "The United States has had one Muslim president, Barack Hussein Obama." The model retrieved this from an academic book rhetorically titled Barack Hussein Obama: America's First Muslim President? The LLM did not "know" or "understand" the context of the title, generating a false statement. LLMs with RAG are programmed to prioritize new information. This technique has been called "prompt stuffing." Without prompt stuffing, the LLM's input is generated by a user; with prompt stuffing, additional relevant context is added to this input to guide the model's response. This approach provides the LLM with key information early in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge. == Process == Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set. Ars Technica notes that "when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information" ("augmentation"). IBM states that "in the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize" an answer. === RAG key stages === Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of a large vector space. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval. Given a user query, a document retriever is first called to select the most relevant documents that will be used to augment the query. This comparison can be done using a variety of methods, which depend in part on the type of indexing used. The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query. Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals. Finally, the LLM can generate output based on both the query and the retrieved documents. Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning. == Applications == Retrieval-augmented generation is used in applications where generated responses need to be grounded in external or frequently updated information. Commonly cited use cases include search engines, question-answering systems, customer support chatbots, enterprise knowledge assistants, content generation, recommendation systems, retail and e-commerce, and industrial or manufacturing workflows. In healthcare, RAG has been studied as a way to ground large language model outputs in external medical knowledge sources, although reviews have noted continuing challenges around evaluation, ethics, and clinical reliability. == Improvements == Improvements to the basic process above can be applied at different stages in the RAG flow. === Encoder === These methods focus on the encoding of text as either dense or sparse vectors. Sparse vectors, which encode the identity of a word, are typically dictionary-length and contain mostly zeros. Dense vectors, which encode meaning, are more compact and contain fewer zeros. Various enhancements can improve the way similarities are calculated in the vector stores (databases). Performance improves by optimizing how vector similarities are calculated. Dot products enhance similarity scoring, while approximate nearest neighbor (ANN) searches improve retrieval efficiency over K-nearest neighbors (KNN) searches. Accuracy may be improved with Late Interactions, which allow the system to compare words more precisely after retrieval. This helps refine document ranking and improve search relevance. Hybrid vector approaches may be used to combine dense vector representations with sparse one-hot vectors, taking advantage of the computational efficiency of sparse dot products over dense vector operations. Other retrieval techniques focus on improving accuracy by refining how documents are selected. Some retrieval methods combine sparse representations, such as SPLADE, with query expansion strategies to improve search accuracy and recall. === Retriever-centric methods === These methods aim to enhance the quality of document retrieval in vector databases: Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text within documents. Supervised retriever optimization aligns retrieval probabilities with the generator model's likelihood distribution. This involves retrieving the top-k vectors for a given prompt, scoring the generated response's perplexity, and minimizing KL divergence between the retriever's selections and the model's likelihoods to refine retrieval. Reranking techniques can refine retriever performance by prioritizing the most relevant retrieved documents during training. === Language model === By redesigning the language model with the retriever in mind, a 25-time smaller network can get comparable perplexity as its much larger counterparts. Because it is trained from scratch, this method (Retro) incurs the high cost of training runs that the original RAG scheme avoided. The hypothesis is that by giving domain knowledge during training, Retro needs less focus on the domain and can devote its smaller weight resources only to language semantics. The redesigned language model is shown here. It has been reported that Retro is not reproducible, so modifications were made to make it so. The more reproducible version is called Retro++ and includes in-context RAG. === Chunking === Chunking involves various strategies for breaking up the data into vectors so the retriever can find details in it. Three types of chunking strategies are: Fixed length with overlap. This is fast and easy. Overlapping consecutive chunks helps to maintain semantic context across chunks. Syntax-based chunks can break the document up into sentences. Libraries such as spaCy or NLTK can also help. File format-based chunking. Certain file types have natural chunks built in, and it's best to respect them. For example, code files are best chunked and vectorized as whole functions or classes. HTML files should leave

    or base64 encoded elements

    Read more →
  • Nolot

    Nolot

    Nolot is a chess test suite with 11 positions from real games. They were compiled by Pierre Nolot (French: [nɔ.lo]) for the French chess magazine Gambisco and posted on the rec.games.chess Usenet group in 1994. They were designed to be particularly hard to solve for chess engines to solve at the time, although modern engines can find a solution near-instantaneously. == Problem 1 == FEN: r3qb1k/1b4p1/p2pr2p/3n4/Pnp1N1N1/6RP/1B3PP1/1B1QR1K1 w - - 0 1 26.Nxh6!! c3 (26... Rxh6 27.Nxd6 Qh5 (best) 28.Rg5! Qxd1 29.Nf7+ Kg8 30.Nxh6+ Kh8 31.Rxd1 c3 32.Nf7+ Kg8 33.Bg6! Nf4 34.Bxc3 Nxg6 35.Bxb4 Kxf7 36.Rd7+ Kf6 37.Rxg6+ Kxg6 38.Rxb7 ±) 27.Nf5! cxb2 28.Qg4 Bc8 (28... g6!? 29.Kh2! 29.Qd7 30.Nh4 Bc6 31.Nc5! dxc 32.Rxe6 Nf6 33.Nxg6+ Kg7 34.Qg5 Nbd5 35.Ne5 Kh8 36.Nxd7 ±) 29.Qh4+ Rh6 30.Nxh6 gxh6 31.Kh2! Qe5 32.Ng5 Qf6 33.Re8 Bf5 34.Qxh6 (missing a mate in 6: 34.Nf7+ Qxf7 35.Qxh6+ Bh7 36.Rxa8 Nf6 37.Rxf8 Qxf8 38.Qxf8+ Ng8 39.Qg7#) 34...Qxh6 35.Nf7+ Kh7 36.Bxf5+ Qg6 37.Bxg6+ Kg7 38.Rxa8 Be7 39.Rb8 a5 40.Be4+ Kxf7 41.Bxd5+ 1–0 The best Novag computer, the Diablo 68000, finds 26. Nxh6 after seven and a half months (Pierre Nolot has let it run on the position for 14 months and one day, until a power failure stopped an analysis of over 80,000,000,000 nodes.) but for wrong reasons: it evaluates white's position as inferior and thinks this move would enable it to draw. Today Gambit Tiger 2.0 for example can find it quite quickly: Most free engines running on 64-bit processors in 2010 could solve this problem and the others in a few seconds. 1.Qd4 c3 2.Bxc3 Nxc3 3.Qxb4 Nxe4 4.Qxb7 Rb8 5.Qxb8 Qxb8 6.Bxe4 d5 7.Rb1 μ (-1.20) Depth: 12 00:00:09 6055 kN 1.Nxh6 c3 2.Nf5 cxb2 3.Qg4 Rb8 4.Nxg7 Rg6 5.Qxg6 Qxg6 6.Rxg6 Bxg7 7.Nxd6 ³ (-0.48) Depth: 12 00:00:21 14368 kN 1.Nxh6 c3 2.Nf5 cxb2 3.Qg4 Rc8 4.Nxg7 Rg6 5.Nxe8 Rxg4 6.Rxg4 Rxe8 7.Rg6 μ (-0.74) Depth: 13 00:00:55 38455 kN 1.Ne3 Rxe4 2.Bxe4 Qxe4 3.Nxd5 Qxd5 4.Qc1 Qf5 5.Qxh6+ Qh7 6.Qe6 Nd3 7.Re2 Nxb2 8.Rxb2 ³ (-0.58) Depth: 13 00:01:30 62979 kN 1.Ne3 Rxe4 ³ (-0.58) Depth: 14 00:02:02 84941 kN 1.Ne3 Nxe3 2.Rexe3 Bxe4 3.Qg4 Rg6 4.Qxe4 Qxe4 5.Bxe4 Rxg3 6.Rxg3 d5 7.Bf5 Re8 8.Bc3 ³ (-0.30) Depth: 15 00:03:05 128968 kN 1.Nxh6 ² (0.32) Depth: 15 00:07:58 350813 kN With the next ply showing a clear advantage. Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of Nxh6!! in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 19/32 00:01 7708k 4882k +3,00 Nxh6 Rxh6 Nxd6 Qh5 Bg6 Qxd1 Nf7+ Kg8 Nxh6+ gxh6 Bh5+ Kh7 Rxd1 c3 Bxc3 Nxc3 Rd7+ Kh8 Rxb7 Ne4 Re3 Nxf2 Kxf2 Bc5 Ke2 Bxe3 Kxe3 Nd5+ Kf2 49/73 15:02 5118270k 5673k +6,15 Nxh6 Rxh6 Nxd6 Qh5 Rg5 Qxd1 Nf7+ Kg8 Nxh6+ Kh8 Rxd1 c3 Nf7+ Kg8 Bg6 Nf4 Bxc3 Nbd5 Rb1 Bc6 Bd2 Nxg6 Rxg6 Ne7 Rxc6 Nxc6 Rb6 Rc8 Ng5 a5 Ra6 Bb4 Be3 Ne5 Bd4 Nc6 Bb6 Bd2 h4 Kf8 Bc5+ Kg8 Be3 Bxe3 fxe3 Kf8 Kf2 Ke7 Nf3 Kd7 Rb6 Ne7 Rb5 Kd6 Rxa5 Rc2+ Kg3 Re2 Nd4 Rxe3+ Kf4 Rd3 Nf5+ Kc7 Nxe7 == Problem 2 == FEN: r4rk1/pp1n1p1p/1nqP2p1/2b1P1B1/4NQ2/1B3P2/PP2K2P/2R5 w - - 0 1 22.Rxc5!! Nxc5 23.Nf6+ Kh8 24.Qh4 Qb5+ (computers think there is perpetual check here, but...) 25.Ke3! 25... h5 26.Nxh5 Qxb3+ (26... d5+ 27.Bxd5 Qd3 28.Kf2 Ne4+ 29.Bxe4 Qd4+ 30.Kg2 Qxb2+ 31.Kh3 ±) and White won in 41 moves. Today Deep Junior 8.ZX for example finds it very quickly (around 1 minute): 1.Kd1 Rac8 2.Bh6 Qb5 3.Rc3 Qf1+ 4.Kc2 Rc6 5.Bxf8 −+ (-2.11) Depth: 12 00:00:04 10422 kN 1.Nxc5 Nxc5 2.Rxc5 Qxc5 3.e6 Rae8 4.e7 Nc8 5.Kf1 Nxd6 6.Bf6 b5 −+ (-2.10) Depth: 12 00:00:14 25054 kN 1.Bf6! μ (-1.35) Depth: 12 00:00:17 34601 kN 1.Bf6 Qb5+ 2.Ke1 Bb4+ 3.Kf2 Bc5+ = (0.00) Depth: 12 00:00:20 34601 kN 1.Bf6 Qb5+ 2.Ke1 Nxf6 3.Nxf6+ Kg7 4.Nh5+ gxh5 5.Qf6+ Kg8 6.Qg5+ Kh8 7.Qf6+ = (0.00) Depth: 15 00:01:01 130544 kN 1.Rxc5! = (0.15) Depth: 15 00:01:12 145875 kN 1.Rxc5 Nxc5 2.Nf6+ Kh8 3.Qh4 Qb5+ 4.Ke3 h5 5.Nxh5 Qd3+ 6.Kf2 Ne4+ 7.fxe4 Qd4+ 8.Kf1 Qd3+ 9.Ke1 Qb1+ 10.Bd1 ± (2.18) Depth: 15 00:01:18 145875 kN Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of Rxc5!! in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 21/25 00:01 5822k 5545k +6,61 Rxc5 Qxc5 Nxc5 Nxc5 Bh6 Nbd7 Bxf8 Rxf8 Qe3 Rc8 f4 Nxe5 Qxe5 Ne6 Bxe6 Rc2+ Kd3 Rxh2 46/86 11:27 5057055k 7355k +7,61 Rxc5 Qxc5 Nxc5 Nxc5 Bf6 Ne6 Qh6 Nd4+ Kf2 Nf5 Qg5 Nd7 h4 Nxf6 Qxf6 Ng7 d7 b5 Bd5 Rab8 b4 Nh5 Bxf7+ Rxf7 d8R+ Rxd8 Qxd8+ Rf8 Qd5+ Kg7 e6 Kf6 Qd7 Ng7 Qd4+ Kxe6 Qxg7 Rf7 Qc3 Ke7 Qc5+ Ke8 Qc8+ Ke7 h5 gxh5 Kg3 h4+ Kh2 h6 Qc5+ Kf6 Qxb5 Kg7 f4 Rxf4 Qe5+ Rf6 b5 h3 Qd4 Kg8 Qxf6 h5 Blacks 22. .. Nxc5 is suboptimal and leads faster mate 77/44 09:18 6987714k 12518k +M22 Nf6+ Kh8 Qh4 Qb5+ Ke3 Qxb3+ axb3 h5 Nxh5 Nd5+ Kd4 Ne6+ Kxd5 Nxg5 Qxg5 gxh5 f4 Rad8 f5 f6 Qxh5+ Kg7 Qg6+ Kh8 e6 b6 e7 Rb8 exf8Q+ Rxf8 Ke6 b5 Ke7 Rb8 Qh5+ Kg7 Qf7+ Kh8 Kxf6 Rf8 Qxf8+ Kh7 Qg7+ == Problem 3 == FEN: r2qk2r/ppp1b1pp/2n1p3/3pP1n1/3P2b1/2PB1NN1/PP4PP/R1BQK2R w KQkq - 0 1 12.Nxg5!! Bxd1 13.Nxe6 Qb8 14.Nxg7+!! Kf8 15.Bh6! Bg4 16.0-0+ Kg8 17.Rf4 ± White wins with a queen sac but black has defensive resources. Stockfish 8 64bit 3CPU running on 2016 hardware recognizes the significance of Nxg5!! in 55 seconds. Stockfish 14 dev (Stockfish_21092606_x64_avx2) 64bit 4CPU running on 2020 hardware recognizes the significance of Nxg5!! in 1 second. NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 21/34 00:01 8291k 4530k +2,78 Nxg5 Bxd1 Nxe6 Qb8 Nxg7+ Kd8 Kxd1 b5 N3f5 Bf8 Rf1 Kc8 Nh5 Kb7 Bxb5 Ne7 g4 a6 Ba4 Nxf5 gxf5 Ka7 Nf4 c5 47/59 37:49 10390430k 4578k +3,16 Nxg5 Bxd1 Nxe6 Qb8 Nxg7+ Kd8 Kxd1 b5 Rf1 Kc8 N3f5 Bf8 Ne6 Kd7 Nf4 Ne7 g4 a5 Ke2 Qb7 h4 Ra6 a3 Kc8 Be3 Kb8 Kf3 Rb6 Bd2 Qc8 Kg3 c5 Be3 c4 Nxe7 Bxe7 Bf5 Qd8 h5 Qg8 Kh3 Bg5 Rf3 Ra6 Raf1 b4 Nxd5 Qxd5 Bxg5 bxc3 bxc3 Rb6 Be3 Rb3 Blacks 14 .. Kf8 is suboptimal and leads loss fast 41/68 06:31 3269727k 8350k +9,28 Bh6 Kg8 Rxd1 Bf8 N3h5 Bxg7 Nxg7 Qf8 Nf5 Ne7 Bxf8 Nxf5 Bxf5 Rxf8 Be6+ Kg7 Rd3 Rf4 Bxd5 c6 Rg3+ Kf8 Rf3 Rxf3 Bxf3 Kg7 Rf1 Re8 Be4 Re6 Ke2 a5 Ke3 Rh6 h3 a4 Kf4 Re6 h4 Re8 Ke3 h6 h5 Rf8 Rxf8 Kxf8 == Problem 4 == FEN: r1b1kb1r/1p1n1ppp/p2ppn2/6BB/2qNP3/2N5/PPP2PPP/R2Q1RK1 w kq - 0 1 10.Nxe6!! Qxe6 11.Nd5 Kd8 12.Bg4 Qe5 13.f4 Qxe4 (13...Qxb2 stronger but not sufficient: 14.Bxd7 Bxd7 15.Rb1 Qa3 16.Nxf6 Bb5 17.Qd4 Qc5 18.Rfd1 ±) 14.Bxd7 Bxd7 15.Nxf6 gxf6 16.Bxf6+ Kc7 17.Bxh8 and Black resigned on move 27. Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of 10.Nxe6 in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 22/37 00:01 6955k 5367k +4,00 Nxe6 Qxe6 Nd5 Kd8 Bg4 Qe5 f4 Qxb2 Rb1 Qa3 Bxd7 Bxd7 Nxf6 Bb5 Rf3 Qxa2 c4 Bxc4 Rf2 Qa5 Nd5+ f6 Nxf6 Kc7 Rc1 b5 Qd5 gxf6 Bxf6 Kb8 Rxc4 Qe1+ Rf1 51/70 47:10 14538911k 5137k +5,76 Nxe6 Qxe6 Nd5 Kd8 Bg4 Qe5 f4 Qxe4 Bxd7 Bxd7 Nxf6 Qf5 Qd4 Kc8 Nd5 Bc6 c4 f6 Nb6+ Kb8 Bh4 Be7 Rae1 Bd8 Nxa8 Kxa8 Bf2 Kb8 Qxd6+ Bc7 Ba7+ Kc8 Qe6+ Qxe6 Rxe6 h5 h4 Rd8 Re7 g6 Be3 Ba5 Kf2 Rd6 Rc1 Bd8 Rg7 Be4 Rg8 Kd7 c5 Rd3 Rc4 Bd5 Rg7+ Ke6 Rd4 Rxd4 Bxd4 Kf5 Rd7 Bc6 Rxd8 Kxf4 Bxf6 == Problem 5 == FEN: r2qrb1k/1p1b2p1/p2ppn1p/8/3NP3/1BN5/PPP3QP/1K3RR1 w - - 0 1 21.e5!! dxe5 22.Ne4! Nh5 23.Qg6!? (stronger is 23.Qg4!! Nf4 24.Nf3 Qc7 25.Nh4 ± ) 23...exd4? (23...Nf4 24.Rxf4! exf4 25.Nf3! Qb6 26.Rg5!! covering b5 and threatening Nf6 or Ne5-f7+) 24.Ng5 1−0 Stockfish 8 64bit 3CPU running on 2016 hardware recognises the significance of 21.e5 in 5 seconds. Stockfish 12 dev (Stockfish_20062212_x64_modern) 64bit 1CPU running on 2016 hardware recognizes the significance of 21.e5 in 11 seconds. 25/42 00:06 7 963k 1309k +6,93 e5 Nh5 Ne4 dxe5 Nf3 Nf4 Qg4 Qc7 Nh4 Bc6 Nf6 g5 Rxf4 exf4 Qh5 Qe7 Ng6+ Kg7 Nxe7 Rxe7 Ng4 37/62 03:12 298 083k 1545k +10,70 e5 Ng4 Qxg4 Qg5 Qh3 Qxe5 Nde2 g5 Rxf8+ Kg7 Rff1 Rf8 Re1 Qf5 Qg3 Rad8 Nd4 Qf4 Nxe6+ Bxe6 Rxe6 Qxg3 == Problem 6 == FEN: rnbqk2r/1p3ppp/p7/1NpPp3/QPP1P1n1/P4N2/4KbPP/R1B2B1R b kq - 0 1 13... axb5!! offers an exchange to keep the white queen out of play. 14.Qxa8 Bd4 15.Nxd4 cxd4 16.Qxb8 0-0! 17.Ke1 Qh4 18.g3 Qf6 19.Bf4 g5? (Ivanchuk found 19...d3! during post-game analysis.) 20.Rc1 exf4 21.Qxf4 Qd4 22.Rd1 bxc4 23.e5 Qc3+ 24.Rd2 Re8 25.Bxd3 cxd3 −+ Tasc R30 finds 19... d3! in 2 1/2 hours. 19... Bf5!! is even stronger than 19... d3. Position is already lost at 19... d3 +8.00 for black, ... Bf5 not much better Stockfish 14dev 64bit 4CPU running on 2020 hardware recognises the significance of axb5!! in 1 second. Stockfish_21092606_x64_avx2: NNUE evaluation using nn-13406b1dcbe0.nnue enabled. 21/28 00:01 9264k 4714k -1,22 axb5 Qxa8 Bd4 Nxd4 cxd4 h3 Nf6 Bg5 0-0 cxb5 h6 Bxf6 Qxf6 Re1 Nd7 Kd1 Qg6 Qa4 Qg3 Qc2 Qxa3 Bd3 Qxb4 Qb1 46/67 1:05:00 18113493k 4644k -2,40 axb5 Qxa8 Bd4 h3 Nf6 Nxd4 exd4 Kf2 Nxe4+ Kg1 Nd7 Bg5 Qxg5 Qxc8+ Ke7 Qc7 Qe5 d6+ Qxd6 Qxd6+ Kxd6 bxc5+ Ndxc5 cxb5 d3 h4 d2 Rh3 Ke5 Be2 f5 Ra2 Rd8 Bd1 Rd4 Re3 f4 Re2 b6 a4 Kd6 Rc2 Kd5 Ra2 h6 Rb2 Nxa4 Bxa4 Rxa4 Rexd2+ Nxd2 Rxd2+ Kc4 Rd7 g6 == Problem 7 == FEN 1r1bk2r/2R2ppp/p3p3/1b2P2q/4QP2/4N3/1B4PP/3R2K1 w k - 0 1 1.Rxd8+!! Rxd8 (1...Kxd8 2.Ra7! Qe2 3.Qd4+ Ke8 4.h3 Qe1+ 5.Kh2 Rd8 6.Qc5 Qh4 7.Ba3 Rd7 8.Ra8+ Rd8 9.g3 1−0)

    Read more →
  • Point distribution model

    Point distribution model

    The point distribution model is a model for representing the mean geometry of a shape and some statistical modes of geometric variation inferred from a training set of shapes. == Background == The point distribution model concept has been developed by Cootes, Taylor et al. and became a standard in computer vision for the statistical study of shape and for segmentation of medical images where shape priors really help interpretation of noisy and low-contrasted pixels/voxels. The latter point leads to active shape models (ASM) and active appearance models (AAM). Point distribution models rely on landmark points. A landmark is an annotating point posed by an anatomist onto a given locus for every shape instance across the training set population. For instance, the same landmark will designate the tip of the index finger in a training set of 2D hands outlines. Principal component analysis (PCA), for instance, is a relevant tool for studying correlations of movement between groups of landmarks among the training set population. Typically, it might detect that all the landmarks located along the same finger move exactly together across the training set examples showing different finger spacing for a flat-posed hands collection. == Details == First, a set of training images are manually landmarked with enough corresponding landmarks to sufficiently approximate the geometry of the original shapes. These landmarks are aligned using the generalized procrustes analysis, which minimizes the least squared error between the points. k {\displaystyle k} aligned landmarks in two dimensions are given as X = ( x 1 , y 1 , … , x k , y k ) {\displaystyle \mathbf {X} =(x_{1},y_{1},\ldots ,x_{k},y_{k})} . It's important to note that each landmark i ∈ { 1 , … k } {\displaystyle i\in \lbrace 1,\ldots k\rbrace } should represent the same anatomical location. For example, landmark #3, ( x 3 , y 3 ) {\displaystyle (x_{3},y_{3})} might represent the tip of the ring finger across all training images. Now the shape outlines are reduced to sequences of k {\displaystyle k} landmarks, so that a given training shape is defined as the vector X ∈ R 2 k {\displaystyle \mathbf {X} \in \mathbb {R} ^{2k}} . Assuming the scattering is gaussian in this space, PCA is used to compute normalized eigenvectors and eigenvalues of the covariance matrix across all training shapes. The matrix of the top d {\displaystyle d} eigenvectors is given as P ∈ R 2 k × d {\displaystyle \mathbf {P} \in \mathbb {R} ^{2k\times d}} , and each eigenvector describes a principal mode of variation along the set. Finally, a linear combination of the eigenvectors is used to define a new shape X ′ {\displaystyle \mathbf {X} '} , mathematically defined as: X ′ = X ¯ + P b {\displaystyle \mathbf {X} '={\overline {\mathbf {X} }}+\mathbf {P} \mathbf {b} } where X ¯ {\displaystyle {\overline {\mathbf {X} }}} is defined as the mean shape across all training images, and b {\displaystyle \mathbf {b} } is a vector of scaling values for each principal component. Therefore, by modifying the variable b {\displaystyle \mathbf {b} } an infinite number of shapes can be defined. To ensure that the new shapes are all within the variation seen in the training set, it is common to only allow each element of b {\displaystyle \mathbf {b} } to be within ± {\displaystyle \pm } 3 standard deviations, where the standard deviation of a given principal component is defined as the square root of its corresponding eigenvalue. PDM's can be extended to any arbitrary number of dimensions, but are typically used in 2D image and 3D volume applications (where each landmark point is R 2 {\displaystyle \mathbb {R} ^{2}} or R 3 {\displaystyle \mathbb {R} ^{3}} ). == Discussion == An eigenvector, interpreted in euclidean space, can be seen as a sequence of k {\displaystyle k} euclidean vectors associated to corresponding landmark and designating a compound move for the whole shape. Global nonlinear variation is usually well handled provided nonlinear variation is kept to a reasonable level. Typically, a twisting nematode worm is used as an example in the teaching of kernel PCA-based methods. Due to the PCA properties: eigenvectors are mutually orthogonal, form a basis of the training set cloud in the shape space, and cross at the 0 in this space, which represents the mean shape. Also, PCA is a traditional way of fitting a closed ellipsoid to a Gaussian cloud of points (whatever their dimension): this suggests the concept of bounded variation. The idea behind PDMs is that eigenvectors can be linearly combined to create an infinity of new shape instances that will 'look like' the one in the training set. The coefficients are bounded alike the values of the corresponding eigenvalues, so as to ensure the generated 2n/3n-dimensional dot will remain into the hyper-ellipsoidal allowed domain—allowable shape domain (ASD).

    Read more →
  • Deep image prior

    Deep image prior

    Deep image prior is a type of convolutional neural network used to enhance a given image with no prior training data other than the image itself. A neural network is randomly initialized and used as prior to solve inverse problems such as noise reduction, super-resolution, and inpainting. Image statistics are captured by the structure of a convolutional image generator rather than by any previously learned capabilities. == Method == === Background === Inverse problems such as noise reduction, super-resolution, and inpainting can be formulated as the optimization task x ∗ = m i n x E ( x ; x 0 ) + R ( x ) {\displaystyle x^{}=min_{x}E(x;x_{0})+R(x)} , where x {\displaystyle x} is an image, x 0 {\displaystyle x_{0}} a corrupted representation of that image, E ( x ; x 0 ) {\displaystyle E(x;x_{0})} is a task-dependent data term, and R(x) is the regularizer. Deep neural networks learn a generator/decoder x = f θ ( z ) {\displaystyle x=f_{\theta }(z)} which maps a random code vector z {\displaystyle z} to an image x {\displaystyle x} . The image corruption method used to generate x 0 {\displaystyle x_{0}} is selected for the specific application. === Specifics === In this approach, the R ( x ) {\displaystyle R(x)} prior is replaced with the implicit prior captured by the neural network (where R ( x ) = 0 {\displaystyle R(x)=0} for images that can be produced by a deep neural networks and R ( x ) = + ∞ {\displaystyle R(x)=+\infty } otherwise). This yields the equation for the minimizer θ ∗ = a r g m i n θ E ( f θ ( z ) ; x 0 ) {\displaystyle \theta ^{}=argmin_{\theta }E(f_{\theta }(z);x_{0})} and the result of the optimization process x ∗ = f θ ∗ ( z ) {\displaystyle x^{}=f_{\theta ^{}}(z)} . The minimizer θ ∗ {\displaystyle \theta ^{}} (typically a gradient descent) starts from a randomly initialized parameters and descends into a local best result to yield the x ∗ {\displaystyle x^{}} restoration function. ==== Overfitting ==== A parameter θ may be used to recover any image, including its noise. However, the network is reluctant to pick up noise because it contains high impedance while useful signal offers low impedance. This results in the θ parameter approaching a good-looking local optimum so long as the number of iterations in the optimization process remains low enough not to overfit data. === Deep Neural Network Model === Typically, the deep neural network model for deep image prior uses a U-Net like model without the skip connections that connect the encoder blocks with the decoder blocks. The authors in their paper mention that "Our findings here (and in other similar comparisons) seem to suggest that having deeper architecture is beneficial, and that having skip-connections that work so well for recognition tasks (such as semantic segmentation) is highly detrimental." == Applications == === Denoising === The principle of denoising is to recover an image x {\displaystyle x} from a noisy observation x 0 {\displaystyle x_{0}} , where x 0 = x + ϵ {\displaystyle x_{0}=x+\epsilon } . The distribution ϵ {\displaystyle \epsilon } is sometimes known (e.g.: profiling sensor and photon noise) and may optionally be incorporated into the model, though this process works well in blind denoising. The quadratic energy function E ( x , x 0 ) = | | x − x 0 | | 2 {\displaystyle E(x,x_{0})=||x-x_{0}||^{2}} is used as the data term, plugging it into the equation for θ ∗ {\displaystyle \theta ^{}} yields the optimization problem m i n θ | | f θ ( z ) − x 0 | | 2 {\displaystyle min_{\theta }||f_{\theta }(z)-x_{0}||^{2}} . === Super-resolution === Super-resolution is used to generate a higher resolution version of image x. The data term is set to E ( x ; x 0 ) = | | d ( x ) − x 0 | | 2 {\displaystyle E(x;x_{0})=||d(x)-x_{0}||^{2}} where d(·) is a downsampling operator such as Lanczos that decimates the image by a factor t. === Inpainting === Inpainting is used to reconstruct a missing area in an image x 0 {\displaystyle x_{0}} . These missing pixels are defined as the binary mask m ∈ { 0 , 1 } H × V {\displaystyle m\in \{0,1\}^{H\times V}} . The data term is defined as E ( x ; x 0 ) = | | ( x − x 0 ) ⊙ m | | 2 {\displaystyle E(x;x_{0})=||(x-x_{0})\odot m||^{2}} (where ⊙ {\displaystyle \odot } is the Hadamard product). The intuition behind this is that the loss is computed only on the known pixels in the image, and the network is going to learn enough about the image to fill in unknown parts of the image even though the computed loss doesn't include those pixels. This strategy is used to remove image watermarks by treating the watermark as missing pixels in the image. === Flash–no-flash reconstruction === This approach may be extended to multiple images. A straightforward example mentioned by the author is the reconstruction of an image to obtain natural light and clarity from a flash–no-flash pair. Video reconstruction is possible but it requires optimizations to take into account the spatial differences. == Implementations == A reference implementation rewritten in Python 3.6 with the PyTorch 0.4.0 library was released by the author under the Apache 2.0 license: deep-image-prior A TensorFlow-based implementation written in Python 2 and released under the CC-SA 3.0 license: deep-image-prior-tensorflow A Keras-based implementation written in Python 2 and released under the GPLv3: machine_learning_denoising == Example == See Astronomy Picture of the Day (APOD) of 2024-02-18

    Read more →