AI Face Lift

AI Face Lift — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Supper (Spotify)

    Supper (Spotify)

    Supper is a web-based application on the Spotify digital music streaming platform. The Supper app was born from a group of friends who had backgrounds in the music and gastronomy industries. Digital music solutions company Artisan Council later executed it. The app now sits in the top 40 applications on Spotify. == About == The Supper Spotify application matches recipes for all occasions and skill levels with a playlist for both preparation and presentation, as envisioned by the chefs themselves. Supper is credited with being one of the first apps to pair music with food. Playing on the social nature of music and food culture, users can seamlessly experience both for the first time with real time music streaming. == Supper.mx == In May 2014 Supper was launched outside of the Spotify streaming platform. Though still in partnership with Spotify, supper.mx allows users to view Supper's music + food collaborations on mobile, tablet and desktop, without the need to download Spotify directly. == Curators == All of the recipes and playlists featured on the Supper app come straight from a growing network of tastemakers, including chefs, musicians and institutions around the world. Each month the recipes and playlists are updated in conjunction with current holidays, events and seasons. === Launch === Launching in October 2013 the first edition of Supper featured content from a range of eating institutions and culture makers from the US and Australia. Brooklyn Bowl (Brooklyn) Roberta's Pizza (Brooklyn) Fancy Hanks (Melbourne) The Foresters/Queenies Upstairs (Sydney) Hipstamatic Panama House (Bondi) Sweetwater Inn (Melbourne) Soul Clap (Syd record label) Yellow Birds (Melbourne) === November 2013 === Yardbird (Hong Kong) Sonoma Bakery (Sydney) Do or Dine (Brooklyn) Cameo Gallery (Brooklyn) Hypertrak (Blog) Blue Smoke (NYC) The Crepes of Wrath (Blog) Willin Low // Wild Rocket - Wild Oats - Relish === December 2013 === The Copper Mill (Sydney) Thug Kitchen Mamak (Sydney) Tutu's (Brooklyn) Chin Chin (Melbourne) Flat Iron Steak (London) Greasy Spoon (Copenhagen) === January 2014 === Mexicali Taco & Co. (LA) Church & State (LA) Salts Cure (LA) Nopa (SF) L & E Oyster (LA) 4100 bar (LA) Golden Gopher (LA) The Pie Hole (LA) State Bird Provisions (SF) === Momofuku === In February 2014 Supper teamed up with restaurant heavy weights Momofuku. The recipes featured came from their iconic New York, Toronto and Sydney restaurants. Head office also got involved with an instructional from Brand Director Sue Chan on how to paint Momofuku vibes on to any party. === SXSW === March sees the Supper team migrate to Austin, Texas for SXSW, bringing together the best eateries the city has to offer as well as the music that has influenced them. Restaurants and eateries on board in 2014 included: The Backspace Kelis Swifts Attic Uchi Jackalope Paul Qui/East Side King Thai Kun Wonderland Hole in the Wall Justine's Brasserie The Liberty === Kelis === In April 2014 Kelis presented 5 of her recipes paired with a personal playlist for Supper. Kelis shared her recipes for apple farro, jerk ribs, New York vanilla bean cheesecake and Jerk Ribs. The Kelis/Supper collaboration coincided with the release of Kelis' 2014 album titled 'Food'. === Roberta's Pizza === In May 2014 Bushwick's Roberta's Pizza was guest curator on the Supper app and website. Included in their selections were restaurants and bars from across New York including Bun-ker Vietnamese, Old Stanley's Bar, St. Anselm, Chuko, Frank's Cocktail Lounge, Junior's Cheesecake, Xi'an Famous Foods, Xe Lua, 124 Old Rabbit and Yuji Ramen.

    Read more →
  • Linear genetic programming

    Linear genetic programming

    "Linear genetic programming" is unrelated to "linear programming". Linear genetic programming (LGP) is a particular method of genetic programming wherein computer programs in a population are represented as a sequence of register-based instructions from an imperative programming language or machine language. The adjective "linear" stems from the fact that each LGP program is a sequence of instructions and the sequence of instructions is normally executed sequentially. Like in other programs, the data flow in LGP can be modeled as a graph that will visualize the potential multiple usage of register contents and the existence of structurally noneffective code (introns) which are two main differences of this genetic representation from the more common tree-based genetic programming (TGP) variant. Like other Genetic Programming methods, Linear genetic programming requires the input of data to run the program population on. Then, the output of the program (its behaviour) is judged against some target behaviour, using a fitness function. However, LGP is generally more efficient than tree genetic programming due to its two main differences mentioned above: Intermediate results (stored in registers) can be reused and a simple intron removal algorithm exists that can be executed to remove all non-effective code prior to programs being run on the intended data. These two differences often result in compact solutions and substantial computational savings compared to the highly constrained data flow in trees and the common method of executing all tree nodes in TGP. Furthermore, LGP naturally has multiple outputs by defining multiple output registers and easily cooperates with control flow operations. Linear genetic programming has been applied in many domains, including system modeling and system control with considerable success. Linear genetic programming should not be confused with linear tree programs in tree genetic programming, program composed of a variable number of unary functions and a single terminal. Note that linear tree GP differs from bit string genetic algorithms since a population may contain programs of different lengths and there may be more than two types of functions or more than two types of terminals. == Examples of LGP programs == Because LGP programs are basically represented by a linear sequence of instructions, they are simpler to read and to operate on than their tree-based counterparts. For example, a simple program written to solve a Boolean function problem with 3 inputs (in R1, R2, R3) and one output (in R0), could read like this: R1, R2, R3 have to be declared as input (read-only) registers, while R0 and R4 are declared as calculation (read-write) registers. This program is very simple, having just 5 instructions. But mutation and crossover operators could work to increase the length of the program, as well as the content of each of its instructions. Note that one instruction is non-effective or an intron (marked), since it does not impact the output register R0. Recognition of those instructions is the basis for the intron removal algorithm which is used analyze code prior to execution. Technically, this happens by copying an individual and then run the intron removal once. The copy with removed introns is then executed as many times as dictated by the number of training cases. Notably, the original individual is left intact, so as to continue participating in the evolutionary process. It is only the copy that is executed that is compressed by removing these "structural" introns. Another simple program, this one written in the LGP language Slash/A looks like a series of instructions separated by a slash: By representing such code in bytecode format, i.e. as an array of bytes each representing a different instruction, one can make mutation operations simply by changing an element of such an array.

    Read more →
  • Quantum neural network

    Quantum neural network

    Quantum neural networks are computational neural network models which are based on the principles of quantum mechanics. The first ideas on quantum neural computation were published independently in 1995 by Subhash Kak and Ron Chrisley, engaging with the theory of quantum mind, which posits that quantum effects play a role in cognitive function. However, typical research in quantum neural networks involves combining classical artificial neural network models (which are widely used in machine learning for the important task of pattern recognition) with the advantages of quantum information in order to develop more efficient algorithms. One important motivation for these investigations is the difficulty to train classical neural networks, especially in big data applications. The hope is that features of quantum computing such as quantum parallelism or the effects of interference and entanglement can be used as resources. Since the technological implementation of a quantum computer is still in a premature stage, such quantum neural network models are mostly theoretical proposals that await their full implementation in physical experiments. Most Quantum neural networks are developed as feed-forward networks. Similar to their classical counterparts, this structure intakes input from one layer of qubits, and passes that input onto another layer of qubits. This layer of qubits evaluates this information and passes on the output to the next layer. Eventually the path leads to the final layer of qubits. The layers do not have to be of the same width, meaning they don't have to have the same number of qubits as the layer before or after it. This structure is trained on which path to take similar to classical artificial neural networks. This is discussed in a lower section. Quantum neural networks refer to three different categories: Quantum computer with classical data, classical computer with quantum data, and quantum computer with quantum data. == Examples == Quantum neural network research is still in its infancy, and a conglomeration of proposals and ideas of varying scope and mathematical rigor have been put forward. Most of them are based on the idea of replacing classical binary or McCulloch-Pitts neurons with a qubit (which can be called a "quron"), resulting in neural units that can be in a superposition of the state 'firing' and 'resting'. === Quantum perceptrons === A lot of proposals attempt to find a quantum equivalent for the perceptron unit from which neural nets are constructed. A problem is that nonlinear activation functions do not immediately correspond to the mathematical structure of quantum theory, since a quantum evolution is described by linear operations and leads to probabilistic observation. Ideas to imitate the perceptron activation function with a quantum mechanical formalism reach from special measurements to postulating non-linear quantum operators (a mathematical framework that is disputed). A direct implementation of the activation function using the circuit-based model of quantum computation has recently been proposed by Schuld, Sinayskiy and Petruccione based on the quantum phase estimation algorithm. === Quantum networks === At a larger scale, researchers have attempted to generalize neural networks to the quantum setting. One way of constructing a quantum neuron is to first generalise classical neurons and then generalising them further to make unitary gates. Interactions between neurons can be controlled quantumly, with unitary gates, or classically, via measurement of the network states. This high-level theoretical technique can be applied broadly, by taking different types of networks and different implementations of quantum neurons, such as photonically implemented neurons and quantum reservoir processor (quantum version of reservoir computing). Most learning algorithms follow the classical model of training an artificial neural network to learn the input-output function of a given training set and use classical feedback loops to update parameters of the quantum system until they converge to an optimal configuration. Learning as a parameter optimisation problem has also been approached by adiabatic models of quantum computing. Quantum neural networks can be applied to algorithmic design: given qubits with tunable mutual interactions, one can attempt to learn interactions following the classical backpropagation rule from a training set of desired input-output relations, taken to be the desired output algorithm's behavior. The quantum network thus 'learns' an algorithm. === Quantum associative memory === The first quantum associative memory algorithm was introduced by Dan Ventura and Tony Martinez in 1999. The authors do not attempt to translate the structure of artificial neural network models into quantum theory, but propose an algorithm for a circuit-based quantum computer that simulates associative memory. The memory states (in Hopfield neural networks saved in the weights of the neural connections) are written into a superposition, and a Grover-like quantum search algorithm retrieves the memory state closest to a given input. As such, this is not a fully content-addressable memory, since only incomplete patterns can be retrieved. The first truly content-addressable quantum memory, which can retrieve patterns also from corrupted inputs, was proposed by Carlo A. Trugenberger. Both memories can store an exponential (in terms of n qubits) number of patterns but can be used only once due to the no-cloning theorem and their destruction upon measurement. Trugenberger, however, has shown that his probabilistic model of quantum associative memory can be efficiently implemented and re-used multiples times for any polynomial number of stored patterns, a large advantage with respect to classical associative memories. === Classical neural networks inspired by quantum theory === A substantial amount of interest has been given to a "quantum-inspired" model that uses ideas from quantum theory to implement a neural network based on fuzzy logic. == Training == Quantum Neural Networks can be theoretically trained similarly to training classical/artificial neural networks. A key difference lies in communication between the layers of a neural networks. For classical neural networks, at the end of a given operation, the current perceptron copies its output to the next layer of perceptron(s) in the network. However, in a quantum neural network, where each perceptron is a qubit, this would violate the no-cloning theorem. A proposed generalized solution to this is to replace the classical fan-out method with an arbitrary unitary that spreads out, but does not copy, the output of one qubit to the next layer of qubits. Using this fan-out Unitary ( U f {\displaystyle U_{f}} ) with a dummy state qubit in a known state (Ex. | 0 ⟩ {\displaystyle |0\rangle } in the computational basis), also known as an Ancilla bit, the information from the qubit can be transferred to the next layer of qubits. This process adheres to the quantum operation requirement of reversibility. Using this quantum feed-forward network, deep neural networks can be executed and trained efficiently. A deep neural network is essentially a network with many hidden-layers, as seen in the sample model neural network above. Since the Quantum neural network being discussed uses fan-out Unitary operators, and each operator only acts on its respective input, only two layers are used at any given time. In other words, no Unitary operator is acting on the entire network at any given time, meaning the number of qubits required for a given step depends on the number of inputs in a given layer. Since Quantum Computers are notorious for their ability to run multiple iterations in a short period of time, the efficiency of a quantum neural network is solely dependent on the number of qubits in any given layer, and not on the depth of the network. === Cost functions === To determine the effectiveness of a neural network, a cost function is used, which essentially measures the proximity of the network's output to the expected or desired output. In a Classical Neural Network, the weights ( w {\displaystyle w} ) and biases ( b {\displaystyle b} ) at each step determine the outcome of the cost function C ( w , b ) {\displaystyle C(w,b)} . When training a Classical Neural network, the weights and biases are adjusted after each iteration, and given equation 1 below, where y ( x ) {\displaystyle y(x)} is the desired output and a out ( x ) {\displaystyle a^{\text{out}}(x)} is the actual output, the cost function is optimized when C ( w , b ) {\displaystyle C(w,b)} = 0. For a quantum neural network, the cost function is determined by measuring the fidelity of the outcome state ( ρ out {\displaystyle \rho ^{\text{out}}} ) with the desired outcome state ( ϕ out {\displaystyle \phi ^{\text{out}}} ), seen in Equation 2 below. In this case, the Unitary operators are adjusted after each it

    Read more →
  • ID3 algorithm

    ID3 algorithm

    In decision tree learning, ID3 (Iterative Dichotomiser 3) is a greedy algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm. The 3 in the name is meant to signify that this was Quinlan's third attempt at a model based on entropy-based splitting, and the term dichotimser is a misnomer as it implies a binary split, but the ID3 algorithm can split on multi-valued attributes. == Algorithm == The ID3 algorithm begins with the original set S {\displaystyle S} as the root node. On each iteration of the algorithm, it iterates through every unused attribute of the set S {\displaystyle S} and calculates the entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} or the information gain I G ( S ) {\displaystyle IG(S)} of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S {\displaystyle S} is then split or partitioned by the selected attribute to produce subsets of the data. (For example, a node can be split into child nodes based upon the subsets of the population whose ages are less than 50, between 50 and 100, and greater than 100.) The algorithm continues to recurse on each subset, considering only attributes never selected before. Recursion on a subset may stop in one of these cases: every element in the subset belongs to the same class; in which case the node is turned into a leaf node and labelled with the class of the examples. there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset. there are no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. An example could be the absence of a person among the population with age over 100 years. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set. Throughout the algorithm, the decision tree is constructed with each non-terminal node (internal node) representing the selected attribute on which the data was split, and terminal nodes (leaf nodes) representing the class label of the final subset of this branch. === Summary === Calculate the entropy of every attribute a {\displaystyle a} of the data set S {\displaystyle S} . Partition ("split") the set S {\displaystyle S} into subsets using the attribute for which the resulting entropy after splitting is minimized; or, equivalently, information gain is maximum. Make a decision tree node containing that attribute. Recurse on subsets using the remaining attributes. === Properties === ID3 does not guarantee an optimal solution. It can converge upon local optima. It uses a greedy strategy by selecting the locally best attribute to split the dataset on each iteration. The algorithm's optimality can be improved by using backtracking during the search for the optimal decision tree at the cost of possibly taking longer. ID3 can overfit the training data. To avoid overfitting, smaller decision trees should be preferred over larger ones. This algorithm usually produces small trees, but it does not always produce the smallest possible decision tree. ID3 is harder to use on continuous data than on factored data (factored data has a discrete number of possible values, thus reducing the possible branch points). If the values of any given attribute are continuous, then there are many more places to split the data on this attribute, and searching for the best value to split by can be time-consuming. === Usage === The ID3 algorithm is used by training on a data set S {\displaystyle S} to produce a decision tree which is stored in memory. At runtime, this decision tree is used to classify new test cases (feature vectors) by traversing the decision tree using the features of the datum to arrive at a leaf node. == The ID3 metrics == === Entropy === Entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} is a measure of the amount of uncertainty in the (data) set S {\displaystyle S} (i.e. entropy characterizes the (data) set S {\displaystyle S} ). H ( S ) = ∑ x ∈ X − p ( x ) log 2 ⁡ p ( x ) {\displaystyle \mathrm {H} {(S)}=\sum _{x\in X}{-p(x)\log _{2}p(x)}} Where, S {\displaystyle S} – The current dataset for which entropy is being calculated This changes at each step of the ID3 algorithm, either to a subset of the previous set in the case of splitting on an attribute or to a "sibling" partition of the parent in case the recursion terminated previously. X {\displaystyle X} – The set of classes in S {\displaystyle S} p ( x ) {\displaystyle p(x)} – The proportion of the number of elements in class x {\displaystyle x} to the number of elements in set S {\displaystyle S} When H ( S ) = 0 {\displaystyle \mathrm {H} {(S)}=0} , the set S {\displaystyle S} is perfectly classified (i.e. all elements in S {\displaystyle S} are of the same class). In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest entropy is used to split the set S {\displaystyle S} on this iteration. Entropy in information theory measures how much information is expected to be gained upon measuring a random variable; as such, it can also be used to quantify the amount to which the distribution of the quantity's values is unknown. A constant quantity has zero entropy, as its distribution is perfectly known. In contrast, a uniformly distributed random variable (discretely or continuously uniform) maximizes entropy. Therefore, the greater the entropy at a node, the less information is known about the classification of data at this stage of the tree; and therefore, the greater the potential to improve the classification here. As such, ID3 is a greedy heuristic performing a best-first search for locally optimal entropy values. Its accuracy can be improved by preprocessing the data. === Information gain === Information gain I G ( A ) {\displaystyle IG(A)} is the measure of the difference in entropy from before to after the set S {\displaystyle S} is split on an attribute A {\displaystyle A} . In other words, how much uncertainty in S {\displaystyle S} was reduced after splitting set S {\displaystyle S} on attribute A {\displaystyle A} . I G ( S , A ) = H ( S ) − ∑ t ∈ T p ( t ) H ( t ) = H ( S ) − H ( S | A ) . {\displaystyle IG(S,A)=\mathrm {H} {(S)}-\sum _{t\in T}p(t)\mathrm {H} {(t)}=\mathrm {H} {(S)}-\mathrm {H} {(S|A)}.} Where, H ( S ) {\displaystyle \mathrm {H} (S)} – Entropy of set S {\displaystyle S} T {\displaystyle T} – The subsets created from splitting set S {\displaystyle S} by attribute A {\displaystyle A} such that S = ⋃ t ∈ T t {\displaystyle S=\bigcup _{t\in T}t} p ( t ) {\displaystyle p(t)} – The proportion of the number of elements in t {\displaystyle t} to the number of elements in set S {\displaystyle S} H ( t ) {\displaystyle \mathrm {H} (t)} – Entropy of subset t {\displaystyle t} In ID3, information gain can be calculated (instead of entropy) for each remaining attribute. The attribute with the largest information gain is used to split the set S {\displaystyle S} on this iteration.

    Read more →
  • Artificial intelligence content detection

    Artificial intelligence content detection

    Artificial intelligence detection software aims to determine whether some content (text, image, video, or audio) was generated using artificial intelligence (AI). This software is often unreliable. == Accuracy issues == Many AI detection tools have been shown to be unreliable in detecting AI-generated text. In a 2023 study conducted by Weber-Wulff et al., researchers evaluated 14 detection tools including Turnitin and GPTZero and found that "all scored below 80% of accuracy and only 5 over 70%." They also found that these tools tend to have a bias for classifying texts more as human than as AI, and that accuracy of these tools worsens upon paraphrasing. === False positives === In AI content detection, a false positive is when human-written work is incorrectly flagged as AI-written. Many AI detection platforms claim to have a minimal level of false positives, with Turnitin claiming a less than 1% false positive rate. However, later research by The Washington Post produced much higher rates of 50%, though they used a smaller sample size. False positives in an academic setting frequently lead to accusations of academic misconduct, which can have serious consequences for a student's academic record. Additionally, studies have shown evidence that many AI detection models are prone to give false positives to work written by people whose first language is not English, and also to neurodivergent people. In June 2023, Janelle Shane wrote that portions of her book You Look Like a Thing and I Love You were flagged as AI-generated. === False negatives === A false negative is a failure to identify documents with AI-written text. False negatives often happen as a result of a detection software's sensitivity level or because evasive techniques were used when generating the work to make it sound more human. False negatives are less of a concern academically, since they aren't likely to lead to accusations and ramifications. Notably, Turnitin stated they have a 15% false negative rate. == Text detection == For text, this is usually done to prevent alleged plagiarism, often by detecting repetition of words as telltale signs that a text was AI-generated (including hallucinations). Detection systems may also rely on stylistic and structural regularities associated with LLM output, such as unusually consistent grammar, formulaic transitions, repeated discourse markers, and recurring rhetorical templates. Some tools are designed less to establish authorship provenance than to flag prose that resembles common LLM-generated style patterns. They are often used by teachers marking their students, usually on an ad hoc basis. Following the release of ChatGPT and similar AI text generative software, many educational establishments have issued policies against the use of AI by students. AI text detection software is also used by those assessing job applicants, as well as online search engines, hiring, online moderation and publishing. Current detectors may sometimes be unreliable and have incorrectly marked work by humans as originating from AI while failing to detect AI-generated work in other instances. MIT Technology Review said that the technology "struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans and obfuscated by a paraphrasing tool". AI text detection software has also been shown to discriminate against non-native speakers of English. Two students from the University of California, Davis, were referred to the university's Office of Student Success and Judicial Affairs (OSSJA) after their professors scanned their essays with positive results; the first with an AI detector called GPTZero, and the second with an AI detector integration in Turnitin. However, following media coverage, and a thorough investigation, the students were cleared of any wrongdoing. In April 2023, Cambridge University and other members of the Russell Group of universities in the United Kingdom opted out of Turnitin's AI text detection tool, after expressing concerns it was unreliable. The University of Texas at Austin opted out of the system six months later. In May 2023, a professor at Texas A&M University–Commerce used ChatGPT to detect whether his students' content was written by it, which ChatGPT said was the case. As such, he threatened to fail the class despite ChatGPT not being able to detect AI-generated writing. No students were prevented from graduating because of the issue, and all but one student (who admitted to using the software) were exonerated from accusations of having used ChatGPT in their content. In July 2023, a paper titled "GPT detectors are biased against non-native English writers" was released, reporting that GPTs discriminate against non-native English authors. The paper compared seven GPT detectors against essays from both non-native English speakers and essays from United States students. The essays from non-native English speakers had an average false positive rate of 61.3%. An article by Thomas Germain, published on Gizmodo in June 2024, reported job losses among freelance writers and journalists due to AI text detection software mistakenly classifying their work as AI-generated. In September 2024, Common Sense Media reported that generative AI detectors had a 20% false positive rate for Black students, compared to 10% of Latino students and 7% of White students. To improve the reliability of AI text detection, researchers have explored digital watermarking techniques. A 2023 paper titled "A Watermark for Large Language Models" presents a method to embed imperceptible watermarks into text generated by large language models (LLMs). This watermarking approach allows content to be flagged as AI-generated with a high level of accuracy, even when text is slightly paraphrased or modified. The technique is designed to be subtle and hard to detect for casual readers, thereby preserving readability, while providing a detectable signal for those employing specialized tools. However, while promising, watermarking faces challenges in remaining robust under adversarial transformations and ensuring compatibility across different LLMs. == Anti text detection == There is software available designed to bypass AI text detection. In practice, evasion may not require specialized bypass tools. Paraphrasing, style editing, and removal of repeated discourse markers can substantially reduce the effectiveness of detectors that rely on recognizable surface patterns. A study published in August 2023 analyzed 20 abstracts from papers published in the Eye Journal, which were then paraphrased using GPT-4.0. The AI-paraphrased abstracts were examined for plagiarism using QueText and for AI-generated content using Originality.AI. The texts were then re-processed through an adversarial software called Undetectable.ai in order to reduce the AI-detection scores. The study found that the AI detection tool, Originality.AI, identified text generated by GPT-4 with a mean accuracy of 91.3%. However, after reprocessing by Undetectable.ai, the detection accuracy of Originality.ai dropped to a mean accuracy of 27.8%. Some experts also believe that techniques like digital watermarking are ineffective because they can be removed or added to trigger false positives. "A Watermark for Large Language Models" paper by Kirchenbauer et al. (2023) also addresses potential vulnerabilities of watermarking techniques. The authors outline a range of adversarial tactics, including text insertion, deletion, and substitution attacks, that could be used to bypass watermark detection. These attacks vary in complexity, from simple paraphrasing to more sophisticated approaches involving tokenization and homoglyph alterations. The study highlights the challenge of maintaining watermark robustness against attackers who may employ automated paraphrasing tools or even specific language model replacements to alter text spans iteratively while retaining semantic similarity. Experimental results show that although such attacks can degrade watermark strength, they also come at the cost of text quality and increased computational resources. == Image, video, and audio detection == Several purported AI image detection software exist, to detect AI-generated images (for example, those originating from Midjourney or DALL-E). They are not completely reliable. Industry analyses have also noted that AI-driven image recognition systems often struggle in real-world environments, where inconsistent lighting, noise and variable visual inputs reduce detection reliability, a challenge highlighted in modern agricultural quality-control research. Others claim to identify video and audio deepfakes, but this technology is also not fully reliable yet either. Despite debate around the efficacy of watermarking, Google DeepMind is actively developing a detection software called SynthID, which works by inserting a digital watermark that is invisible to the human eye into the pixels of an image.

    Read more →
  • Harrison White

    Harrison White

    Harrison Colyar White (March 21, 1930 – May 18, 2024) was an American sociologist who was the Giddings Professor of Sociology at Columbia University. White played an influential role in the “Harvard Revolution” in social networks and the New York School of relational sociology. He is credited with the development of a number of mathematical models of social structure including vacancy chains and blockmodels. He has been a leader of a revolution in sociology that is still in process, using models of social structure that are based on patterns of relations instead of the attributes and attitudes of individuals. Among social network researchers, White is widely respected. For instance, at the 1997 International Network of Social Network Analysis conference, the organizer held a special “White Tie” event, dedicated to White. Social network researcher Emmanuel Lazega refers to him as both “Copernicus and Galileo” because he invented both the vision and the tools. The most comprehensive documentation of his theories can be found in the book Identity and Control, first published in 1992. A major rewrite of the book appeared in June 2008. In 2011, White received the W.E.B. DuBois Career of Distinguished Scholarship Award from the American Sociological Association, which honors "scholars who have shown outstanding commitment to the profession of sociology and whose cumulative work has contributed in important ways to the advancement of the discipline." Before his retirement to live in Tucson, Arizona, White was interested in sociolinguistics and business strategy as well as sociology. == Life and career == === Early years === White was born on March 21, 1930, in Washington, D.C. He had three siblings and his father was a doctor in the US Navy. Although moving around to different Naval bases throughout his adolescence, he considered himself Southern, and Nashville, TN to be his home. At the age of 15, he entered the Massachusetts Institute of Technology (MIT), receiving his undergraduate degree at 20 years of age; five years later, in 1955, he received a doctorate in theoretical physics, also from MIT with John C. Slater as his advisor. His dissertation was titled A quantum-mechanical calculation of inter-atomic force constants in copper. This was published in the Physical Review as "Atomic Force Constants of Copper from Feynman's Theorem" (1958). While at MIT he also took a course with the political scientist Karl Deutsch, who White credits with encouraging him to move toward the social sciences. === Princeton University === After receiving his PhD in theoretical physics, he received a Fellowship from the Ford Foundation to begin his second doctorate in sociology at Princeton University. His dissertation advisor was Marion J. Levy. White also worked with Wilbert Moore, Fred Stephan, and Frank W. Notestein while at Princeton. His cohort was very small, with only four or five other graduate students including David Matza, and Stanley Udy. At the same time, he took up a position as an operations analyst at the Operations Research Office, Johns Hopkins University from 1955 to 1956. During this period, he worked with Lee S. Christie on Queuing with Preemptive Priorities or with Breakdown, which was published in 1958. Christie previously worked alongside mathematical psychologist R. Duncan Luce in the Small Group Laboratory at MIT while White was completing his first PhD in physics also at MIT. While continuing his studies at Princeton, White also spent a year as a fellow at the Center for Advanced Study in the Behavioral Sciences, Stanford University, California where he met Harold Guetzkow. Guetzkow was a faculty member at the Carnegie Institute of Technology, known for his application of simulations to social behavior and long-time collaborator with many other pioneers in organization studies, including Herbert A. Simon, James March, and Richard Cyert. Upon meeting Simon through his mutual acquaintance with Guetzkow, White received an invitation to move from California to Pittsburgh to work as an assistant professor of Industrial Administration and Sociology at the Graduate School of Industrial Administration, Carnegie Institute of Technology (later Carnegie-Mellon University), where he stayed for a couple of years, between 1957 and 1959. In an interview, he claimed to have fought with the dean, Leyland Bock, to have the word "sociology" included in his title. It was also during his time at the Stanford Center for Advanced Study that White met his first wife, Cynthia A. Johnson, who was a graduate of Radcliffe College, where she had majored in art history. The couple's joint work on the French Impressionists, Canvases and Careers (1965) and “Institutional Changes in the French Painting World” (1964), originally grew out of a seminar on art in 1957 at the Center for Advanced Study led by Robert Wilson. White originally hoped to use sociometry to map the social structure of French art to predict shifts, but he had an epiphany that it was not social structure but institutional structure which explained the shift. It was also during these years that White, still a graduate student in sociology, wrote and published his first social scientific work, "Sleep: A Sociological Interpretation" in Acta Sociologica in 1960, together with Vilhelm Aubert, a Norwegian sociologist. This work was a phenomenological examination of sleep which attempted to "demonstrate that sleep was more than a straightforward biological activity... [but rather also] a social event". For his dissertation, White carried out empirical research on a research and development department in a manufacturing firm, consisting of interviews and a 110-item questionnaire with managers. He specifically used sociometric questions, which he used to model the "social structure" of relationships between various departments and teams in the organization. In May 1960 he submitted as his doctoral dissertation, titled Research and Development as a Pattern in Industrial Management: A Case Study in Institutionalisation and Uncertainty, earning a PhD in sociology from Princeton University. His first publication based on his dissertation was ''Management conflict and sociometric structure'' in the American Journal of Sociology. === University of Chicago === In 1959 James Coleman left the University of Chicago to found a new department of social relations at Johns Hopkins University, this left a vacancy open for a mathematical sociologist like White. He moved to Chicago to start working as an associate professor at the Department of Sociology. At that time, highly influential sociologists, such as Peter Blau, Mayer Zald, Elihu Katz, Everett Hughes, Erving Goffman were there. As Princeton only required one year in residence, and White took the opportunity to take positions at Johns Hopkins, Stanford, and Carnegie while still working on his dissertation, it was at Chicago that White credits as being his "real socialization in a way, into sociology." It was here that White advised his first two graduate students Joel H. Levine and Morris Friedell, both who went on to make contributions to social network analysis in sociology. While at the Center for Advanced Study, White began learning anthropology and became fascinated with kinship. During his stay at the University of Chicago White was able to finish An Anatomy of Kinship, published in 1963 within the Prentice-Hall series in Mathematical Analysis of Social Behavior, with James Coleman and James March as chief editors. The book received significant attention from many mathematical sociologists of the time, and contributed greatly to establish White as a model builder. === The Harvard Revolution === In 1963, White left Chicago to be an associate professor of sociology at the Harvard Department of Social Relations—the same department founded by Talcott Parsons and still heavily influenced by the structural-functionalist paradigm of Parsons. As White previously only taught graduate courses at Carnegie and Chicago, his first undergraduate course was An Introduction to Social Relations (see Influence) at Harvard, which became infamous among network analysts. As he "thought existing textbooks were grotesquely unscientific," the syllabus of the class was noted for including few readings by sociologists, and comparatively more readings by anthropologists, social psychologists, and historians. White was also a vocal critic of what he called the "attributes and attitudes" approach of Parsonsian sociology, and came to be the leader of what has been variously known as the “Harvard Revolution," the "Harvard breakthrough," or the "Harvard renaissance" in social networks. He worked closely with small group researchers George C. Homans and Robert F. Bales, which was largely compatible with his prior work in organizational research and his efforts to formalize network analysis. Overlapping White's early years, Charles Tilly, a graduate of the Harvard Department of Social

    Read more →
  • Radial basis function

    Radial basis function

    In mathematics a radial basis function (RBF) is a real-valued function φ {\textstyle \varphi } whose value depends only on the distance between the input and some fixed point, either the origin, so that φ ( x ) = φ ^ ( ‖ x ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} \right\|)} , or some other fixed point c {\textstyle \mathbf {c} } , called a center, so that φ ( x ) = φ ^ ( ‖ x − c ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} -\mathbf {c} \right\|)} . Any function φ {\textstyle \varphi } that satisfies the property φ ( x ) = φ ^ ( ‖ x ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} \right\|)} is a radial function. The distance is usually Euclidean distance, although other metrics are sometimes used. They are often used as a collection { φ k } k {\displaystyle \{\varphi _{k}\}_{k}} which forms a basis for some function space of interest, hence the name. Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally applied to machine learning, in work by David Broomhead and David Lowe in 1988, which stemmed from Michael J. D. Powell's seminal research from 1977. RBFs are also used as a kernel in support vector classification. The technique has proven effective and flexible enough that radial basis functions are now applied in a variety of engineering applications. == Definition == A radial function is a function φ : [ 0 , ∞ ) → R {\textstyle \varphi :[0,\infty )\to \mathbb {R} } . When paired with a norm ‖ ⋅ ‖ : V → [ 0 , ∞ ) {\textstyle \|\cdot \|:V\to [0,\infty )} on a vector space, a function of the form φ c = φ ( ‖ x − c ‖ ) {\textstyle \varphi _{\mathbf {c} }=\varphi (\|\mathbf {x} -\mathbf {c} \|)} is said to be a radial kernel centered at c ∈ V {\textstyle \mathbf {c} \in V} . A radial function and the associated radial kernels are said to be radial basis functions if, for any finite set of nodes { x k } k = 1 n ⊆ V {\displaystyle \{\mathbf {x} _{k}\}_{k=1}^{n}\subseteq V} , all of the following conditions are true: === Examples === Commonly used types of radial basis functions include (writing r = ‖ x − x i ‖ {\textstyle r=\left\|\mathbf {x} -\mathbf {x} _{i}\right\|} and using ε {\textstyle \varepsilon } to indicate a shape parameter that can be used to scale the input of the radial kernel): == Approximation == Radial basis functions are typically used to build up function approximations of the form where the approximating function y ( x ) {\textstyle y(\mathbf {x} )} is represented as a sum of N {\displaystyle N} radial basis functions, each associated with a different center x i {\textstyle \mathbf {x} _{i}} , and weighted by an appropriate coefficient w i . {\textstyle w_{i}.} The weights w i {\textstyle w_{i}} can be estimated using the matrix methods of linear least squares, because the approximating function is linear in the weights w i {\textstyle w_{i}} . Approximation schemes of this kind have been particularly used in time series prediction and control of nonlinear systems exhibiting sufficiently simple chaotic behaviour and 3D reconstruction in computer graphics (for example, hierarchical RBF and Pose Space Deformation). == RBF Network == The sum can also be interpreted as a rather simple single-layer type of artificial neural network called a radial basis function network, with the radial basis functions taking on the role of the activation functions of the network. It can be shown that any continuous function on a compact interval can in principle be interpolated with arbitrary accuracy by a sum of this form, if a sufficiently large number N {\textstyle N} of radial basis functions is used. The approximant y ( x ) {\textstyle y(\mathbf {x} )} is differentiable with respect to the weights w i {\textstyle w_{i}} . The weights could thus be learned using any of the standard iterative methods for neural networks. Using radial basis functions in this manner yields a reasonable interpolation approach provided that the fitting set has been chosen such that it covers the entire range systematically (equidistant data points are ideal). However, without a polynomial term that is orthogonal to the radial basis functions, estimates outside the fitting set tend to perform poorly. == RBFs for PDEs == Radial basis functions are used to approximate functions and so can be used to discretize and numerically solve Partial Differential Equations (PDEs). This was first done in 1990 by E. J. Kansa who developed the first RBF based numerical method. It is called the Kansa method and was used to solve the elliptic Poisson equation and the linear advection-diffusion equation. The function values at points x {\displaystyle \mathbf {x} } in the domain are approximated by the linear combination of RBFs: The derivatives are approximated as such: where N {\displaystyle N} are the number of points in the discretized domain, d {\displaystyle d} the dimension of the domain and λ {\displaystyle \lambda } the scalar coefficients that are unchanged by the differential operator. Different numerical methods based on Radial Basis Functions were developed thereafter. Some methods are the RBF-FD method, the RBF-QR method and the RBF-PUM method.

    Read more →
  • Error-driven learning

    Error-driven learning

    In reinforcement learning, error-driven learning is a method for adjusting a model's (intelligent agent's) parameters based on the difference between its output results and the ground truth. These models stand out as they depend on environmental feedback, rather than explicit labels or categories. They are based on the idea that language acquisition involves the minimization of the prediction error (MPSE). By leveraging these prediction errors, the models consistently refine expectations and decrease computational complexity. Typically, these algorithms are operated by the GeneRec algorithm. Error-driven learning has widespread applications in cognitive sciences and computer vision. These methods have also found successful application in natural language processing (NLP), including areas like part-of-speech tagging, parsing, named entity recognition (NER), machine translation (MT), speech recognition (SR), and dialogue systems. == Formal Definition == Error-driven learning models are ones that rely on the feedback of prediction errors to adjust the expectations or parameters of a model. The key components of error-driven learning include the following: A set S {\displaystyle S} of states representing the different situations that the learner can encounter. A set A {\displaystyle A} of actions that the learner can take in each state. A prediction function P ( s , a ) {\displaystyle P(s,a)} that gives the learner's current prediction of the outcome of taking action a {\displaystyle a} in state s {\displaystyle s} . An error function E ( o , p ) {\displaystyle E(o,p)} that compares the actual outcome o {\displaystyle o} with the prediction p {\displaystyle p} and produces an error value. An update rule U ( p , e ) {\displaystyle U(p,e)} that adjusts the prediction p {\displaystyle p} in light of the error e {\displaystyle e} . == Algorithms == Error-driven learning algorithms refer to a category of reinforcement learning algorithms that leverage the disparity between the real output and the expected output of a system to regulate the system's parameters. Typically applied in supervised learning, these algorithms are provided with a collection of input-output pairs to facilitate the process of generalization. The widely utilized error backpropagation learning algorithm is known as GeneRec, a generalized recirculation algorithm primarily employed for gene prediction in DNA sequences. Many other error-driven learning algorithms are derived from alternative versions of GeneRec. == Applications == === Cognitive science === Simpler error-driven learning models effectively capture complex human cognitive phenomena and anticipate elusive behaviors. They provide a flexible mechanism for modeling the brain's learning process, encompassing perception, attention, memory, and decision-making. By using errors as guiding signals, these algorithms adeptly adapt to changing environmental demands and objectives, capturing statistical regularities and structure. Furthermore, cognitive science has led to the creation of new error-driven learning algorithms that are both biologically acceptable and computationally efficient. These algorithms, including deep belief networks, spiking neural networks, and reservoir computing, follow the principles and constraints of the brain and nervous system. Their primary aim is to capture the emergent properties and dynamics of neural circuits and systems. === Computer vision === Computer vision is a complex task that involves understanding and interpreting visual data, such as images or videos. In the context of error-driven learning, the computer vision model learns from the mistakes it makes during the interpretation process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This repeated process of learning from errors helps improve the model's performance over time. For NLP to do well at computer vision, it employs deep learning techniques. This form of computer vision is sometimes called neural computer vision (NCV), since it makes use of neural networks. NCV therefore interprets visual data based on a statistical, trial and error approach and can deal with context and other subtleties of visual data. === Natural Language Processing === ==== Part-of-speech tagging ==== Part-of-speech (POS) tagging is a crucial component in Natural Language Processing (NLP). It helps resolve human language ambiguity at different analysis levels. In addition, its output (tagged data) can be used in various applications of NLP such as information extraction, information retrieval, question Answering, speech eecognition, text-to-speech conversion, partial parsing, and grammar correction. ==== Parsing ==== Parsing in NLP involves breaking down a text into smaller pieces (phrases) based on grammar rules. If a sentence cannot be parsed, it may contain grammatical errors. In the context of error-driven learning, the parser learns from the mistakes it makes during the parsing process. When an error is encountered, the parser updates its internal model to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the parser's performance over time. In conclusion, error-driven learning plays a crucial role in improving the accuracy and efficiency of NLP parsers by allowing them to learn from their mistakes and adapt their internal models accordingly. ==== Named entity recognition (NER) ==== NER is the task of identifying and classifying entities (such as persons, locations, organizations, etc.) in a text. Error-driven learning can help the model learn from its false positives and false negatives and improve its recall and precision on (NER). In the context of error-driven learning, the significance of NER is quite profound. Traditional sequence labeling methods identify nested entities layer by layer. If an error occurs in the recognition of an inner entity, it can lead to incorrect identification of the outer entity, leading to a problem known as error propagation of nested entities. This is where the role of NER becomes crucial in error-driven learning. By accurately recognizing and classifying entities, it can help minimize these errors and improve the overall accuracy of the learning process. Furthermore, deep learning-based NER methods have shown to be more accurate as they are capable of assembling words, enabling them to understand the semantic and syntactic relationship between various words better. ==== Machine translation ==== Machine translation is a complex task that involves converting text from one language to another. In the context of error-driven learning, the machine translation model learns from the mistakes it makes during the translation process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. ==== Speech recognition ==== Speech recognition is a complex task that involves converting spoken language into written text. In the context of error-driven learning, the speech recognition model learns from the mistakes it makes during the recognition process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. ==== Dialogue systems ==== Dialogue systems are a popular NLP task as they have promising real-life applications. They are also complicated tasks since many NLP tasks deserving study are involved. In the context of error-driven learning, the dialogue system learns from the mistakes it makes during the dialogue process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. == Advantages == Error-driven learning has several advantages over other types of machine learning algorithms: They can learn from feedback and correct their mistakes, which makes them adaptive and robust to noise and changes in the data. They can handle large and high-dimensional data sets, as they do not require explicit feature engineering or prior knowledge of the data distribution. They can achieve high accuracy and performance, as they can learn complex and nonlinear relationships between the input and the output. == Limitations == Although error driven learning has its advantages, their algorithms also have the following limitations: They can suffer from overfitting, which means that they memorize the training data and fail to generalize to new and unseen data. This can be mitigated by using regularization techniques, such as adding a penalty term to the loss function, or reducing the complexity of the model. They can be sensitive to the choice of

    Read more →
  • Scene text

    Scene text

    Scene text is text that appears in an image captured by a camera in an outdoor environment. The detection and recognition of scene text from camera captured images are computer vision tasks which became important after smart phones with good cameras became ubiquitous. The text in scene images varies in shape, font, colour and position. The recognition of scene text is further complicated sometimes by non-uniform illumination and focus. To improve scene text recognition, the International Conference on Document Analysis and Recognition (ICDAR) conducts a robust reading competition once in two years. The competition was held in 2003, 2005 and during every ICDAR conference. International association for pattern recognition (IAPR) has created a list of datasets as Reading systems. == Text detection == Text detection is the process of detecting the text present in the image, followed by surrounding it with a rectangular bounding box. Text detection can be carried out using image based techniques or frequency based techniques. In image based techniques, an image is segmented into multiple segments. Each segment is a connected component of pixels with similar characteristics. The statistical features of connected components are utilised to group them and form the text. Machine learning approaches such as support vector machine and convolutional neural networks are used to classify the components into text and non-text. In frequency based techniques, discrete Fourier transform (DFT) or discrete wavelet transform (DWT) are used to extract the high frequency coefficients. It is assumed that the text present in an image has high frequency components and selecting only the high frequency coefficients filters the text from the non-text regions in an image. == Word recognition == In word recognition, the text is assumed to be already detected and located and the rectangular bounding box containing the text is available. The word present in the bounding box needs to be recognized. The methods available to perform word recognition can be broadly classified into top-down and bottom-up approaches. In the top-down approaches, a set of words from a dictionary is used to identify which word suits the given image. Images are not segmented in most of these methods. Hence, the top-down approach is sometimes referred as segmentation free recognition. In the bottom-up approaches, the image is segmented into multiple components and the segmented image is passed through a recognition engine. Either an off the shelf Optical character recognition (OCR) engine or a custom-trained one is used to recognise the text.

    Read more →
  • Pruning (artificial neural network)

    Pruning (artificial neural network)

    In deep learning, pruning is the practice of removing parameters from an existing artificial neural network. The goal of this process is to reduce the size (parameter count) of the neural network (and therefore the computational resources required to run it) whilst maintaining accuracy. This can be compared to the biological process of synaptic pruning which takes place in mammalian brains during development. == Node (neuron) pruning == A basic algorithm for pruning is as follows: Evaluate the importance of each neuron. Rank the neurons according to their importance (assuming there is a clearly defined measure for "importance"). Remove the least important neuron. Check a termination condition (to be determined by the user) to see whether to continue pruning. == Edge (weight) pruning == Most work on neural network pruning does not remove full neurons or layers (structured pruning). Instead, it focuses on removing the most insignificant weights (unstructured pruning), namely, setting their values to zero. This can either be done globally by comparing weights from all layers in the network or locally by comparing weights in each layer separately. Different metrics can be used to measure the importance of each weight. Weight magnitude as well as combinations of weight and gradient information are commonly used metrics. Early work suggested also to change the values of non-pruned weights. == When to prune the neural network? == Pruning can be applied at three different stages: before training, during training, or after training. When pruning is performed during or after training, additional fine-tuning epochs are typically required. Each approach involves different trade-offs between accuracy and computational cost.

    Read more →
  • Targeted maximum likelihood estimation

    Targeted maximum likelihood estimation

    Targeted Maximum Likelihood Estimation (TMLE) (also more accurately referred to as Targeted Minimum Loss-Based Estimation) is a general statistical estimation framework for causal inference and semiparametric models. TMLE combines ideas from maximum likelihood estimation, semiparametric efficiency theory, and machine learning. It was introduced by Mark J. van der Laan and colleagues in the mid-2000s as a method that yields asymptotically efficient plug-in estimators while allowing the use of flexible, data-adaptive algorithms such as ensemble machine learning for nuisance parameter estimation. TMLE is used in epidemiology, biostatistics, and the social sciences to estimate causal effects in observational and experimental studies. Applications of TMLE include Longitudinal TMLE (LTMLE) for time-varying treatments and confounders. Variations in how the targeting step in TMLE is carried out have resulted in various versions of TMLE such as Collaborative TMLE (CTMLE) and Adaptive TMLE for improved finite-sample performance and automated variable selection. == History == The TMLE framework was first described by van der Laan and Rubin (2006) as a general approach for the construction of efficient plug-in estimators of smooth features of the data density. It was demonstrated in the context of causal inference and missing data problems. It was developed to address limitations of traditional doubly robust methods, such as Augmented Inverse Probability Weighting (AIPW), by respecting the plug-in principle in the sense that it respects that the target parameter is a function of the data density that is an element of the statistical model. TMLE estimates the data density or relevant parts of it with machine learning and targets these machine learning fits before it is plugged in the target parameter mapping. In this manner, a TMLE always respects global knowledge and satisfies known bounds such as that the target parameter is a probability . Since its introduction, TMLE has been developed in a series of theoretical and applied papers, culminating in book-length treatments of the method and its applications to survival analysis, adaptive designs, and longitudinal data. == Methodology == At its core, TMLE is a two-step estimation procedure: Initial estimation: Machine learning methods (such as the Super Learner ensemble) are used to obtain flexible estimates of nuisance parameters, such as outcome regressions and propensity scores. Targeting step: The initial estimate is updated by solving a score equation (the efficient influence function) so that the final estimator is consistent, asymptotically normal, and efficient under mild regularity conditions. The targeted machine learning fit is then mapped into the corresponding estimator of the target parameter by simply plugging it in the target parameter mapping. This approach balances the bias–variance trade-off by combining data-adaptive estimation with semiparametric efficiency theory. TMLE is doubly robust, meaning it remains consistent if either the outcome model or the treatment model is consistently estimated. === Formula === Here we explain the TMLE of the average treatment effect of a binary treatment on an outcome adjusting for baseline covariates. Consider i.i.d. observations O i = ( W i , A i , Y i ) {\displaystyle O_{i}=(W_{i},A_{i},Y_{i})} from a distribution P 0 {\displaystyle P_{0}} , where W {\displaystyle W} are baseline covariates, A {\displaystyle A} is a binary treatment, and Y {\displaystyle Y} is an outcome. Let Q ¯ ( a , w ) = E [ Y ∣ A = a , W = w ] {\displaystyle {\bar {Q}}(a,w)=\mathbb {E} [Y\mid A=a,W=w]} represent the outcome model and g ( a ∣ w ) = P ( A = a ∣ W = w ) {\displaystyle g(a\mid w)=P(A=a\mid W=w)} represent the propensity score. The average treatment effect (ATE) is given by ψ 0 = E { Q ¯ ( 1 , W ) − Q ¯ ( 0 , W ) } . {\displaystyle \psi _{0}=\mathbb {E} \{{\bar {Q}}(1,W)-{\bar {Q}}(0,W)\}.} A basic TMLE for the ATE proceeds as follows: Step 1: Estimate initial models. Obtain estimates Q ¯ ^ ( a , w ) {\displaystyle {\hat {\bar {Q}}}(a,w)} and g ^ ( a ∣ w ) {\displaystyle {\hat {g}}(a\mid w)} , often using flexible methods such as Super Learner. Step 2: Compute the clever covariate. Define: H ( A , W ) = A g ^ ( 1 ∣ W ) − 1 − A g ^ ( 0 ∣ W ) . {\displaystyle H(A,W)={\frac {A}{{\hat {g}}(1\mid W)}}-{\frac {1-A}{{\hat {g}}(0\mid W)}}.} Step 3: Estimate the fluctuation parameter. Fit a logistic regression of Y {\displaystyle Y} on H ( A , W ) {\displaystyle H(A,W)} with logit ⁡ ( Q ¯ ^ ( A , W ) ) {\displaystyle \operatorname {logit} ({\hat {\bar {Q}}}(A,W))} as offset. This yields ε ^ {\displaystyle {\hat {\varepsilon }}} , the MLE that solves the score equation: 1 n ∑ i = 1 n H ( A i , W i ) { Y i − Q ¯ ^ ε ( A i , W i ) } = 0. {\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}H(A_{i},W_{i}){\big \{}Y_{i}-{\hat {\bar {Q}}}^{\varepsilon }(A_{i},W_{i}){\big \}}=0.} Step 4: Update the initial estimate. Apply the "blip" to obtain the targeted estimate: Q ¯ ^ ∗ ( A , W ) = expit ⁡ ( logit ⁡ ( Q ¯ ^ ( A , W ) ) + ε ^ H ( A , W ) ) . {\displaystyle {\hat {\bar {Q}}}^{}(A,W)=\operatorname {expit} {\Big (}\operatorname {logit} {\big (}{\hat {\bar {Q}}}(A,W){\big )}+{\hat {\varepsilon }}\,H(A,W){\Big )}.} Step 5: Compute the TMLE. The ATE estimate is: ψ ^ TMLE = 1 n ∑ i = 1 n [ Q ¯ ^ ∗ ( 1 , W i ) − Q ¯ ^ ∗ ( 0 , W i ) ] . {\displaystyle {\hat {\psi }}_{\text{TMLE}}={\frac {1}{n}}\sum _{i=1}^{n}{\big [}{\hat {\bar {Q}}}^{}(1,W_{i})-{\hat {\bar {Q}}}^{}(0,W_{i}){\big ]}.} Inference. The efficient influence function (EIF) for the ATE is: D ∗ ( O ) = H ( A , W ) { Y − Q ¯ ∗ ( A , W ) } + Q ¯ ∗ ( 1 , W ) − Q ¯ ∗ ( 0 , W ) − ψ . {\displaystyle D^{}(O)=H(A,W)\{Y-{\bar {Q}}^{}(A,W)\}+{\bar {Q}}^{}(1,W)-{\bar {Q}}^{}(0,W)-\psi .} The variance is estimated by σ ^ 2 = n − 1 ∑ i = 1 n ( D ∗ ( O i ) ) 2 {\displaystyle {\hat {\sigma }}^{2}=n^{-1}\sum _{i=1}^{n}{\big (}D^{}(O_{i}){\big )}^{2}} , yielding Wald-type confidence intervals ψ ^ TMLE ± z 1 − α / 2 σ ^ / n {\displaystyle {\hat {\psi }}_{\text{TMLE}}\pm z_{1-\alpha /2}\,{\hat {\sigma }}/{\sqrt {n}}} . Remark. For continuous outcomes, a linear fluctuation Q ¯ ^ ∗ = Q ¯ ^ + ε ^ H {\displaystyle {\hat {\bar {Q}}}^{}={\hat {\bar {Q}}}+{\hat {\varepsilon }}\,H} may be used instead. For bounded continuous outcomes, the logistic fluctuation (after rescaling Y {\displaystyle Y} to [ 0 , 1 ] {\displaystyle [0,1]} ) is often preferred for improved finite-sample performance. == Applications == TMLE has been applied in: Epidemiology: Estimating causal effects of exposures and interventions in observational cohort studies. Clinical trials and real-world evidence: The Targeted Learning roadmap provides a structured framework for generating and validating real-world evidence (RWE), bridging randomized trials and observational data using TMLE and related estimation techniques. This approach enables transparency, sensitivity analysis, and stronger causal inference for regulatory and clinical trial contexts. High-dimensional settings: Integration with ensemble methods for causal effect estimation. TMLE has been successfully applied in pharmacoepidemiology where a large number of covariates are automatically selected to adjust for confounding. In a study of post–myocardial infarction statin use and 1-year mortality, TMLE demonstrated robust performance relative to inverse probability weighting in scenarios with hundreds of potential confounders. == Derivatives and extensions == Longitudinal TMLE (LTMLE): A methodological extension of TMLE for longitudinal data with time-varying treatments, confounders, and censoring. It allows the estimation of dynamic treatment regimes and intervention-specific causal effects over time. This framework was originally introduced by van der Laan & Gruber (2012). Collaborative TMLE (CTMLE): Enhances finite-sample performance and variable selection by collaboratively fitting the treatment mechanism in conjunction with the target parameter. == Software == Several R packages implement TMLE and related methods: tmle: Functions for binary, categorical, and continuous outcomes. ltmle: Implementation for longitudinal data with time-varying treatments and outcomes. ctmle: Algorithms for collaborative TMLE and adaptive variable selection. SuperLearner: A theoretically grounded, cross-validated ensemble learning method that combines predictions from multiple algorithms to minimize predictive risk. Widely used in TMLE for estimating nuisance parameters. The original implementation is available as the R package SuperLearner. Recent machine learning platforms like H2O AutoML implement similar ensemble strategies, combining diverse learners in parallel and leveraging stacking and blending techniques, effectively functioning as a large-scale Super Learner.

    Read more →
  • LogitBoost

    LogitBoost

    In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost function of logistic regression, one can derive the LogitBoost algorithm. == Minimizing the LogitBoost cost function == LogitBoost can be seen as a convex optimization. Specifically, given that we seek an additive model of the form f = ∑ t α t h t {\displaystyle f=\sum _{t}\alpha _{t}h_{t}} the LogitBoost algorithm minimizes the logistic loss: ∑ i log ⁡ ( 1 + e − y i f ( x i ) ) {\displaystyle \sum _{i}\log \left(1+e^{-y_{i}f(x_{i})}\right)}

    Read more →
  • AI washing

    AI washing

    AI washing is a deceptive marketing tactic that consists of promoting a product or a service by overstating the role of artificial intelligence (AI) and the integration of it. Companies often involve in the practice to mislead customers to boost their offerings, and to secure funding from investors. The practice raises concerns regarding transparency, and legal issues. == Definition == AI washing is a deceptive marketing practice. It involves promoting a product or a service by overstating the role of artificial intelligence (AI) and its integration in the design and manufacture of the same. The practice raises concerns regarding transparency, compliance with security regulations, and consumer trust in the AI industry potentially hampering legitimate advancements in AI. The term was first defined by the AI Now Institute, a research institute based at New York University in 2019. The term is derived from greenwashing, another deceptive marketing technique that misrepresents a product's environmental impact in a similar manner. AI washing might involve a company claiming to have used AI in the development or enhancement of its products or services without its actual involvement, or using buzzwords such as "smart" or "AI-powered" without the product actually offering it or making use of it. A company may overstate the usage of AI or misuse the term, which is also construed as AI washing. In 2026, The Washington Post defined AI washing as "a trend for bosses to blame layoffs on the productive capabilities of AI and its ability to replace workers, even when job cuts may have little to do with the technology". == Usage and effects == AI washing can lead to deception of customers and misleading of investors. It is also an illegal and unethical practice that lacks transparency regarding disclosing the details of a product or a service. Companies get involved in such a practice often in response to competition who might have used AI in their offerings. It might also be used as a ploy to secure funding and investment, assuming that it will attract them towards it. AI washing has been compared to dot-com bubble, when businesses appended "dot-com" to the end of the business name to boost their valuation. In September 2023, Coca-Cola released a new product called Coca-Cola Y3000, and the company stated that the Y3000 flavor had been "co-created with human and artificial intelligence". The company was accused of AI washing due to no proof of AI involvement in the creation of the product, and critics believed that AI was used as a way to grab consumer attention more than it was used in the actual product creation. In 2026, mass tech layoffs were attributed to AI washing from AI innovation instead of balance sheet restructuring. == Mitigation == Companies are expected to be transparent and clearer in communicating the usage of AI in their products or services. Consumers can mitigate the same by requesting for hard evidence from the companies regarding the usage of AI tools. Customers should evaluate the product or service as a whole rather than being swayed by the usage of AI. Informed decision making and purchasing can keep them from falling for such marketing gimmicks. The United States Securities and Exchange Commission (SEC) imposes penalties for companies indulging in such practices. In March 2024, the SEC imposed the first civil penalties on two companies for misleading statements about their use of AI, and in July 2024, it charged a corporate executive from a supposed AI hiring startup with fraud for the usage of buzzwords related to AI.

    Read more →
  • Least-squares support vector machine

    Least-squares support vector machine

    Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods. == From support-vector machine to least-squares support-vector machine == Given a training set { x i , y i } i = 1 N {\displaystyle \{x_{i},y_{i}\}_{i=1}^{N}} with input data x i ∈ R n {\displaystyle x_{i}\in \mathbb {R} ^{n}} and corresponding binary class labels y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} , the SVM classifier, according to Vapnik's original formulation, satisfies the following conditions: { w T ϕ ( x i ) + b ≥ 1 , if y i = + 1 , w T ϕ ( x i ) + b ≤ − 1 , if y i = − 1 , {\displaystyle {\begin{cases}w^{T}\phi (x_{i})+b\geq 1,&{\text{if }}\quad y_{i}=+1,\\w^{T}\phi (x_{i})+b\leq -1,&{\text{if }}\quad y_{i}=-1,\end{cases}}} which is equivalent to y i [ w T ϕ ( x i ) + b ] ≥ 1 , i = 1 , … , N , {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1,\quad i=1,\ldots ,N,} where ϕ ( x ) {\displaystyle \phi (x)} is the nonlinear map from original space to the high- or infinite-dimensional space. === Inseparable data === In case such a separating hyperplane does not exist, we introduce so-called slack variables ξ i {\displaystyle \xi _{i}} such that { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N . {\displaystyle {\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N.\end{cases}}} According to the structural risk minimization principle, the risk bound is minimized by the following minimization problem: min J 1 ( w , ξ ) = 1 2 w T w + c ∑ i = 1 N ξ i , {\displaystyle \min J_{1}(w,\xi )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}\xi _{i},} Subject to { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N , {\displaystyle {\text{Subject to }}{\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N,\end{cases}}} To solve this problem, we could construct the Lagrangian function: L 1 ( w , b , ξ , α , β ) = 1 2 w T w + c ∑ i = 1 N ξ i − ∑ i = 1 N α i { y i [ w T ϕ ( x i ) + b ] − 1 + ξ i } − ∑ i = 1 N β i ξ i , {\displaystyle L_{1}(w,b,\xi ,\alpha ,\beta )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}{\xi _{i}}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{y_{i}\left[{w^{T}\phi (x_{i})+b}\right]-1+\xi _{i}\right\}-\sum \limits _{i=1}^{N}\beta _{i}\xi _{i},} where α i ≥ 0 , β i ≥ 0 ( i = 1 , … , N ) {\displaystyle \alpha _{i}\geq 0,\ \beta _{i}\geq 0\ (i=1,\ldots ,N)} are the Lagrangian multipliers. The optimal point will be in the saddle point of the Lagrangian function, and then we obtain By substituting w {\displaystyle w} by its expression in the Lagrangian formed from the appropriate objective and constraints, we will get the following quadratic programming problem: max Q 1 ( α ) = − 1 2 ∑ i , j = 1 N α i α j y i y j K ( x i , x j ) + ∑ i = 1 N α i , {\displaystyle \max Q_{1}(\alpha )=-{\frac {1}{2}}\sum \limits _{i,j=1}^{N}{\alpha _{i}\alpha _{j}y_{i}y_{j}K(x_{i},x_{j})}+\sum \limits _{i=1}^{N}\alpha _{i},} where K ( x i , x j ) = ⟨ ϕ ( x i ) , ϕ ( x j ) ⟩ {\displaystyle K(x_{i},x_{j})=\left\langle \phi (x_{i}),\phi (x_{j})\right\rangle } is called the kernel function. Solving this QP problem subject to constraints in (1), we will get the hyperplane in the high-dimensional space and hence the classifier in the original space. === Least-squares SVM formulation === The least-squares version of the SVM classifier is obtained by reformulating the minimization problem as min J 2 ( w , b , e ) = μ 2 w T w + ζ 2 ∑ i = 1 N e i 2 , {\displaystyle \min J_{2}(w,b,e)={\frac {\mu }{2}}w^{T}w+{\frac {\zeta }{2}}\sum \limits _{i=1}^{N}e_{i}^{2},} subject to the equality constraints y i [ w T ϕ ( x i ) + b ] = 1 − e i , i = 1 , … , N . {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]=1-e_{i},\quad i=1,\ldots ,N.} The least-squares SVM (LS-SVM) classifier formulation above implicitly corresponds to a regression interpretation with binary targets y i = ± 1 {\displaystyle y_{i}=\pm 1} . Using y i 2 = 1 {\displaystyle y_{i}^{2}=1} , we have ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i e i ) 2 = ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 , {\displaystyle \sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}(y_{i}e_{i})^{2}=\sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2},} with e i = y i − ( w T ϕ ( x i ) + b ) . {\displaystyle e_{i}=y_{i}-(w^{T}\phi (x_{i})+b).} Notice, that this error would also make sense for least-squares data fitting, so that the same end results holds for the regression case. Hence the LS-SVM classifier formulation is equivalent to J 2 ( w , b , e ) = μ E W + ζ E D {\displaystyle J_{2}(w,b,e)=\mu E_{W}+\zeta E_{D}} with E W = 1 2 w T w {\displaystyle E_{W}={\frac {1}{2}}w^{T}w} and E D = 1 2 ∑ i = 1 N e i 2 = 1 2 ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 . {\displaystyle E_{D}={\frac {1}{2}}\sum \limits _{i=1}^{N}e_{i}^{2}={\frac {1}{2}}\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2}.} Both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } should be considered as hyperparameters to tune the amount of regularization versus the sum squared error. The solution does only depend on the ratio γ = ζ / μ {\displaystyle \gamma =\zeta /\mu } , therefore the original formulation uses only γ {\displaystyle \gamma } as tuning parameter. We use both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } as parameters in order to provide a Bayesian interpretation to LS-SVM. The solution of LS-SVM regressor will be obtained after we construct the Lagrangian function: { L 2 ( w , b , e , α ) = J 2 ( w , e ) − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , = 1 2 w T w + γ 2 ∑ i = 1 N e i 2 − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , {\displaystyle {\begin{cases}L_{2}(w,b,e,\alpha )\;=J_{2}(w,e)-\sum \limits _{i=1}^{N}\alpha _{i}\left\{{\left[{w^{T}\phi (x_{i})+b}\right]+e_{i}-y_{i}}\right\},\\\quad \quad \quad \quad \quad \;={\frac {1}{2}}w^{T}w+{\frac {\gamma }{2}}\sum \limits _{i=1}^{N}e_{i}^{2}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{\left[w^{T}\phi (x_{i})+b\right]+e_{i}-y_{i}\right\},\end{cases}}} where α i ∈ R {\displaystyle \alpha _{i}\in \mathbb {R} } are the Lagrange multipliers. The conditions for optimality are { ∂ L 2 ∂ w = 0 → w = ∑ i = 1 N α i ϕ ( x i ) , ∂ L 2 ∂ b = 0 → ∑ i = 1 N α i = 0 , ∂ L 2 ∂ e i = 0 → α i = γ e i , i = 1 , … , N , ∂ L 2 ∂ α i = 0 → y i = w T ϕ ( x i ) + b + e i , i = 1 , … , N . {\displaystyle {\begin{cases}{\frac {\partial L_{2}}{\partial w}}=0\quad \to \quad w=\sum \limits _{i=1}^{N}\alpha _{i}\phi (x_{i}),\\{\frac {\partial L_{2}}{\partial b}}=0\quad \to \quad \sum \limits _{i=1}^{N}\alpha _{i}=0,\\{\frac {\partial L_{2}}{\partial e_{i}}}=0\quad \to \quad \alpha _{i}=\gamma e_{i},\;i=1,\ldots ,N,\\{\frac {\partial L_{2}}{\partial \alpha _{i}}}=0\quad \to \quad y_{i}=w^{T}\phi (x_{i})+b+e_{i},\,i=1,\ldots ,N.\end{cases}}} Elimination of w {\displaystyle w} and e {\displaystyle e} will yield a linear system instead of a quadratic programming problem: [ 0 1 N T 1 N Ω + γ − 1 I N ] [ b α ] = [ 0 Y ] , {\displaystyle \left[{\begin{matrix}0&1_{N}^{T}\\1_{N}&\Omega +\gamma ^{-1}I_{N}\end{matrix}}\right]\left[{\begin{matrix}b\\\alpha \end{matrix}}\right]=\left[{\begin{matrix}0\\Y\end{matrix}}\right],} with Y = [ y 1 , … , y N ] T {\displaystyle Y=[y_{1},\ldots ,y_{N}]^{T}} , 1 N = [ 1 , … , 1 ] T {\displaystyle 1_{N}=[1,\ldots ,1]^{T}} and α = [ α 1 , … , α N ] T {\displaystyle \alpha =[\alpha _{1},\ldots ,\alpha _{N}]^{T}} . Here, I N {\displaystyle I_{N}} is an N × N {\displaystyle N\times N} identity matrix, and Ω ∈ R N × N {\displaystyle \Omega \in \mathbb {R} ^{N\times N}} is the kernel matrix defined by Ω i j = ϕ ( x i ) T ϕ ( x j ) = K ( x i , x j ) {\displaystyle \Omega _{ij}=\phi (x_{i})^{T}\phi (x_{j})=K(x_{i},x_{j})} . === Kernel function K === For the kernel function K(•, •) one typically has the following choices: Linear kernel : K ( x , x i ) = x i T x , {\displaystyle K(x,x_{i})=x_{i}^{T}x,} Polynomial kernel of degree d {\displaystyle d} : K ( x , x i ) = ( 1 + x i T x / c ) d , {\displaystyle K(x,x_{i})=\left({1+x_{i}^{T}x/c}\right)^{d},} Radial basis function RBF kernel : K ( x , x i ) = exp ⁡ ( − ‖ x − x i ‖ 2 / σ 2 ) , {\displaystyle K(x,x_{i})=\exp \left({-\left\|{x-x_{i}}\right\|^{2}/\sigma ^{2}}\right),} MLP kernel : K ( x , x i ) = tanh ⁡ ( k x i T x + θ ) , {\displaystyle K(x,x_{i})=\tanh \left({k

    Read more →
  • Gremlin (query language)

    Gremlin (query language)

    Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation. Gremlin works for both OLTP-based graph databases as well as OLAP-based graph processors. Gremlin's automata and functional language foundation enable Gremlin to naturally support imperative and declarative querying, host language agnosticism, user-defined domain specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, and hybrid depth- and breadth-first evaluation with Turing completeness. As an explanatory analogy, Apache TinkerPop and Gremlin are to graph databases what the JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing as what the Java virtual machine is to general purpose computing. == History == 2009-10-30 the project is born, and immediately named "TinkerPop" 2009-12-25 v0.1 is the first release 2011-05-21 v1.0 is released 2012-05-24 v2.0 is released 2015-01-16 TinkerPop becomes an Apache Incubator project 2015-07-09 v3.0.0-incubating is released 2016-05-23 Apache TinkerPop becomes a top-level project 2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop 2017-12-17 v3.3.1 is released 2018-05-08 v3.3.3 is released 2019-08-05 v3.4.3 is released 2020-02-20 v3.4.6 is released 2021-05-01 v3.5.0 is released 2022-04-04 v3.6.0 is released 2023-07-31 v3.7.0 is released 2025-11-12 v3.8.0 is released == Vendor integration == Gremlin is an Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP graph databases and OLAP graph processors. The table below outlines those graph vendors that support Gremlin. == Traversal examples == The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of the MovieLens dataset. The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below. === Simple traversals === For each vertex in the graph, emit its label, then group and count each distinct label. What year was the oldest movie made? What is Die Hard's average rating? === Projection traversals === For each category, emit a map of its name and the number of movies it represents. For each movie with at least 11 ratings, emit a map of its name and average rating. Sort the maps in decreasing order by their average rating. Emit the first 10 maps (i.e. top 10). === Declarative pattern matching traversals === Gremlin supports declarative graph pattern matching similar to SPARQL. For instance, the following query below uses Gremlin's match()-step. What 80's action movies do 30-something programmers like? Group count the movies by their name and sort the group count map in decreasing order by value. Clip the map to the top 10 and emit the map entries. === OLAP traversal === Which movies are most central in the implicit 5-stars graph? == Gremlin graph traversal machine == Gremlin is a virtual machine composed of an instruction set as well as an execution engine. An analogy is drawn between Gremlin and Java. === Gremlin steps (instruction set) === The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect. The Gremlin language (i.e. the fluent-style of expressing a graph traversal) can be represented in any host language that supports function composition and function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a traversal. A string representation of the traversal above provided below. The steps are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin instruction set is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query. Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the JVM). For instance, the popular SPARQL graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query would compile to In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal). === Gremlin Machine (virtual machine) === The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both graph databases (OLTP) and graph processors (OLAP).

    Read more →