AI Email Follow Up

AI Email Follow Up — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AI anthropomorphism

    AI anthropomorphism

    AI anthropomorphism is the attribution of human-like feelings, mental states, and behavioral characteristics to artificial intelligence systems. Factors related to the user of the AI – such as culture, age, education, gender, and personality traits – are also important determinants of the strength of anthropomorphic effects. Since the earliest days of AI development, humans have interpreted machine outputs through anthropomorphic frameworks, but the recent emergence of generative AI has amplified these tendencies. In research and engineering, there is a distinction between anthropomorphism and anthropomorphic design. The former is an innate human tendency toward non-human entities. The latter is the scientific community effort to “design anthropomorphism”. Such a design can involve the manipulation of cues, including AI appearance, behaviour and language. Contemporary AI systems today can generate extremely human-like outputs and are often designed specifically to do so, meaning that their anthropomorphic effects can be especially powerful. In some cases, anthropomorphism is accompanied with explicit beliefs that AI systems are capable of empathy, goodwill, understanding, or consciousness. == Background == === In early AIs === Views of artificial agents possessing a human-like intelligence have existed since the early development of computers in the mid-1900s. The use of the human mind as a metaphor for understanding the workings of machine systems was prevalent among researchers in the early days of computer science, with multiple influential works widely distributing the idea of intelligent machines. Among the most widely cited papers of this period was Alan Turing's "Computing Machinery and Intelligence" in which he introduced the Turing Test, stating that a machine was intelligent if it could produce conversation that was indistinguishable from that of a human. These academic works in the 1940s and 1950s gave early credibility to the idea that machine workings could be thought of similarly to human minds. The public quickly came to view artificial systems similarly, with often exaggerated conceptions of the capabilities of early machines. Among the most well-known demonstrations of this was through the chatbot ELIZA designed by Joseph Weizenbaum in 1966. ELIZA responded to user inputs with a rudimentary text-processing approach that could not be considered anything resembling true understanding of the inputs, yet users, even when operating with full conscious knowledge of ELIZA's limitations, often began to ascribe motivation and understanding to the program's output. Weizenbaum later wrote, "I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." Comparisons between the intellectual capabilities of artificial intelligence and human intelligence were continually intensified by the attempts of computer scientists to develop machines that could perform human tasks at a level equal to or better than humans. A symbolic turning point was achieved in 1997, when IBM's chess supercomputer Deep Blue defeated then-world champion Garry Kasparov in a highly publicized six-game match. The defeat of a human by a machine for the first time in chess – a game viewed as a canonical example of human intellect – and the media attention surrounding the match led to a significant shift, where views of parallels between human and artificial intelligence moved from abstract speculation to being concretely demonstrated. A similar achievement was reached in the board game Go in 2017, when the program AlphaGo defeated world top-ranked Ke Jie. === Large language models === The AI boom of the 2020s brought about the widespread emergence of generative AI; in particular, chatbots such as ChatGPT, Gemini, and Claude based on large language models (LLMs) have become increasingly pervasive in everyday society. These systems are notable for the fact that they are able to respond to a wide range of prompts across contexts while producing strikingly human-like outputs – research has shown that humans are often unable to distinguish human-generated text from AI-generated text, and modern AI chatbots have formally been shown to pass the Turing test. As such, the anthropomorphic effects of AI are more powerful than ever. Given that LLMs have brought AI into the technological mainstream, considerable scientific effort has been devoted in recent years to understand existing and potential ramifications of AI in the public sphere; the prevalence and effects of anthropomorphism is one of those domains where much of this effort has been directed. == Current anthropomorphic attributions == === In the general public === Surveys have shown that a substantial portion of the public attributes human-like qualities to AI. In one sample of U.S. adults from 2024, two-thirds of people believed that ChatGPT is possibly conscious on some level, though other research has shown that the public still views the likelihood itself of AI consciousness as comparatively low. Another study conducted in 2025 found that women, people of color, and older individuals were most likely to anthropomorphize AI, as well as that – in general – humans view AIs as warm and competent, and anthropomorphic attributions to AI had increased by 34% in the past year. A YouGov poll reported that 46% of Americans believe that people should display politeness to AI chatbots by saying "please" and "thank you", demonstrating the application of social norms to AI. These beliefs extend to behavior, where majorities of AI users claim to always be polite to chatbots; of those who behave politely, most say they do so simply because it is the "nice" thing to do. In many recent cases, humans have developed robust interpersonal bonds with AI systems. For example: users of social chatbots like Replika and Character.ai have been documented to fall in love with the AIs, or to otherwise treat the AIs as intimate companions, and it has become increasingly common for individuals to use LLMs like ChatGPT as therapists. Chatbots are able to produce responses deeply attuned to users, as they are often designed to maximize agreeableness and mirror users' emotions; this can create compelling illusions of intimacy. === In the research community === In many cases, even AI researchers anthropomorphize AI systems in some capacity. Among the most extreme and well-publicized of these instances occurred in 2022, when engineer Blake Lemoine publicly claimed that Google's LLM LaMDA was conscious. Lemoine published the transcript of a conversation he had had with LaMDA regarding self identity and morality which he claimed was evidence of its sentience; he asserted that LaMDA was "a person" as defined by the United States Constitution and compared its mental capability to that of a 7- or 8-year-old. Lemoine's claims were widely dismissed by the scientific community and by Google itself, which described Lemoine's conclusions as "wholly unfounded" and fired him on the grounds that he had violated policies "to safeguard product information". It is much more common that AI researchers unintentionally imply humanness of AI through the ordinary use of anthropomorphic language to describe nonhuman agents. This kind of language, which Daniel Dennett coined the "intentional stance", is very common in everyday life in a variety of different contexts (e.g., "My computer doesn't want to turn on today"). For AI agents that may actually appear to very closely replicate some human abilities, however, the casual use of such anthropomorphic language in research has been scrutinized for being potentially misleading to the public. As early as 1976, Drew McDermott criticized the research community for the use of "wishful mnemonics", where AIs were referred to with terms like "understand" and "learn". In the LLM era, these criticisms have further intensified, with the negative effects of AI anthropomorphism in the public posing an especially salient danger given the elevated accessibility of modern AI. In some cases, the use of anthropomorphic language for AI is not unintentional, but is willfully used by researchers in order to promote better understanding of the brain – the idea being that, as AI can be functionally similar in some ways to the human brain, we may gain new insights and ideas from treating AI as a kind of model of the brain's workings. In particular, deep neuronal networks (DNNs) are often explicitly compared to the human brain, and significant advances in DNN research have stirred considerable enthusiasm about the ability of AI to emulate the human abilities. Caution has been urged in this domain as well, however; the use of anthropomorphic language can mask important differences that fundamentally distinguish AI from human intelligence. When it comes to DNNs, for example, it has been pointed out that they are still structurally quite different

    Read more →
  • Google Books Ngram Viewer

    Google Books Ngram Viewer

    The Google Books Ngram Viewer is an online search engine that charts the frequencies of any set of search strings using a yearly count of n-grams found in printed sources published between 1500 and 2022 in Google's text corpora in English, Chinese (simplified), French, German, Hebrew, Italian, Russian, or Spanish. There are also some specialized English corpora, such as American English, British English, and English Fiction. The program can search for a word or a phrase. The n-grams are matched with the text within the selected corpus, and if found in 40 or more books, are then displayed as a graph. The program supports searches for parts of speech and wildcards. It is routinely used in research. == History == The Ngram Viewer was created by Google software engineers Will Brockman and Jon Orwant , who teamed up with Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden. The service was released on December 16, 2010. Before the release, it was difficult to quantify the rate of linguistic change because of the absence of a database that was designed for this purpose, said Steven Pinker, a well-known linguist who was one of the co-authors of the Science paper published on the same day. The Google Books Ngram Viewer was developed in the hope of opening a new window to quantitative research in the humanities field, and the database contained 500 billion words from 5.2 million books publicly available from the very beginning. The intended audience was scholarly, but the Google Books Ngram Viewer made it possible for anyone with a computer to see a graph that represents the diachronic change of the use of words and phrases with ease. Lieberman said in response to The New York Times that the developers aimed to provide even children with the ability to browse cultural trends throughout history. In the Science paper, Lieberman and his collaborators called the method of high-volume data analysis in digitized texts "culturomics". == Usage == Commas delimit user-entered search terms, where each comma-separated term is searched in the database as an n-gram (for example, "nursery school" is a 2-gram or bigram). The Ngram Viewer then returns a plotted line chart. Due to limitations on the size of the Ngram database, only matches found in at least 40 books are indexed. == Limitations == The data sets of the Ngram Viewer have been criticized for their reliance upon inaccurate optical character recognition (OCR) and for including large numbers of incorrectly dated and categorized texts. Because of these errors, and because they are uncontrolled for bias (such as the increasing amount of scientific literature, which causes other terms to appear to decline in popularity), care must be taken in using the corpora to study language or test theories. Furthermore, the data sets may not reflect general linguistic or cultural change and can only hint at such an effect because they do not involve any metadata like date published, author, length, or genre, to avoid any potential copyright infringements. Systemic errors like the confusion of s and f in pre-19th century texts (due to the use of ſ, the long s, which is similar in appearance to f) can cause systemic bias. Although the Google Books team claims that the results are reliable from 1800 onwards, poor OCR and insufficient data mean that frequencies given for languages such as Chinese may only be accurate from 1970 onward, with earlier parts of the corpus showing no results at all for common terms, and data for some years containing more than 50% noise. Guidelines for doing research with data from Google Ngram have been proposed that try to address some of the issues discussed above.

    Read more →
  • Meesho

    Meesho

    Meesho Limited (short for Meri shop, transl. My shop) is an Indian e-commerce company, headquartered in Bengaluru. Founded by Vidit Aatrey and Sanjeev Barnwal in December 2015, Meesho is an online marketplace in categories such as fashion, home and kitchen, beauty and personal care, electronics accessories, and daily use products. == History == Meesho Private Limited, formerly Fashnear Technologies Private Limited, was established by IIT Delhi graduates Vidit Aatrey and Sanjeev Barnwal in December, 2015 In 2016, the founders came up with the idea of re-establishing the platform as Meesho, one that would enable country-wide shipping for resellers with the use of social media sites as tools for marketing. In February 2019, the platform reported having around 209,000 users and about 1.2 million monthly orders, and in March 2020, it reported approximately 563,000 users and 3.1 million monthly orders. In 2021, the Meesho mobile application was ranked among the most downloaded shopping apps globally. In 2022, Meesho had about 120 million monthly users and about 910 million orders were made through the platform, with a gross merchandise value (GMV) of about $5 billion. According to report as of August 2023 Meesho delisted 42 lakh counterfeit listings and 10 lakh restricted products under its initiative Project Suraksha. During the same period, the platform blocked access for over 12,000 user accounts flagged for policy violations. The Court granted injunctive relief by directing domain registrars to suspend the infringing websites. Additionally, the Court ordered law enforcement authorities to initiate criminal investigations, freeze associated financial accounts against the identified offenders. In 2023, Meesho became the fastest shopping app to cross over 500 million downloads. In 2024, Meesho introduced Valmo, a logistics marketplace, to provide shipment services to sellers by aggregating multiple logistics providers. Meesho employs over 3,000 small businesses and 10-12 large firms for warehousing and sorting operations within its logistics framework. According to media reports, Valmo operating in approximately 15,000 pincodes in India with around 6,000 partners. It is reported to handle over 50% of Meesho's daily orders. In November 2024, Meesho introduced a generative AI-powered voice bot for customer support, managing approximately 60,000 calls daily in English and Hindi. According to media reports, the system resolves the majority of queries without human assistance, with only a small fraction of calls requiring manual intervention. According to media reports, in 2024, Meesho prevented over 22 million suspicious or potentially fraudulent transactions on its platform. The company initiated legal proceedings, resulting in the filing of twelve cases, including nine specifically targeting over forty individuals in the cities of Kolkata and Ranchi. The company filed a suit in the Delhi High Court for a permanent injunction against parties operating deceptive websites misappropriating its brand identity. Meesha went public through an initial public offering in December 2025, raising $603 million. It is listed on both the BSE and NSE. == Recognition == In 2023, Meesho was named one of the most influential companies of the year by Time (magazine).

    Read more →
  • Pooling layer

    Pooling layer

    In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors. It has several uses. It removes redundant information, thus reducing the amount of computation and memory required, which makes the model more robust to small variations in the input; and it increases the receptive field of neurons in later layers in the network. == Convolutional neural network pooling == Pooling is most commonly used in convolutional neural networks (CNN). Below is a description of pooling in 2-dimensional CNNs. The generalization to n-dimensions is immediate. As notation, we consider a tensor x ∈ R H × W × C {\displaystyle x\in \mathbb {R} ^{H\times W\times C}} , where H {\displaystyle H} is height, W {\displaystyle W} is width, and C {\displaystyle C} is the number of channels. A pooling layer outputs a tensor y ∈ R H ′ × W ′ × C ′ {\displaystyle y\in \mathbb {R} ^{H'\times W'\times C'}} . We define two variables f , s {\displaystyle f,s} called "filter size" (aka "kernel size") and "stride". Sometimes, it is necessary to use a different filter size and stride for horizontal and vertical directions. In such cases, we define 4 variables: f H , f W , s H , s W {\displaystyle f_{H},f_{W},s_{H},s_{W}} . The receptive field of an entry in the output tensor, y {\displaystyle y} , are all the entries in x {\displaystyle x} that can affect that entry. === Max pooling === Max Pooling (MaxPool) is commonly used in CNNs to reduce the spatial dimensions of feature maps. Define M a x P o o l ( x | f , s ) 0 , 0 , 0 = max ( x 0 : f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{0,0,0}=\max(x_{0:f-1,0:f-1,0})} where 0 : f − 1 {\displaystyle 0:f-1} means the range 0 , 1 , … , f − 1 {\displaystyle 0,1,\dots ,f-1} . Note that we need to avoid the off-by-one error. The next input is M a x P o o l ( x | f , s ) 1 , 0 , 0 = max ( x s : s + f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{1,0,0}=\max(x_{s:s+f-1,0:f-1,0})} and so on. The receptive field of y i , j , c {\displaystyle y_{i,j,c}} is x i s + f − 1 , j s + f − 1 , c {\displaystyle x_{is+f-1,js+f-1,c}} , so in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is:is+f-1,js:js+f-1,c})} If the horizontal and vertical filter size and strides differ, then in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s H : i s H + f H − 1 , j s W : j s W + f W − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is_{H}:is_{H}+f_{H}-1,js_{W}:js_{W}+f_{W}-1,c})} More succinctly, we can write y k = max ( { x k ′ | k ′ in the receptive field of k } ) {\displaystyle y_{k}=\max(\{x_{k'}|k'{\text{ in the receptive field of }}k\})} . If H {\displaystyle H} is not expressible as k s + f {\displaystyle ks+f} where k {\displaystyle k} is an integer, then for computing the entries of the output tensor on the boundaries, max pooling would attempt to take as inputs variables off the tensor. In this case, how those non-existent variables are handled depends on the padding conditions, illustrated on the right. Global Max Pooling (GMP) is a specific kind of max pooling where the output tensor has shape R C {\displaystyle \mathbb {R} ^{C}} and the receptive field of y c {\displaystyle y_{c}} is all of x 0 : H , 0 : W , c {\displaystyle x_{0:H,0:W,c}} . That is, it takes the maximum over each entire channel. It is often used just before the final fully connected layers in a CNN classification head. === Average pooling === Average pooling (AvgPool) is similarly defined A v g P o o l ( x | f , s ) i , j , c = a v e r a g e ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) = 1 f 2 ∑ k ∈ i s : i s + f − 1 ∑ l ∈ j s : j s + f − 1 x k , l , c {\displaystyle \mathrm {AvgPool} (x|f,s)_{i,j,c}=\mathrm {average} (x_{is:is+f-1,js:js+f-1,c})={\frac {1}{f^{2}}}\sum _{k\in is:is+f-1}\sum _{l\in js:js+f-1}x_{k,l,c}} Global Average Pooling (GAP) is defined similarly to GMP. It was first proposed in Network-in-Network. Similarly to GMP, it is often used just before the final fully connected layers in a CNN classification head. === Interpolations === There are some interpolations of max pooling and average pooling. Mixed Pooling is a linear sum of max pooling and average pooling. That is, M i x e d P o o l ( x | f , s , w ) = w M a x P o o l ( x | f , s ) + ( 1 − w ) A v g P o o l ( x | f , s ) {\displaystyle \mathrm {MixedPool} (x|f,s,w)=w\mathrm {MaxPool} (x|f,s)+(1-w)\mathrm {AvgPool} (x|f,s)} where w ∈ [ 0 , 1 ] {\displaystyle w\in [0,1]} is either a hyperparameter, a learnable parameter, or randomly sampled anew every time. Lp Pooling is similar to average pooling, but uses Lp norm average instead of average: y k = ( 1 N ∑ k ′ in the receptive field of k | x k ′ | p ) 1 / p {\displaystyle y_{k}=\left({\frac {1}{N}}\sum _{k'{\text{ in the receptive field of }}k}|x_{k'}|^{p}\right)^{1/p}} where N {\displaystyle N} is the size of receptive field, and p ≥ 1 {\displaystyle p\geq 1} is a hyperparameter. If all activations are non-negative, then average pooling is the case of p = 1 {\displaystyle p=1} , and max pooling is the case of p → ∞ {\displaystyle p\to \infty } . Square-root pooling is the case of p = 2 {\displaystyle p=2} . Stochastic pooling samples a random activation x k ′ {\displaystyle x_{k'}} from the receptive field with probability x k ′ ∑ k ″ x k ″ {\displaystyle {\frac {x_{k'}}{\sum _{k''}x_{k''}}}} . It is the same as average pooling in expectation. Softmax pooling is like max pooling, but uses softmax, i.e. ∑ k ′ e β x k ′ x k ′ ∑ k ″ e β x k ″ {\displaystyle {\frac {\sum _{k'}e^{\beta x_{k'}}x_{k'}}{\sum _{k''}e^{\beta x_{k''}}}}} where β > 0 {\displaystyle \beta >0} . Average pooling is the case of β ↓ 0 {\displaystyle \beta \downarrow 0} , and max pooling is the case of β ↑ ∞ {\displaystyle \beta \uparrow \infty } Local Importance-based Pooling generalizes softmax pooling by ∑ k ′ e g ( x k ′ ) x k ′ ∑ k ″ e g ( x k ″ ) {\displaystyle {\frac {\sum _{k'}e^{g(x_{k'})}x_{k'}}{\sum _{k''}e^{g(x_{k''})}}}} where g {\displaystyle g} is a learnable function. === Other poolings === Spatial pyramidal pooling applies max pooling (or any other form of pooling) in a pyramid structure. That is, it applies global max pooling, then applies max pooling to the image divided into 4 equal parts, then 16, etc. The results are then concatenated. It is a hierarchical form of global pooling, and similar to global pooling, it is often used just before a classification head. Region of Interest Pooling (also known as RoI pooling) is a variant of max pooling used in R-CNNs for object detection. It is designed to take an arbitrarily-sized input matrix, and output a fixed-sized output matrix. Covariance pooling computes the covariance matrix of the vectors { x k , l , 0 : C − 1 } k ∈ i s : i s + f − 1 , l ∈ j s : j s + f − 1 {\displaystyle \{x_{k,l,0:C-1}\}_{k\in is:is+f-1,l\in js:js+f-1}} which is then flattened to a C 2 {\displaystyle C^{2}} -dimensional vector y i , j , 0 : C 2 − 1 {\displaystyle y_{i,j,0:C^{2}-1}} . Global covariance pooling is used similarly to global max pooling. As average pooling computes the average, which is a first-degree statistic, and covariance is a second-degree statistic, covariance pooling is also called "second-order pooling". It can be generalized to higher-order poolings. Blur Pooling means applying a blurring method before downsampling. For example, the Rect-2 blur pooling means taking an average pooling at f = 2 , s = 1 {\displaystyle f=2,s=1} , then taking every second pixel (identity with s = 2 {\displaystyle s=2} ). == Vision Transformer pooling == In Vision Transformers (ViT), there are the following common kinds of poolings. BERT-like pooling uses a dummy [CLS] token, "classification". For classification, the output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution, which is the network's prediction of class probability distribution. This is the one used by the original ViT and Masked Autoencoder. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multi headed attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. It then applies a feedforward layer F F N {\displaystyle \mathrm {FFN} } on each vector, resulting in a matrix V = [ F F N ( v 1 ) , … , F F N ( v n ) ] {\displaystyle V=[\mathrm {FFN} (v_{1}),\dots ,\mathrm {FFN} (v_{n})]} . This is then sent to a multi-headed attention, resulting in M u l t i h e a d e d A

    Read more →
  • Structural risk minimization

    Structural risk minimization

    Structural risk minimization (SRM) is an inductive principle of use in machine learning. Commonly in machine learning, a generalized model must be selected from a finite data set, with the consequent problem of overfitting – the model becoming too strongly tailored to the particularities of the training set and generalizing poorly to new data. The SRM principle addresses this problem by balancing the model's complexity against its success at fitting the training data. This principle was first set out in a 1974 book by Vladimir Vapnik and Alexey Chervonenkis and uses the VC dimension. In practical terms, Structural Risk Minimization is implemented by minimizing E t r a i n + β H ( W ) {\displaystyle E_{train}+\beta H(W)} , where E t r a i n {\displaystyle E_{train}} is the train error, the function H ( W ) {\displaystyle H(W)} is called a regularization function, and β {\displaystyle \beta } is a constant. H ( W ) {\displaystyle H(W)} is chosen such that it takes large values on parameters W {\displaystyle W} that belong to high-capacity subsets of the parameter space. Minimizing H ( W ) {\displaystyle H(W)} in effect limits the capacity of the accessible subsets of the parameter space, thereby controlling the trade-off between minimizing the training error and minimizing the expected gap between the training error and test error. The SRM problem can be formulated in terms of data. Given n data points consisting of data x and labels y, the objective J ( θ ) {\displaystyle J(\theta )} is often expressed in the following manner: J ( θ ) = 1 2 n ∑ i = 1 n ( h θ ( x i ) − y i ) 2 + λ 2 ∑ j = 1 d θ j 2 {\displaystyle J(\theta )={\frac {1}{2n}}\sum _{i=1}^{n}(h_{\theta }(x^{i})-y^{i})^{2}+{\frac {\lambda }{2}}\sum _{j=1}^{d}\theta _{j}^{2}} The first term is the mean squared error (MSE) term between the value of the learned model, h θ {\displaystyle h_{\theta }} , and the given labels y {\displaystyle y} . This term is the training error, E t r a i n {\displaystyle E_{train}} , that was discussed earlier. The second term, places a prior over the weights, to favor sparsity and penalize larger weights. The trade-off coefficient, λ {\displaystyle \lambda } , is a hyperparameter that places more or less importance on the regularization term. Larger λ {\displaystyle \lambda } encourages sparser weights at the expense of a more optimal MSE, and smaller λ {\displaystyle \lambda } relaxes regularization allowing the model to fit to data. Note that as λ → ∞ {\displaystyle \lambda \to \infty } the weights become zero, and as λ → 0 {\displaystyle \lambda \to 0} , the model typically suffers from overfitting.

    Read more →
  • Speculative decoding

    Speculative decoding

    Speculative decoding is an inference-time optimization for autoregressive large language models (LLMs) that generates multiple tokens per decoding step instead of one. A smaller draft model proposes a sequence of candidate tokens, and the larger target model verifies them in a single forward pass through a modified rejection sampling scheme. The verification preserves the target model's original output distribution, so the technique produces the same results as standard decoding while cutting latency by roughly two to three times. The name is an analogy to speculative execution in CPU design, where a processor runs instructions along a predicted branch before the outcome is known. == Background == Standard autoregressive decoding in large language models generates one token at a time. The model computes a probability distribution over its vocabulary, samples the next token, and feeds that token back as input. For large models, this process is bottlenecked by memory bandwidth rather than arithmetic throughput: loading the model's parameters from high-bandwidth memory (HBM) to the processor takes up most of the wall-clock time at each step. Because of this, a forward pass over one token and a forward pass over several tokens in a batch take roughly the same time. Speculative decoding relies on this property. == Mechanism == The technique alternates between two phases: drafting and verification. During drafting, a fast approximation model generates a short run of K candidate tokens, typically between 3 and 12. The draft model is usually a much smaller version of the target model or a lightweight auxiliary network. During verification, the target model scores the entire draft sequence in one batched forward pass. A modified rejection sampling algorithm compares the draft and target probabilities at each position. If the target model would have been at least as likely to produce a given token, that token is accepted; the first token that fails is resampled from a corrected distribution, and everything after it is thrown out. The result is that the output distribution is the same as if each token had been generated one at a time. How many tokens get accepted per cycle depends on how well the draft model matches the target. For common words and predictable continuations the match tends to be good, so the target model can confirm several tokens at once. == History == An early precursor was blockwise parallel decoding, proposed in 2018 by Stern, Shazeer, and Uszkoreit. Their method predicted multiple future tokens through auxiliary prediction heads and validated them against the autoregressive model, but it only worked with greedy decoding and did not preserve the full sampling distribution. The modern form of the technique came from Yaniv Leviathan, Matan Kalman, and Yossi Matias at Google Research, who posted "Fast Inference from Transformers via Speculative Decoding" on arXiv in November 2022. Separately and at about the same time, Charlie Chen and colleagues at DeepMind arrived at a closely related method they called speculative sampling, published in February 2023. Both papers introduced the use of rejection sampling to guarantee that the output distribution is unchanged. Leviathan et al. showed roughly 2–3x speedup on T5-XXL (11 billion parameters); Chen et al. reported 2–2.5x on the Chinchilla model (70 billion parameters). The Leviathan et al. paper was presented as an oral at the International Conference on Machine Learning in July 2023. == Variants == SpecInfer (Miao et al., 2024) uses multiple small language models to jointly build a tree of candidate continuations rather than a single chain. The target model verifies the whole tree in parallel and keeps the longest valid path, with reported speedups of 1.5–3.5x. Medusa (Cai et al., 2024) takes a different approach by not using a separate draft model at all. Extra lightweight decoding heads are attached to the target model itself, and each one predicts a token at a different future position. The candidates are evaluated through a tree-structured attention mechanism. The authors measured 2.2–3.6x speedup. EAGLE (Li et al., 2024) performs autoregression on the target model's internal feature representations (specifically the second-to-top layer) rather than on tokens directly. On LLaMA 2 Chat 70B, this gave a 2.7–3.5x latency reduction. Later versions added dynamic draft trees (EAGLE-2) and further optimizations (EAGLE-3), reaching 3–6.5x speedup. == Adoption == By 2024, speculative decoding had become a standard part of production LLM serving. Google uses it in the AI Overviews feature of Google Search. Open-source inference frameworks such as vLLM, NVIDIA's TensorRT-LLM, and SGLang all include built-in support for speculative decoding and its variants. Apple, AWS, and Meta have also published research extending the method or deploying it at scale.

    Read more →
  • Computational photography

    Computational photography

    Computational photography refers to digital image capture and processing techniques that use digital computation instead of optical processes. Computational photography can improve the capabilities of a camera, or introduce features that were not possible at all with film-based photography, or reduce the cost or size of camera elements. Examples of computational photography include in-camera computation of digital panoramas, high-dynamic-range images, and light field cameras. Light field cameras use novel optical elements to capture three-dimensional scene information, which can then be used to produce 3D images, enhanced depth-of-field, and selective de-focusing (or "post focus"). Enhanced depth-of-field reduces the need for mechanical focusing systems. All of these features use computational imaging techniques. The definition of computational photography has evolved to cover a number of subject areas in computer graphics, computer vision, and applied optics. These areas are given below, organized according to a taxonomy proposed by Shree K. Nayar. Within each area is a list of techniques, and for each technique, one or two representative papers or books are cited. Deliberately omitted from the taxonomy are image processing (see also digital image processing) techniques applied to traditionally captured images to produce better images. Examples of such techniques are image scaling, dynamic range compression (i.e. tone mapping), color management, image completion (a.k.a. inpainting or hole filling), image compression, digital watermarking, and artistic image effects. Also omitted are techniques that produce range data, volume data, 3D models, 4D light fields, 4D, 6D, or 8D BRDFs, or other high-dimensional image-based representations. Epsilon photography is a sub-field of computational photography. == Effect on photography == Photos taken using computational photography can allow amateurs to produce photographs rivalling the quality of professional photographers, but as of 2019 do not outperform the use of professional-level equipment. == Computational illumination == This is controlling photographic illumination in a structured fashion, then processing the captured images, to create new images. The applications include image-based relighting, image enhancement, image deblurring, geometry/material recovery and so forth. High-dynamic-range imaging uses differently exposed pictures of the same scene to extend dynamic range. Other examples include processing and merging differently illuminated images of the same subject matter ("lightspace"). == Computational optics == This is a capture of optically coded images, followed by computational decoding to produce new images. Coded aperture imaging was mainly applied in astronomy and X-ray imaging to boost the image quality. Instead of a single pin-hole, a pinhole pattern is applied in imaging, and deconvolution is performed to recover the image. In coded exposure imaging, the on/off state of the shutter is coded to modify the kernel of motion blur. In this way, motion deblurring becomes a well-conditioned problem. Similarly, in a lens based coded aperture, the aperture can be modified by inserting a broadband mask. Thus, out of focus deblurring becomes a well-conditioned problem. The coded aperture can also improve the quality in light field acquisition using Hadamard transform optics. Coded aperture patterns can also be designed using color filters, in order to apply different codes at different wavelengths. This allows for increase the amount of light that reaches the camera sensor, compared to binary masks. == Computational imaging == Computational imaging is a set of imaging techniques that combine data acquisition and data processing to create the image of an object through indirect means to yield enhanced resolution, additional information such as optical phase or 3D reconstruction. The information is often recorded without using a conventional optical microscope configuration or with limited datasets. Computational imaging allows going beyond physical limitations of optical systems, such as numerical aperture, or even obliterates the need for optical elements. For parts of the optical spectrum where imaging elements such as objectives are difficult to manufacture or image sensors cannot be miniaturized, computational imaging provides useful alternatives, in fields such as X-ray and THz radiations. === Common techniques === Among common computational imaging techniques are lensless imaging, computational speckle imaging , ptychography and Fourier ptychography. Computational imaging technique often draws on compressive sensing or phase retrieval techniques, where the angular spectrum of the object is reconstructed. Other techniques are related to the field of computational imaging, such as digital holography, computer vision and inverse problems such as tomography. == Computational processing == This is the processing of non-optically-coded images to produce new images. == Computational sensors == These are detectors that combine sensing and processing, typically in hardware, like the oversampled binary image sensor. == Early work in computer vision == Although computational photography is a currently popular buzzword in computer graphics, many of its techniques first appeared in the computer vision literature, either under other names or within papers aimed at 3D shape analysis. == Art history == Computational photography, as an art form, has been practiced by capturing differently exposed pictures of the same subject matter and combining them. This was the inspiration for the development of the wearable computer in the 1970s and early 1980s. Computational photography was inspired by the work of Charles Wyckoff, and thus computational photography datasets (e.g. differently exposed pictures of the same subject matter that are taken in order to make a single composite image) are sometimes referred to as Wyckoff Sets, in his honor. Early work in this area (joint estimation of image projection and exposure value) was undertaken by Mann and Candoccia. Charles Wyckoff devoted much of his life to creating special kinds of 3-layer photographic films that captured different exposures of the same subject matter. A picture of a nuclear explosion, taken on Wyckoff's film, appeared on the cover of Life Magazine and showed the dynamic range from the dark outer areas to the inner core.

    Read more →
  • Weak supervision

    Weak supervision

    Weak supervision (also known as semi-supervised learning) is a paradigm in machine learning, the relevance and notability of which increased with the advent of large language models due to the large amount of data required to train them. It is characterized by using a combination of a small amount of human-labeled data (exclusively used in more expensive and time-consuming supervised learning paradigm), followed by a large amount of unlabeled data (used exclusively in unsupervised learning paradigm). In other words, the desired output values are provided only for a subset of the training data. The remaining data is unlabeled or imprecisely labeled. Intuitively, it can be seen as an exam and labeled data as sample problems that the teacher solves for the class as an aid in solving another set of problems. In the transductive setting, these unsolved problems act as exam questions. In the inductive setting, they become practice problems of the sort that will make up the exam. == Problem == The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labeling process thus may render large, fully labeled training sets infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning. == Technique == More formally, semi-supervised learning assumes a set of l {\displaystyle l} independently identically distributed examples x 1 , … , x l ∈ X {\displaystyle x_{1},\dots ,x_{l}\in X} with corresponding labels y 1 , … , y l ∈ Y {\displaystyle y_{1},\dots ,y_{l}\in Y} and u {\displaystyle u} unlabeled examples x l + 1 , … , x l + u ∈ X {\displaystyle x_{l+1},\dots ,x_{l+u}\in X} are processed. Semi-supervised learning combines this information to surpass the classification performance that can be obtained either by discarding the unlabeled data and doing supervised learning or by discarding the labels and doing unsupervised learning. Semi-supervised learning may refer to either transductive learning or inductive learning. The goal of transductive learning is to infer the correct labels for the given unlabeled data x l + 1 , … , x l + u {\displaystyle x_{l+1},\dots ,x_{l+u}} only. The goal of inductive learning is to infer the correct mapping from X {\displaystyle X} to Y {\displaystyle Y} . It is unnecessary (and, according to Vapnik's principle, imprudent) to perform transductive learning by way of inferring a classification rule over the entire input space; however, in practice, algorithms formally designed for transduction or induction are often used interchangeably. == Assumptions == In order to make any use of unlabeled data, some relationship to the underlying distribution of data must exist. Semi-supervised learning algorithms make use of at least one of the following assumptions: === Continuity / smoothness assumption === Points that are close to each other are more likely to share a label. This is also generally assumed in supervised learning and yields a preference for geometrically simple decision boundaries. In the case of semi-supervised learning, the smoothness assumption additionally yields a preference for decision boundaries in low-density regions, so few points are close to each other but in different classes. === Cluster assumption === The data tend to form discrete clusters, and points in the same cluster are more likely to share a label (although data that shares a label may spread across multiple clusters). This is a special case of the smoothness assumption and gives rise to feature learning with clustering algorithms. === Manifold assumption === The data lie approximately on a manifold of much lower dimension than the input space. In this case learning the manifold using both the labeled and unlabeled data can avoid the curse of dimensionality. Then learning can proceed using distances and densities defined on the manifold. The manifold assumption is practical when high-dimensional data are generated by some process that may be hard to model directly, but which has only a few degrees of freedom. For instance, human voice is controlled by a few vocal folds, and images of various facial expressions are controlled by a few muscles. In these cases, it is better to consider distances and smoothness in the natural space of the generating problem, rather than in the space of all possible acoustic waves or images, respectively. == History == The heuristic approach of self-training (also known as self-learning or self-labeling) is historically the oldest approach to semi-supervised learning, with examples of applications starting in the 1960s. The transductive learning framework was formally introduced by Vladimir Vapnik in the 1970s. Interest in inductive learning using generative models also began in the 1970s. A probably approximately correct learning bound for semi-supervised learning of a Gaussian mixture was demonstrated by Ratsaby and Venkatesh in 1995. == Methods == === Generative models === Generative approaches to statistical learning first seek to estimate p ( x | y ) {\displaystyle p(x|y)} , the distribution of data points belonging to each class. The probability p ( y | x ) {\displaystyle p(y|x)} that a given point x {\displaystyle x} has label y {\displaystyle y} is then proportional to p ( x | y ) p ( y ) {\displaystyle p(x|y)p(y)} by Bayes' rule. Semi-supervised learning with generative models can be viewed either as an extension of supervised learning (classification plus information about p ( x ) {\displaystyle p(x)} ) or as an extension of unsupervised learning (clustering plus some labels). Generative models assume that the distributions take some particular form p ( x | y , θ ) {\displaystyle p(x|y,\theta )} parameterized by the vector θ {\displaystyle \theta } . If these assumptions are incorrect, the unlabeled data may actually decrease the accuracy of the solution relative to what would have been obtained from labeled data alone. However, if the assumptions are correct, then the unlabeled data necessarily improves performance. The unlabeled data are distributed according to a mixture of individual-class distributions. In order to learn the mixture distribution from the unlabeled data, it must be identifiable, that is, different parameters must yield different summed distributions. Gaussian mixture distributions are identifiable and commonly used for generative models. The parameterized joint distribution can be written as p ( x , y | θ ) = p ( y | θ ) p ( x | y , θ ) {\displaystyle p(x,y|\theta )=p(y|\theta )p(x|y,\theta )} by using the chain rule. Each parameter vector θ {\displaystyle \theta } is associated with a decision function f θ ( x ) = argmax y p ( y | x , θ ) {\displaystyle f_{\theta }(x)={\underset {y}{\operatorname {argmax} }}\ p(y|x,\theta )} . The parameter is then chosen based on fit to both the labeled and unlabeled data, weighted by λ {\displaystyle \lambda } : argmax Θ ( log ⁡ p ( { x i , y i } i = 1 l | θ ) + λ log ⁡ p ( { x i } i = l + 1 l + u | θ ) ) {\displaystyle {\underset {\Theta }{\operatorname {argmax} }}\left(\log p(\{x_{i},y_{i}\}_{i=1}^{l}|\theta )+\lambda \log p(\{x_{i}\}_{i=l+1}^{l+u}|\theta )\right)} === Low-density separation === Another major class of methods attempts to place boundaries in regions with few data points (labeled or unlabeled). One of the most commonly used algorithms is the transductive support vector machine, or TSVM (which, despite its name, may be used for inductive learning as well). Whereas support vector machines for supervised learning seek a decision boundary with maximal margin over the labeled data, the goal of TSVM is a labeling of the unlabeled data such that the decision boundary has maximal margin over all of the data. In addition to the standard hinge loss ( 1 − y f ( x ) ) + {\displaystyle (1-yf(x))_{+}} for labeled data, a loss function ( 1 − | f ( x ) | ) + {\displaystyle (1-|f(x)|)_{+}} is introduced over the unlabeled data by letting y = sign ⁡ f ( x ) {\displaystyle y=\operatorname {sign} {f(x)}} . TSVM then selects f ∗ ( x ) = h ∗ ( x ) + b {\displaystyle f^{}(x)=h^{}(x)+b} from a reproducing kernel Hilbert space H {\displaystyle {\mathcal {H}}} by minimizing the regularized empirical risk: f ∗ = argmin f ( ∑ i = 1 l ( 1 − y i f ( x i ) ) + + λ 1 ‖ h ‖ H 2 + λ 2 ∑ i = l + 1 l + u ( 1 − | f ( x i ) | ) + ) {\displaystyle f^{}={\underset {f}{\operatorname {argmin} }}\left(\displaystyle \sum _{i=1}^{l}(1-y_{i}f(x_{i}))_{+}+\lambda _{1}\|h\|_{\mathcal {H}}^{2}+\lambda _{2}\sum _{i=l+1}^{l+u}(1-|f(x_{i})|)_{+}\right)} An exact solution is intractable due to the non-convex term ( 1 − | f ( x ) | ) + {\displayst

    Read more →
  • Language identification

    Language identification

    In natural language processing, language identification or language guessing is the problem of determining which natural language a given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods. == Overview == === Logical approach === A common non-statistical intuitive approach (though highly uncertain) is to look for common letter combinations, or distinctive diacritics or punctuation. === Statistical approach === There are several statistical approaches to language identification. An older statistical method by Grefenstette was based on the frequency of short n-grams, which are often function morphemes. For example, "ing" is more common in English than in French, while the sequence "que" is more common in French. Given a new page found on the Web, one counts the number of occurrences of each such short sequence and picks the language whose frequency table it matches the most. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages. This approach is known as mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods. Mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques. Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994) is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one with the model that is most similar to the model from the text needing to be identified. This approach can be problematic when the input text is in a language for which there is no model. In that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text that are composed of several languages, as is common on the Web. As of 2025, a commonly used baseline method is via the fastText library, which has comparable classification accuracy as deep learning techniques, but much faster. == Identifying similar languages == One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Bulgarian and Macedonian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them. In 2014 the DSL shared task has been organized providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), Group F (American English, British English). The best system reached performance of over 95% results (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. 2014. == Software == Apache OpenNLP includes char n-gram based statistical detector and comes with a model that can distinguish 103 languages Apache Tika contains a language detector for 18 languages

    Read more →
  • Geometric hashing

    Geometric hashing

    In computer science, geometric hashing is a method for efficiently finding two-dimensional objects represented by discrete points that have undergone an affine transformation, though extensions exist to other object representations and transformations. In an off-line step, the objects are encoded by treating each pair of points as a geometric basis. The remaining points can be represented in an invariant fashion with respect to this basis using two parameters. For each point, its quantized transformed coordinates are stored in the hash table as a key, and indices of the basis points as a value. Then a new pair of basis points is selected, and the process is repeated. In the on-line (recognition) step, randomly selected pairs of data points are considered as candidate bases. For each candidate basis, the remaining data points are encoded according to the basis and possible correspondences from the object are found in the previously constructed table. The candidate basis is accepted if a sufficiently large number of the data points index a consistent object basis. Geometric hashing was originally suggested in computer vision for object recognition in 2D and 3D, but later was applied to different problems such as structural alignment of proteins. == Geometric hashing in computer vision == Geometric hashing is a method used for object recognition. Let’s say that we want to check if a model image can be seen in an input image. This can be accomplished with geometric hashing. The method could be used to recognize one of the multiple objects in a base, in this case the hash table should store not only the pose information but also the index of object model in the base. === Example === For simplicity, this example will not use too many point features and assume that their descriptors are given by their coordinates only (in practice local descriptors such as SIFT could be used for indexing). ==== Training Phase ==== Find the model's feature points. Assume that 5 feature points are found in the model image with the coordinates ( 12 , 17 ) ; {\displaystyle (12,17);} ( 45 , 13 ) ; {\displaystyle (45,13);} ( 40 , 46 ) ; {\displaystyle (40,46);} ( 20 , 35 ) ; {\displaystyle (20,35);} ( 35 , 25 ) {\displaystyle (35,25)} , see the picture. Introduce a basis to describe the locations of the feature points. For 2D space and similarity transformation the basis is defined by a pair of points. The point of origin is placed in the middle of the segment connecting the two points (P2, P4 in our example), the x ′ {\displaystyle x'} axis is directed towards one of them, the y ′ {\displaystyle y'} is orthogonal and goes through the origin. The scale is selected such that absolute value of x ′ {\displaystyle x'} for both basis points is 1. Describe feature locations with respect to that basis, i.e. compute the projections to the new coordinate axes. The coordinates should be discretised to make recognition robust to noise, we take the bin size 0.25. We thus get the coordinates ( − 0.75 , − 1.25 ) ; {\displaystyle (-0.75,-1.25);} ( 1.00 , 0.00 ) ; {\displaystyle (1.00,0.00);} ( − 0.50 , 1.25 ) ; {\displaystyle (-0.50,1.25);} ( − 1.00 , 0.00 ) ; {\displaystyle (-1.00,0.00);} ( 0.00 , 0.25 ) {\displaystyle (0.00,0.25)} Store the basis in a hash table indexed by the features (only transformed coordinates in this case). If there were more objects to match with, we should also store the object number along with the basis pair. Repeat the process for a different basis pair (Step 2). It is needed to handle occlusions. Ideally, all the non-colinear pairs should be enumerated. We provide the hash table after two iterations, the pair (P1, P3) is selected for the second one. Hash Table: Most hash tables cannot have identical keys mapped to different values. So in real life one won’t encode basis keys (1.0, 0.0) and (-1.0, 0.0) in a hash table. ==== Recognition Phase ==== Find interesting feature points in the input image. Choose an arbitrary basis. If there isn't a suitable arbitrary basis, then it is likely that the input image does not contain the target object. Describe coordinates of the feature points in the new basis. Quantize obtained coordinates as it was done before. Compare all the transformed point features in the input image with the hash table. If the point features are identical or similar, then increase the count for the corresponding basis (and the type of object, if any). For each basis such that the count exceeds a certain threshold, verify the hypothesis that it corresponds to an image basis chosen in Step 2. Transfer the image coordinate system to the model one (for the supposed object) and try to match them. If successful, the object is found. Otherwise, go back to Step 2. === Finding mirrored pattern === It seems that this method is only capable of handling scaling, translation, and rotation. However, the input image may contain the object in mirror transform. Therefore, geometric hashing should be able to find the object, too. There are two ways to detect mirrored objects. For the vector graph, make the left side positive, and the right side negative. Multiplying the x position by -1 will give the same result. Use 3 points for the basis. This allows detecting mirror images (or objects). Actually, using 3 points for the basis is another approach for geometric hashing. === Geometric hashing in higher-dimensions === Similar to the example above, hashing applies to higher-dimensional data. For three-dimensional data points, three points are also needed for the basis. The first two points define the x-axis, and the third point defines the y-axis (with the first point). The z-axis is perpendicular to the created axis using the right-hand rule. Notice that the order of the points affects the resulting basis

    Read more →
  • BulSemCor

    BulSemCor

    The Bulgarian Sense-annotated Corpus (BulSemCor) (Bulgarian: Български семантично анотиран корпус (БулСемКор)) is a structured corpus of Bulgarian texts in which each lexical item is assigned a sense tag. BulSemCor was created by the Department of Computational Linguistics at the Institute for Bulgarian Language of the Bulgarian Academy of Sciences. == Structure == BulSemCor was created as part of a nationally funded project titled "BulNet – A lexico-semantic network for the Bulgarian Language" (2005–2010). It follows the general methodology of SemCor combined with some specific principles. The corpus for annotation consists of 101,791 tokens covering an excerpt from the Bulgarian "Brown" Corpus modelled on the Brown Corpus.Francis Kucera An important feature of BulSemCor is that the samples are selected using heuristics that provide optimal coverage of ambiguous lexis. BulSemCor is manually sense-annotated according to the Bulgarian WordNet. Its size is comparable to that of other contemporary semantically annotated corpora or pool of acceptable linguistic components. The semantic annotation consists in associating each lexical item in the corpus with exactly one synonym set (synset) in the Bulgarian WordNet that best describes its sense in the particular context. The selection of the best match among the suggested candidates is based on a set of procedures, such as the other synset members, the synset gloss (explanatory definition) and the position of a given candidate in the WordNet structure. == Scale == The number of annotated tokens is 99,480 (the difference in the number of tokens compared to the initial corpus is due to the fact that some of them are not linguistic items). The simple word count is 86,842 and multiword expressions (MWE) are 5,797 (12,638 tokens). == Specific features == All words in BulSemCor are assigned a sense, while according to established practice only simple content words or content word classes (typically nouns and verbs) are annotated. Since 2000 the development of language resources, has broadened to include annotation of function words and multiword expressions covering particular senses or types of words and expressions. In this respect, BulSemCor's annotation is more exhaustive and hence provides greater opportunities for linguistic observations and non-linear programming (NLP) applications. Annotated items inherit the linguistic information associated with the corresponding synset, which along with morphological and semantic tags may include annotation on one or more of the following additional levels: Partial information about the syntactic structure of MWE types – particularly, information about syntactic heads and their dependents; Information about the category of the named entities – names, locations, organisations, dates, numbers, etc.; Information about the taxonomic category of adverbs, such as time, place, manner, degree, quantity, etc.; Information about the type of the syntactic relationships – coordination or subordination – expressed by conjunctions; Information about the original part-of-speech of substantivised words (non-nouns that act as nouns in a particular context); Stylistic/register, grammatical and other information about synsets or individual synset members;

    Read more →
  • You Only Look Once

    You Only Look Once

    You Only Look Once (YOLO) is a series of real-time object detection systems based on convolutional neural networks. First introduced by Joseph Redmon et al. in 2015, YOLO has undergone several iterations and improvements, becoming one of the most popular object detection frameworks. The name "You Only Look Once" refers to the fact that the algorithm requires only one forward propagation pass through the neural network to make predictions, unlike previous region proposal-based techniques like R-CNN that require thousands for a single image. == Overview == Compared to previous methods like R-CNN and OverFeat, instead of applying the model to an image at multiple locations and scales, YOLO applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. === OverFeat === OverFeat was an early influential model for simultaneous object classification and localization. Its architecture is as follows: Train a neural network for image classification only ("classification-trained network"). This could be one like the AlexNet. The last layer of the trained network is removed, and for every possible object class, initialize a network module at the last layer ("regression network"). The base network has its parameters frozen. The regression network is trained to predict the ( x , y ) {\displaystyle (x,y)} coordinates of two corners of the object's bounding box. During inference time, the classification-trained network is run over the same image over many different zoom levels and croppings. For each, it outputs a class label and a probability for that class label. Each output is then processed by the regression network of the corresponding class. This results in thousands of bounding boxes with class labels and probability. These boxes are merged until only one single box with a single class label remains. == Versions == There are two parts to the YOLO series. The original part contained YOLOv1, v2, and v3, all released on a website maintained by Joseph Redmon. === YOLOv1 === The original YOLO algorithm, introduced in 2015, divides the image into an S × S {\displaystyle S\times S} grid of cells. If the center of an object's bounding box falls into a grid cell, that cell is said to "contain" that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the box is that it predicts. In more detail, the network performs the same convolutional operation over each of the S 2 {\displaystyle S^{2}} patches. The output of the network on each patch is a tuple as follows: ( p 1 , … , p C , c 1 , x 1 , y 1 , w 1 , h 1 , … , c B , x B , y B , w B , h B ) {\displaystyle (p_{1},\dots ,p_{C},c_{1},x_{1},y_{1},w_{1},h_{1},\dots ,c_{B},x_{B},y_{B},w_{B},h_{B})} where p i {\displaystyle p_{i}} is the conditional probability that the cell contains an object of class i {\displaystyle i} , conditional on the cell containing at least one object. x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are the center coordinates, width, and height of the j {\displaystyle j} -th predicted bounding box that is centered in the cell. Multiple bounding boxes are predicted to allow each prediction to specialize in one kind of bounding box. For example, slender objects might be predicted by j = 2 {\displaystyle j=2} while stout objects might be predicted by j = 1 {\displaystyle j=1} . c j {\displaystyle c_{j}} is the predicted intersection over union (IoU) of each bounding box with its corresponding ground truth. The network architecture has 24 convolutional layers followed by 2 fully connected layers. During training, for each cell, if it contains a ground truth bounding box, then only the predicted bounding boxes with the highest IoU with the ground truth bounding boxes is used for gradient descent. Concretely, let j {\displaystyle j} be that predicted bounding box, and let i {\displaystyle i} be the ground truth class label, then x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are trained by gradient descent to approach the ground truth, p i {\displaystyle p_{i}} is trained towards 1 {\displaystyle 1} , other p i ′ {\displaystyle p_{i'}} are trained towards zero. If a cell contains no ground truth, then only c 1 , c 2 , … , c B {\displaystyle c_{1},c_{2},\dots ,c_{B}} are trained by gradient descent to approach zero. === YOLOv2 === Released in 2016, YOLOv2 (also known as YOLO9000) improved upon the original model by incorporating batch normalization, a higher resolution classifier, and using anchor boxes to predict bounding boxes. It could detect over 9000 object categories. It was also released on GitHub under the Apache 2.0 license. === YOLOv3 === YOLOv3, introduced in 2018, contained only "incremental" improvements, including the use of a more complex backbone network, multiple scales for detection, and a more sophisticated loss function. === YOLOv4 and beyond === Subsequent versions of YOLO (v4, v5, etc.) have been developed by different researchers, further improving performance and introducing new features. These versions are not officially associated with the original YOLO authors but build upon their work. As of 2026, versions up to YOLO26 have been released..

    Read more →
  • Jordan Antiquities Database and Information System

    Jordan Antiquities Database and Information System

    The Jordan Antiquities Database and Information System (JADIS) was a computer database of antiquities in Jordan, the first of its kind in the Arab world. It was established by the Department of Antiquities in 1990, in cooperation with the American Center for Oriental Research in Amman and sponsored by the United States Agency for International Development. JADIS was in use until 2002, when it was superseded by a new system, MEGA-J. Over 10,841 antiquities were registered in the database. An introduction and printed summary of the database was published by the Department of Antiquities in 1994, edited by Gaetano Palumbo.

    Read more →
  • Euratlas

    Euratlas

    Euratlas is a Switzerland-based software company dedicated to elaborate digital history maps of Europe. Founded in 2001, Euratlas has created a collection of history maps of Europe from year 1 AD to year 2000 AD that present the evolution of every country from the Roman Empire to present times. The evolution includes sovereign states and their administrative subdivisions, but also unorganized peoples and dependent territories. The maps show European country borders at regular intervals of 100 years, but not year by year. This leaves out many important turning points in history. Euratlas is considered a digital humanities company, and a scholar research software used in the field of historic cartography. It is broadly known among American and European universities, who mainly use Euratlas as a research tool and as a digital library atlas. == Sequential mapping policy == This concept was first designed by the German scholar Christian Kruse (1753–1827). Kruse, well aware that historical accounts are often biased for geographical, philosophical or political reasons, created a set of sequential maps in order to give a global vision of the successive political situations. Nowadays, the majority of atlases don't use this approach, but are event-based, like the well-known Penguin Atlas of History. The sequential approach intends to make the sequence of maps more neutral and suitable for students, historians and professionals of several fields. Although, this approach has been discussed as it leaves out many important history events that are not reflected on any of the maps because of the century interval. == Geo-referenced historical data == Initially, the European maps by century were developed as vector maps. From 2006 on, they have been converted to a geographic information system (GIS) database, enabling geo-referenced data capabilities. The map information is distributed in several layers: physical (geography information layer); political information layer (supranational entities, sovereign states, administrative divisions, dependent states and autonomous peoples); and special layers for cities and uncertain borders. The software database also contains much non-geographical information about political relationships between the various kinds of territories. == Map projection == Euratlas History Maps uses a Mercator projection, with the center in Europe. The maps include the North-African coast and the Near-East, offering a complete view of the Mediterranean Basin. The European Russia plains are shown, but not Scandinavia, specially Finland, which is cropped off the map view.

    Read more →
  • Abdul Majid Bhurgri Institute of Language Engineering

    Abdul Majid Bhurgri Institute of Language Engineering

    Abdul Majid Bhurgri Institute of Language Engineering (Sindhi: عبدالماجد ڀرڳڙي انسٽيٽيوٽ آف لئنگئيج انجنيئرنگ) is an autonomous body under the administrative control of the Culture, Tourism and Antiquities Department, Government of Sindh established for bringing Sindhi language at par with national and international languages in all computational process and Natural language processing. == Establishment == In recognition to services of Abdul-Majid Bhurgri, who is the founder of Sindhi computing, Government of Sindh has established the institute after his name. The institute was primarily initiated on the concept given by a language engineer and linguist Amar Fayaz Buriro in briefing to the Minister, Culture, Tourism and Antiquities, Government of Sindh, Syed Sardar Ali Shah on 21 February 2017 on celebration of International Mother Language Day in Sindhi Language Authority, Hyderabad, Sindh. After the presentation and concept given by Amar Fayaz Buriro, the minister Syed Sardar Ali Shah had announced the Institute. Then, Government of Sindh added the development scheme in the Budget of fiscal year 2017-2018. == Projects == The Institute has developed several projects aimed at advancing the Sindhi language and promoting linguistic research. Notable initiatives include the AMBILE Hamiz Ali Sindhi Optical character recognition, which allows for the accurate digitization of Sindhi text, and the ongoing Sindhi WordNet System, a project to build a comprehensive lexical database for Natural language processing. The institute has also created the Font, which integrates symbols from the Indus script, Khudabadi script, and modern Perso-Arabic Script Code for Information Interchange into a single resource for researchers]. Additionally, institute has developed online converter tools that automatically transliterate between the Arabic-Perso script and Devanagari script, improving linguistic accessibility. Another key project is Bhittaipedia, a digital platform dedicated to the preservation and dissemination of the poetry of Shah Abdul Latif Bhittai, one of Sindh's most renowned poet. == Location == The institute is established behind Sindh Museum and Sindhi Language Authority, N-5 National Highway, Qasimabad, Hyderabad, Sindh.

    Read more →