AI Grammar Sentence Checker

AI Grammar Sentence Checker — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Memory color effect

    Memory color effect

    The memory color effect is the phenomenon that the canonical hue of a type of object acquired through experience (e.g. the sky, a leaf, or a strawberry) can directly modulate the appearance of the actual colors of objects. Human observers acquire memory colors through their experiences with instances of that type. For example, most human observers know that an apple typically has a reddish hue; this knowledge about the canonical color which is represented in memory constitutes a memory color. As an example of the effect, normal human trichromats, when presented with a gray banana, often perceive the gray banana as being yellow - the banana's memory color. In light of this, subjects typically adjust the color of the banana towards the color blue - the opponent color of yellow - when asked to adjust its surface to gray to cancel the subtle activation of banana's memory color. Subsequent empirical studies have also shown the memory color effect on man-made objects (e.g. smurfs, German mailboxes), the effect being especially pronounced for blue and yellow objects. To explain this, researchers have argued that because natural daylight shifts from short wavelengths of light (i.e., bluish hues) towards light of longer wavelengths (i.e., yellowish-orange hues) during the day, the memory colors for blue and yellow objects are recruited by the visual system to a higher degree to compensate for this fluctuation in illumination, thereby providing a stronger memory color effect. == Form identification == Memory color plays a role when detecting an object. In a study where participants were given objects, such as an apple, with two alternate forms for each, a crooked apple and a circular apple, researchers changed the colors of the alternate forms and asked if they could identify them. Most of the participants answered "unsure," suggesting that we use memory color when identifying an object. The research redefined memory color as a phenomenon when "a form's identity affects the phenomenal hue of that form." == Color effect on memorization == Memory color effect can be derived from the human instinct to memorize objects better. Comparing the effect of recognizing gray-scaled images and colored images, results showed that people were able to recall colored images 5% higher compared to gray-scaled images. An important factor was that higher level of contrast between the object and background color influences memory. In a specific study related to this, participants reported that colors were 5% to 10% easier to recognize compared to black and white. == Color constancy and memory color effect == Color constancy is the phenomenon where a surface to appear to be of the same color under a wide rage of illumination. A study tested two hypotheses with regards to color memory; the photoreceptor hypothesis and the surface reflectance hypothesis. The test color was surround either by various color patches forming a complex pattern or a uniform “grey” field at the same chromaticity as that of the illuminant. The test color was presented on a dark background for the control group. It was observed that complex surround results where in line with the surface-reflectance hypothesis and not the photoreceptor hypothesis, showing that the accuracy and precision of color memory are fundamentals to understanding the phenomenon of color constancy. == Significance to the evolution of trichromacy == While objects that possess canonical hues make up a small percentage of the objects which populate humans’ visual experience, the human visual system evolved in an environment populated with objects that possess canonical hues. This suggests that the memory color effect is related to the emergence of trichromacy because it has been argued that trichromacy evolved to optimize the ability to detect ripe fruits—objects that appear in canonical hues. == In perception research == In perception research, the memory color effect is cited as evidence for the opponent color theory, which states that four basic colors can be paired with its opponent color: red—green, blue—yellow. This explains why participants adjust the ripe banana color to a blueish tone to make its memory color yellow as gray. Researchers have also found empirical evidence that suggests memory color is recruited by the visual system to achieve color constancy. For example, participants had a lower percentage of color constancy when looking at a color incongruent scene, such as a purple banana, compared to a color diagnostical scene, a yellow banana. This suggests that color constancy is influenced by the color of objects that we are familiar with, which the memory color effect takes part.

    Read more →
  • Artificial intelligence content detection

    Artificial intelligence content detection

    Artificial intelligence detection software aims to determine whether some content (text, image, video, or audio) was generated using artificial intelligence (AI). This software is often unreliable. == Accuracy issues == Many AI detection tools have been shown to be unreliable in detecting AI-generated text. In a 2023 study conducted by Weber-Wulff et al., researchers evaluated 14 detection tools including Turnitin and GPTZero and found that "all scored below 80% of accuracy and only 5 over 70%." They also found that these tools tend to have a bias for classifying texts more as human than as AI, and that accuracy of these tools worsens upon paraphrasing. === False positives === In AI content detection, a false positive is when human-written work is incorrectly flagged as AI-written. Many AI detection platforms claim to have a minimal level of false positives, with Turnitin claiming a less than 1% false positive rate. However, later research by The Washington Post produced much higher rates of 50%, though they used a smaller sample size. False positives in an academic setting frequently lead to accusations of academic misconduct, which can have serious consequences for a student's academic record. Additionally, studies have shown evidence that many AI detection models are prone to give false positives to work written by people whose first language is not English, and also to neurodivergent people. In June 2023, Janelle Shane wrote that portions of her book You Look Like a Thing and I Love You were flagged as AI-generated. === False negatives === A false negative is a failure to identify documents with AI-written text. False negatives often happen as a result of a detection software's sensitivity level or because evasive techniques were used when generating the work to make it sound more human. False negatives are less of a concern academically, since they aren't likely to lead to accusations and ramifications. Notably, Turnitin stated they have a 15% false negative rate. == Text detection == For text, this is usually done to prevent alleged plagiarism, often by detecting repetition of words as telltale signs that a text was AI-generated (including hallucinations). Detection systems may also rely on stylistic and structural regularities associated with LLM output, such as unusually consistent grammar, formulaic transitions, repeated discourse markers, and recurring rhetorical templates. Some tools are designed less to establish authorship provenance than to flag prose that resembles common LLM-generated style patterns. They are often used by teachers marking their students, usually on an ad hoc basis. Following the release of ChatGPT and similar AI text generative software, many educational establishments have issued policies against the use of AI by students. AI text detection software is also used by those assessing job applicants, as well as online search engines, hiring, online moderation and publishing. Current detectors may sometimes be unreliable and have incorrectly marked work by humans as originating from AI while failing to detect AI-generated work in other instances. MIT Technology Review said that the technology "struggled to pick up ChatGPT-generated text that had been slightly rearranged by humans and obfuscated by a paraphrasing tool". AI text detection software has also been shown to discriminate against non-native speakers of English. Two students from the University of California, Davis, were referred to the university's Office of Student Success and Judicial Affairs (OSSJA) after their professors scanned their essays with positive results; the first with an AI detector called GPTZero, and the second with an AI detector integration in Turnitin. However, following media coverage, and a thorough investigation, the students were cleared of any wrongdoing. In April 2023, Cambridge University and other members of the Russell Group of universities in the United Kingdom opted out of Turnitin's AI text detection tool, after expressing concerns it was unreliable. The University of Texas at Austin opted out of the system six months later. In May 2023, a professor at Texas A&M University–Commerce used ChatGPT to detect whether his students' content was written by it, which ChatGPT said was the case. As such, he threatened to fail the class despite ChatGPT not being able to detect AI-generated writing. No students were prevented from graduating because of the issue, and all but one student (who admitted to using the software) were exonerated from accusations of having used ChatGPT in their content. In July 2023, a paper titled "GPT detectors are biased against non-native English writers" was released, reporting that GPTs discriminate against non-native English authors. The paper compared seven GPT detectors against essays from both non-native English speakers and essays from United States students. The essays from non-native English speakers had an average false positive rate of 61.3%. An article by Thomas Germain, published on Gizmodo in June 2024, reported job losses among freelance writers and journalists due to AI text detection software mistakenly classifying their work as AI-generated. In September 2024, Common Sense Media reported that generative AI detectors had a 20% false positive rate for Black students, compared to 10% of Latino students and 7% of White students. To improve the reliability of AI text detection, researchers have explored digital watermarking techniques. A 2023 paper titled "A Watermark for Large Language Models" presents a method to embed imperceptible watermarks into text generated by large language models (LLMs). This watermarking approach allows content to be flagged as AI-generated with a high level of accuracy, even when text is slightly paraphrased or modified. The technique is designed to be subtle and hard to detect for casual readers, thereby preserving readability, while providing a detectable signal for those employing specialized tools. However, while promising, watermarking faces challenges in remaining robust under adversarial transformations and ensuring compatibility across different LLMs. == Anti text detection == There is software available designed to bypass AI text detection. In practice, evasion may not require specialized bypass tools. Paraphrasing, style editing, and removal of repeated discourse markers can substantially reduce the effectiveness of detectors that rely on recognizable surface patterns. A study published in August 2023 analyzed 20 abstracts from papers published in the Eye Journal, which were then paraphrased using GPT-4.0. The AI-paraphrased abstracts were examined for plagiarism using QueText and for AI-generated content using Originality.AI. The texts were then re-processed through an adversarial software called Undetectable.ai in order to reduce the AI-detection scores. The study found that the AI detection tool, Originality.AI, identified text generated by GPT-4 with a mean accuracy of 91.3%. However, after reprocessing by Undetectable.ai, the detection accuracy of Originality.ai dropped to a mean accuracy of 27.8%. Some experts also believe that techniques like digital watermarking are ineffective because they can be removed or added to trigger false positives. "A Watermark for Large Language Models" paper by Kirchenbauer et al. (2023) also addresses potential vulnerabilities of watermarking techniques. The authors outline a range of adversarial tactics, including text insertion, deletion, and substitution attacks, that could be used to bypass watermark detection. These attacks vary in complexity, from simple paraphrasing to more sophisticated approaches involving tokenization and homoglyph alterations. The study highlights the challenge of maintaining watermark robustness against attackers who may employ automated paraphrasing tools or even specific language model replacements to alter text spans iteratively while retaining semantic similarity. Experimental results show that although such attacks can degrade watermark strength, they also come at the cost of text quality and increased computational resources. == Image, video, and audio detection == Several purported AI image detection software exist, to detect AI-generated images (for example, those originating from Midjourney or DALL-E). They are not completely reliable. Industry analyses have also noted that AI-driven image recognition systems often struggle in real-world environments, where inconsistent lighting, noise and variable visual inputs reduce detection reliability, a challenge highlighted in modern agricultural quality-control research. Others claim to identify video and audio deepfakes, but this technology is also not fully reliable yet either. Despite debate around the efficacy of watermarking, Google DeepMind is actively developing a detection software called SynthID, which works by inserting a digital watermark that is invisible to the human eye into the pixels of an image.

    Read more →
  • ReRites

    ReRites

    ReRites (also known as RERITES, ReadingRites, Big Data Poetry) is a literary work of "Human + A.I. poetry" by David Jhave Johnston that used neural network models trained to generate poetry which the author then edited. ReRites won the Robert Coover Award for a Work of Electronic Literature in 2022. == About the project == The ReRites project began as a daily rite of writing with a neural network, expanded into a series of performances from which video documentation has been published online, and concluded with a set of 12 books and an accompanying book of essays published by Anteism Books in 2019. In Electronic Literature, Scott Rettberg describes the early phases of the project in 2016, when it bore the preliminary name Big Data Poetry. Jhave (the artist name that David Jhave Johnston goes by) describes the process of writing ReRites as a rite: "Every morning for 2 hours (normally 6:30–8:30am) I get up and edit the poetic output of a neural net. Deleting, weaving, conjugating, lineating, cohering. Re-writing. Re-wiring authorship: hybrid augmented enhanced evolutionary". There is video documentation of the writing process. The human editing of the neural network's output is fundamental to this project, and Jhave gives examples of both unedited text extracts and his edited versions in publications about the project. Kyle Booten describes ReRites as "simultaneously dusty and outrageously verdant, monotonously sublime and speckled with beautiful and rare specimens". === Performances === ReRites was first shared with an audience through a series of performances where audience members and poets would participate in reading the automatically generated texts, which appeared on screen so fast that human readers could barely keep up. This has been described as allowing participants to "re-discover[..] the peculiar pleasures of being embodied", or, in Jhave's own words, as a space where human participants were "playing their wits and voices against an evocative infinite deep-learning muse". The first performance was at Brown University's Interrupt Festival in 2019. It has been performed many times since, including at the Barbican Centre in London and Anteism Books. === Print publications === For a single year Jhave published one book of poetry from the ReRites project each month. These twelve volumes are accompanied by a book of essays, all published by Anteism Books. The accompanying essays provide critical responses to the project from poets and scholars including Allison Parrish, Johanna Drucker, Kyle Booten, Stephanie Strickland, John Cayley, Lai-Tze Fan, Nick Montfort, Mairéad Byrne, and Chris Funkhouser. Allison Parrish notes elsewhere that these paratexts to ReRites serve a legitimising function for a genre of poetry that is not yet institutionally acknowledged. === Technical details === Starting in 2016 under the name Big Data Poetry, Jhave generated poems using, in his own words, "neural network code (..) adapted from three corporate github-hosted machine-learning libraries: TensorFlow (Google), PyTorch (Facebook), and AWD-LSTM (SalesForce)". He explains that the "models were trained on a customised corpus of 600,000 lines of poetry ranging from the romantic epoch to the 20th century avant garde". Jhave maintains a GitHub repository with some of the code supporting ReRites. == Reception == ReRites is described by John Cayley as "one of the most thorough and beautiful" poetic responses to machine learning. The work's influence on the field of electronic literature was acknowledged in 2022, when the work won the Electronic Literature Organization's Robert Coover Award for a Work of Electronic Literature. The jury described ReRites as particularly poignant in the time of the pandemic, as it was "a documentation of the performance of the private ritual of writing and the obsessive-compulsive need for writers to communicate — even when no one else is reading". The question of authorship and voice in ReRites has been raised by several critics. Although generated poetry is an established genre in electronic literature, Cayley notes that unlike the combinatory poems created by authors like Nick Montfort, where the author explicitly defines which words and phrases will be recombined, ReRites has "not been directed by literary preconceptions inscribed in the program itself, but only by patterns and rhythms pre-existing in the corpora". In an essay for the Australian journal TEXT, David Thomas Henry Wright asks how to understand authorship and authority in ReRites: "Who or what is the authority of the work? The original data fed into the machine, that is not currently retrievable or discernible from the final works? The code that was taken and adapted for his purposes? Or Jhave, the human editor?" Wright concludes that Jhave is the only actor with any intentionality and therefore the authority of the work. The centrality of the human editor is also emphasised by other scholars. In a chapter analysing ReRites Malthe Stavning Erslev argues that the machine learning misrepresents the dataset it is trained on. While ReRites uses 21st century neural networks, it has been compared to earlier literary traditions. Poet Victoria Stanton, who read at one of the ReRites performances, has compared ReRites to found poetry, while David Thomas Henry Wright compares it to the Oulipo movement and Mark Amerika to the cut-up technique. Scholars also position ReRites firmly within the long tradition of generative poetry both in electronic literature and print, stretching from the I Ching, Queneau's Cent Mille Milliards de Poemes and Nabokov's Pale Fire to computer-generated poems like Christopher Strachey's Love Letter Generator (1952) and more contemporary examples. Jhave describes the process of working with the output from the neural network as "carving". In his book My Life as an Artificial Creative Intelligence, Mark Amerika writes that the "method of carving the digital outputs provided by the language model as part of a collaborative remix jam session with GPT-2, where the language artist and the language model play off each other’s unexpected outputs as if caught in a live postproduction set, is one I share with electronic literature composer David Jhave Johnston, whose AI poetry experiments precede my own investigations."

    Read more →
  • Weight initialization

    Weight initialization

    In deep learning, weight initialization or parameter initialization describes the initial step in creating a neural network. A neural network contains trainable parameters that are modified during training: weight initialization is the pre-training step of assigning initial values to these parameters. The choice of weight initialization method affects the speed of convergence, the scale of neural activation within the network, the scale of gradient signals during backpropagation, and the quality of the final model. Proper initialization is necessary for avoiding issues such as vanishing and exploding gradients and activation function saturation. Note that even though this article is titled "weight initialization", both weights and biases are used in a neural network as trainable parameters, so this article describes how both of these are initialized. Similarly, trainable parameters in convolutional neural networks (CNNs) are called kernels and biases, and this article also describes these. == Constant initialization == We discuss the main methods of initialization in the context of a multilayer perceptron (MLP). Specific strategies for initializing other network architectures are discussed in later sections. For an MLP, there are only two kinds of trainable parameters, called weights and biases. Each layer l {\displaystyle l} contains a weight matrix W ( l ) ∈ R n l − 1 × n l {\displaystyle W^{(l)}\in \mathbb {R} ^{n_{l-1}\times n_{l}}} and a bias vector b ( l ) ∈ R n l {\displaystyle b^{(l)}\in \mathbb {R} ^{n_{l}}} , where n l {\displaystyle n_{l}} is the number of neurons in that layer. A weight initialization method is an algorithm for setting the initial values for W ( l ) , b ( l ) {\displaystyle W^{(l)},b^{(l)}} for each layer l {\displaystyle l} . The simplest form is zero initialization: W ( l ) = 0 , b ( l ) = 0 {\displaystyle W^{(l)}=0,b^{(l)}=0} Zero initialization is usually used for initializing biases, but it is not used for initializing weights, as it leads to symmetry in the network, causing all neurons to learn the same features. In this page, we assume b = 0 {\displaystyle b=0} unless otherwise stated. Recurrent neural networks typically use activation functions with bounded range, such as sigmoid and tanh, since unbounded activation may cause exploding values. (Le, Jaitly, Hinton, 2015) suggested initializing weights in the recurrent parts of the network to identity and zero bias, similar to the idea of residual connections and LSTM with no forget gate. In most cases, the biases are initialized to zero, though some situations can use a nonzero initialization. For example, in multiplicative units, such as the forget gate of LSTM, the bias can be initialized to 1 to allow good gradient signal through the gate. For neurons with ReLU activation, one can initialize the bias to a small positive value like 0.1, so that the gradient is likely nonzero at initialization, avoiding the dying ReLU problem. == Random initialization == Random initialization means sampling the weights from a normal distribution or a uniform distribution, usually independently. === LeCun initialization === LeCun initialization, popularized in (LeCun et al., 1998), is designed to preserve the variance of neural activations during the forward pass. It samples each entry in W ( l ) {\displaystyle W^{(l)}} independently from a distribution with mean 0 and variance 1 / n l − 1 {\displaystyle 1/n_{l-1}} . For example, if the distribution is a continuous uniform distribution, then the distribution is U ( ± 3 / n l − 1 ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {3/n_{l-1}}})} . === Glorot initialization === Glorot initialization (or Xavier initialization) was proposed by Xavier Glorot and Yoshua Bengio. It was designed as a compromise between two goals: to preserve activation variance during the forward pass and to preserve gradient variance during the backward pass. For uniform initialization, it samples each entry in W ( l ) {\displaystyle W^{(l)}} independently and identically from U ( ± 6 / ( n l + 1 + n l − 1 ) ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {6/(n_{l+1}+n_{l-1})}})} . In the context, n l − 1 {\displaystyle n_{l-1}} is also called the "fan-in", and n l + 1 {\displaystyle n_{l+1}} the "fan-out". When the fan-in and fan-out are equal, then Glorot initialization is the same as LeCun initialization. === He initialization === As Glorot initialization performs poorly for ReLU activation, He initialization (or Kaiming initialization) was proposed by Kaiming He et al. for networks with ReLU activation. It samples each entry in W ( l ) {\displaystyle W^{(l)}} from N ( 0 , 2 / n l − 1 ) {\displaystyle {\mathcal {N}}(0,2/n_{l-1})} . === Orthogonal initialization === (Saxe et al. 2013) proposed orthogonal initialization: initializing weight matrices as uniformly random (according to the Haar measure) semi-orthogonal matrices, multiplied by a factor that depends on the activation function of the layer. It was designed so that if one initializes a deep linear network this way, then its training time until convergence is independent of depth. Sampling a uniformly random semi-orthogonal matrix can be done by initializing X {\displaystyle X} by IID sampling its entries from a standard normal distribution, then calculate ( X X ⊤ ) − 1 / 2 X {\displaystyle \left(XX^{\top }\right)^{-1/2}X} or its transpose, depending on whether X {\displaystyle X} is tall or wide. For CNN kernels with odd widths and heights, orthogonal initialization is done this way: initialize the central point by a semi-orthogonal matrix, and fill the other entries with zero. As an illustration, a kernel K {\displaystyle K} of shape 3 × 3 × c × c ′ {\displaystyle 3\times 3\times c\times c'} is initialized by filling K [ 2 , 2 , : , : ] {\displaystyle K[2,2,:,:]} with the entries of a random semi-orthogonal matrix of shape c × c ′ {\displaystyle c\times c'} , and the other entries with zero. (Balduzzi et al., 2017) used it with stride 1 and zero-padding. This is sometimes called the Orthogonal Delta initialization. Related to this approach, unitary initialization proposes to parameterize the weight matrices to be unitary matrices, with the result that at initialization they are random unitary matrices (and throughout training, they remain unitary). This is found to improve long-sequence modelling in LSTM. Orthogonal initialization has been generalized to layer-sequential unit-variance (LSUV) initialization. It is a data-dependent initialization method, and can be used in convolutional neural networks. It first initializes weights of each convolution or fully connected layer with orthonormal matrices. Then, proceeding from the first to the last layer, it runs a forward pass on a random minibatch, and divides the layer's weights by the standard deviation of its output, so that its output has variance approximately 1. === Fixup initialization === In 2015, the introduction of residual connections allowed very deep neural networks to be trained, much deeper than the ~20 layers of the previous state of the art (such as the VGG-19). Residual connections gave rise to their own weight initialization problems and strategies. These are sometimes called "normalization-free" methods, since using residual connection could stabilize the training of a deep neural network so much that normalizations become unnecessary. Fixup initialization is designed specifically for networks with residual connections and without batch normalization, as follows: Initialize the classification layer and the last layer of each residual branch to 0. Initialize every other layer using a standard method (such as He initialization), and scale only the weight layers inside residual branches by L − 1 2 m − 2 {\displaystyle L^{-{\frac {1}{2m-2}}}} . Add a scalar multiplier (initialized at 1) in every branch and a scalar bias (initialized at 0) before each convolution, linear, and element-wise activation layer. Similarly, T-Fixup initialization is designed for Transformers without layer normalization. === Others === Instead of initializing all weights with random values on the order of O ( 1 / n ) {\displaystyle O(1/{\sqrt {n}})} , sparse initialization initialized only a small subset of the weights with larger random values, and the other weights zero, so that the total variance is still on the order of O ( 1 ) {\displaystyle O(1)} . Random walk initialization was designed for MLP so that during backpropagation, the L2 norm of gradient at each layer performs an unbiased random walk as one moves from the last layer to the first. Looks linear initialization was designed to allow the neural network to behave like a deep linear network at initialization, since W R e L U ( x ) − W R e L U ( − x ) = W x {\displaystyle W\;\mathrm {ReLU} (x)-W\;\mathrm {ReLU} (-x)=Wx} . It initializes a matrix W {\displaystyle W} of shape R n 2 × m {\displaystyle \mathbb {R} ^{{\frac {n}{2}}\times m}} by any method, such as orthogonal initialization, t

    Read more →
  • CapCut

    CapCut

    CapCut, known domestically as JianYing (Chinese: 剪映; pinyin: Jiǎnyìng) and formerly internationally as ViaMaker, is a video editor developed by ByteDance, available as a mobile app, desktop app, and web app. == History == The app was first released in China in 2019 and was initially available for iPhone and Android. In 2020, it was rebranded in English from ViaMaker to CapCut and became available globally. It later expanded to include web and desktop versions for Mac and Windows. In 2022, CapCut reached 200 million active users. According to The Wall Street Journal, in March 2023, it was the second-most downloaded app in the U.S., behind that of Chinese discount retailer Temu. In January 2025, CapCut had over 1 billion downloads on the Google Play Store. On February 1, 2021, CapCut Pro for Windows was launched. On November 27, the Pro version for Mac was launched. In July 2025, CapCut Pro for HarmonyOS was available on HarmonyOS NEXT tablets. In July 2024, CapCut was reported by the South China Morning Post to be a generative AI (GenAI) application that led global AI app downloads, with approximately 38.42 million downloads and 323 million monthly active users. == Features == CapCut supports basic video editing functions, including editing, trimming, and adding or splitting clips. Editing projects is limited to single-layer editing, but the app supports overlay options that enable additional effects, including multi-layer editing. The app includes a library of pre-made templates and a tool that generates editable video captions. It also provides photo editing tools, including retouch and product photo features integrated within the editing interface. CapCut's video editor includes AI-based features such as video and script generation. Users can export or save completed projects directly to different social media platforms. CapCut includes a free version and a paid Pro version with cloud storage and advanced features. == Controversies == === Illegal data collection === In July 2023, many users of CapCut accused it of illegally profiting off their personal data. A class-action lawsuit filed in the U.S. District Court for the Northern District of Illinois on July 28, 2023, alleged that CapCut illegally harvests and profits from user data including biometric information and geolocation without consent. In September 2025, a federal court excluded most of the lawsuit, which alleged that TikTok’s parent company improperly scraped private data from CapCut's video editing software, as lacking grounds, with some of the class action continuing to move forward. == Bans and restrictions == === Ban in India === As a response to border clashes with China in May 2020, the Indian government banned around 56 Chinese applications including CapCut and TikTok, which is owned by CapCut's parent company ByteDance. Indian users were unable to use and download the application. As of February 2022, around 273 Chinese applications have been banned by the Indian government under the concern of national security and Indian user privacy. === Ban in the United States === On January 18, 2025, at 10 PM EST, CapCut was banned in the United States along with TikTok and all other ByteDance apps due to the implementation of the Protecting Americans from Foreign Adversary Controlled Applications Act. Hours after the suspension of services took effect, President Donald Trump indicated on Truth Social that he would issue an executive order on the day of his inauguration "to extend the period of time before the law's prohibitions take effect". On January 21, CapCut began restoring service. On February 13, Google and Apple restored CapCut on the App Store and Google Play Store.

    Read more →
  • Quantexa

    Quantexa

    Quantexa is a UK-based software company that develops artificial intelligence-based applications for data analytics and decision-making. The company was founded in 2016 and is headquartered in London, with operations in North America, Europe, and the Asia-Pacific region. As of 2025, Quantexa reported a valuation of $2.6 billion and provides services to organizations in over 70 countries. Investors include Warburg Pincus, HSBC, and the Ontario Teachers’ Pension Plan. == History == Quantexa was founded in London in 2016 by several co-founders, including Jamie Hutton, Richard Seewald, Imam Hoque, Felix Hoddinott, and Vishal Marria, who also serves as the company's chief executive officer. The company was established to develop tools intended to address limitations in traditional data analysis methods, particularly those related to identifying hidden connections across large datasets. The name "Quantexa" is derived from the company's focus on quantitative methods and data analysis. In 2023, Quantexa acquired Dublin-based AI firm Aylien. In April 2023, the company completed a Series E funding round, raising $129 million at a valuation of approximately $1.8 billion, marking its entry into "unicorn" status. In October 2024, the company reported annual recurring revenue (ARR) exceeding $100 million. In early 2025, Quantexa participated in the World Economic Forum's Unicorn Program, which supports high-growth technology companies. In March 2025, Quantexa completed a Series F funding round of $175 million, led by Teachers' Venture Growth, the venture arm of the Ontario Teachers' Pension Plan. That August, the company was reported to be considering a 2026 IPO. The company formed a partnership with Zurich in October 2025, the first insurer to add its AI-based Decision Intelligence platform to enhance fraud detection.

    Read more →
  • N-jet

    N-jet

    An N-jet is the set of (partial) derivatives of a function f ( x ) {\displaystyle f(x)} up to order N. Specifically, in the area of computer vision, the N-jet is usually computed from a scale space representation L {\displaystyle L} of the input image f ( x , y ) {\displaystyle f(x,y)} , and the partial derivatives of L {\displaystyle L} are used as a basis for expressing various types of visual modules. For example, algorithms for tasks such as feature detection, feature classification, stereo matching, tracking and object recognition can be expressed in terms of N-jets computed at one or several scales in scale space.

    Read more →
  • Bottlenose (company)

    Bottlenose (company)

    Bottlenose.com, also known as Bottlenose, is an enterprise trend intelligence company that analyzes big data and business data to detect trends for brands. It helps Fortune 500 enterprises discover, and track emerging trends that affect their brands. The company uses natural language processing, sentiment analysis, statistical algorithms, data mining, and machine learning heuristics to determine trends, and has a search engine that gathers information from social networks. KPMG Capital has invested a "substantial amount" in the company. Bottlenose processed 72 billion messages per day, in real-time, from across social and broadcast media, as of December 2014. == History == The company is based in Los Angeles, CA. Bottlenose is a real-time trend intelligence tool that measures social media campaigns and trends. The company also provides a free version of its Sonar tool that shows real-time trends across social media. In October 2012, the company received $1 million of funding from ff Venture Capital and Prosper Capital. By 2014, the company raised about $7 million in funding. In December 2014, KPMG Capital announced further investment in the company. In February 2015, the company confirmed it had raised $13.4 million in Series B funding led by KPMG Capital. Bottlenose partnered with the nonprofit No Labels during the 2014 State of the Union Address to analyze Twitter conversations for bipartisanship. The company also partnered with media monitoring company Critical Mention to analyze broadcast analytics. The Bottlenose Nerve Center integrated with the Critical Mention API to analyze real-time trends in television and radio broadcasts. In June 2014, Bottlenose updated its trend detection product to Nerve Center 2.0. It creates a newsfeed to show changes in trends and sends alerts when trends occur. It also has "emotion detection," which will display the emotions associated with specific comments on trending topics. In 2016, Bottlenose released its Nerve Center 3.0 platform, which was designed to automate the work of data scientists and lower the cost of artificial intelligence for businesses.

    Read more →
  • Self-management (computer science)

    Self-management (computer science)

    Self-management is the process by which computer systems manage their own operation without human intervention. Self-management technologies are expected to pervade the next generation of network management systems. The growing complexity of modern networked computer systems is a limiting factor in their expansion. The increasing heterogeneity of corporate computer systems, the inclusion of mobile computing devices, and the combination of different networking technologies like WLAN, cellular phone networks, and mobile ad hoc networks make the conventional, manual management difficult, time-consuming, and error-prone. More recently, self-management has been suggested as a solution to increasing complexity in cloud computing. An industrial initiative towards realizing self-management is the Autonomic Computing Initiative (ACI) started by IBM in 2001. The ACI defines the following four functional areas: Self-configuration Auto-configuration of components Self-healing Automatic discovery, and correction of faults; automatically applying all necessary actions to bring system back to normal operation Self-optimization Automatic monitoring and control of resources to ensure the optimal functioning with respect to the defined requirements Self-protection Proactive identification and protection from arbitrary attacks

    Read more →
  • Neighborhood operation

    Neighborhood operation

    In computer vision and image processing a neighborhood operation is a commonly used class of computations on image data which implies that it is processed according to the following pseudo code: Visit each point p in the image data and do { N = a neighborhood or region of the image data around the point p result(p) = f(N) } This general procedure can be applied to image data of arbitrary dimensionality. Also, the image data on which the operation is applied does not have to be defined in terms of intensity or color, it can be any type of information which is organized as a function of spatial (and possibly temporal) variables in p. The result of applying a neighborhood operation on an image is again something which can be interpreted as an image, it has the same dimension as the original data. The value at each image point, however, does not have to be directly related to intensity or color. Instead it is an element in the range of the function f, which can be of arbitrary type. Normally the neighborhood N is of fixed size and is a square (or a cube, depending on the dimensionality of the image data) centered on the point p. Also the function f is fixed, but may in some cases have parameters which can vary with p, see below. In the simplest case, the neighborhood N may be only a single point. This type of operation is often referred to as a point-wise operation. == Examples == The most common examples of a neighborhood operation use a fixed function f which in addition is linear, that is, the computation consists of a linear shift invariant operation. In this case, the neighborhood operation corresponds to the convolution operation. A typical example is convolution with a low-pass filter, where the result can be interpreted in terms of local averages of the image data around each image point. Other examples are computation of local derivatives of the image data. It is also rather common to use a fixed but non-linear function f. This includes median filtering, and computation of local variances. The Nagao-Matsuyama filter is an example of a complex local neighbourhood operation that uses variance as an indicator of the uniformity within a pixel group. The result is similar to a convolution with a low-pass filter with the added effect of preserving sharp edges. There is also a class of neighborhood operations in which the function f has additional parameters which can vary with p: Visit each point p in the image data and do { N = a neighborhood or region of the image data around the point p result(p) = f(N, parameters(p)) } This implies that the result is not shift invariant. Examples are adaptive Wiener filters. == Implementation aspects == The pseudo code given above suggests that a neighborhood operation is implemented in terms of an outer loop over all image points. However, since the results are independent, the image points can be visited in arbitrary order, or can even be processed in parallel. Furthermore, in the case of linear shift-invariant operations, the computation of f at each point implies a summation of products between the image data and the filter coefficients. The implementation of this neighborhood operation can then be made by having the summation loop outside the loop over all image points. An important issue related to neighborhood operation is how to deal with the fact that the neighborhood N becomes more or less undefined for points p close to the edge or border of the image data. Several strategies have been proposed: Compute result only for points p for which the corresponding neighborhood is well-defined. This implies that the output image will be somewhat smaller than the input image. Zero padding: Extend the input image sufficiently by adding extra points outside the original image which are set to zero. The loops over the image points described above visit only the original image points. Border extension: Extend the input image sufficiently by adding extra points outside the original image which are set to the image value at the closest image point. The loops over the image points described above visit only the original image points. Mirror extension: Extend the image sufficiently much by mirroring the image at the image boundaries. This method is less sensitive to local variations at the image boundary than border extension. Wrapping: The image is tiled, so that going off one edge wraps around to the opposite side of the image. This method assumes that the image is largely homogeneous, for example a stochastic image texture without large textons.

    Read more →
  • Negobot

    Negobot

    Negobot also referred to as Lolita or Lolita chatbot is a chatterbot that was introduced to the public in 2013, designed by researchers from the University of Deusto and Optenet to catch online pedophiles. It is a conversational agent that utilizes natural language processing (NLP), information retrieval (IR) and Automatic Learning. Because the bot poses as a young female in order to entice and track potential predators, it became known in media as the "virtual Lolita", in reference to Vladimir Nabokov's novel. == Background == In 2013, the University of Deusto researchers published a paper on their work with Negobot and disclosed the text online. In their abstract, the researchers addressed the issue that an increasing number of children are using the internet and that these young users are more susceptible to existing internet risks. Their main objective was to create a chatterbot with the ability to trap online predators that posed a threat to children. They intended to deploy the bot into sites frequented by predators such as social networks and chatrooms. The university researchers used information provided by anti-pedophilia activist organization Perverted-Justice, including examples of online encounters and conversations with sexual predators, to supplement the program's artificial intelligence system. == Features == === Programmed persona === The chatterbot takes the guise of a naive and vulnerable 14-year-old girl. The bot's programmers used methods of artificial intelligence and natural language processing to create a conversational agent fluent in typical teenage slang, misspellings, and knowledge of pop culture. Through these linguistic features, the bot is able to mimic the conversational style of young teenagers. It also features split personalities and seven different patterns of conversation. Negobot's primary creator, Dr. Carlos Laorden, expressed the significance of the bot's distinguishable style of communication, stating that normally, "chatbots tend to be very predictable. Their behavior and interest in a conversation are flat, which is a problem when attempting to detect untrustworthy targets like paedophiles." What makes Negobot different is its game theory feature, which makes it able to "maintain a much more realistic conversation." Apart from being able to imitate a stereotypical teenager, the program is also able to translate messages into different languages. === Game theory === Negobot's designers programmed it with the ability to treat conversations with potential predators as if it were a game, the objective being to collect as much information on the suspect as possible that could provide evidence of pedophilic characteristics and motives. The use of game theory shapes the decisions the bot makes and the overall direction of the conversation. The bot initiates its undercover operations by entering a chat as a passive participant, waiting to be chatted by a user. Once a user elicits conversation, the bot will frame the conversation in such a way that keeps the target engaged, extracting personal information and discouraging it from leaving the chat. The information is then recorded to be potentially sent to the police. If the target seems to lose interest, the bot attempts to make it feel guilty by expressing sentiments of loneliness and emotional need through strategic, formulated responses, ultimately prolonging interaction. In addition, the bot may provide fake information about itself in attempt to lure the target into physical meetings. === Limitations === Despite being able to carry out a realistic conversation, Negobot is still unable to detect linguistic subtleties in the messages of others, including sarcasm. == Controversy == John Carr, a specialist in online child safety, expressed his concern to BBC over the legality of this undercover investigation. He claimed that using the bot on unsuspecting internet users could be considered a form of entrapment or harassment. The type of information that Negobot collects from potential online predators, he said, is unlikely to be upheld in court. Furthermore, he warned that relying on only software without any real-world policing risks enticing individuals to do or say things that they would not have if real-world policing were a factor.

    Read more →
  • Weight initialization

    Weight initialization

    In deep learning, weight initialization or parameter initialization describes the initial step in creating a neural network. A neural network contains trainable parameters that are modified during training: weight initialization is the pre-training step of assigning initial values to these parameters. The choice of weight initialization method affects the speed of convergence, the scale of neural activation within the network, the scale of gradient signals during backpropagation, and the quality of the final model. Proper initialization is necessary for avoiding issues such as vanishing and exploding gradients and activation function saturation. Note that even though this article is titled "weight initialization", both weights and biases are used in a neural network as trainable parameters, so this article describes how both of these are initialized. Similarly, trainable parameters in convolutional neural networks (CNNs) are called kernels and biases, and this article also describes these. == Constant initialization == We discuss the main methods of initialization in the context of a multilayer perceptron (MLP). Specific strategies for initializing other network architectures are discussed in later sections. For an MLP, there are only two kinds of trainable parameters, called weights and biases. Each layer l {\displaystyle l} contains a weight matrix W ( l ) ∈ R n l − 1 × n l {\displaystyle W^{(l)}\in \mathbb {R} ^{n_{l-1}\times n_{l}}} and a bias vector b ( l ) ∈ R n l {\displaystyle b^{(l)}\in \mathbb {R} ^{n_{l}}} , where n l {\displaystyle n_{l}} is the number of neurons in that layer. A weight initialization method is an algorithm for setting the initial values for W ( l ) , b ( l ) {\displaystyle W^{(l)},b^{(l)}} for each layer l {\displaystyle l} . The simplest form is zero initialization: W ( l ) = 0 , b ( l ) = 0 {\displaystyle W^{(l)}=0,b^{(l)}=0} Zero initialization is usually used for initializing biases, but it is not used for initializing weights, as it leads to symmetry in the network, causing all neurons to learn the same features. In this page, we assume b = 0 {\displaystyle b=0} unless otherwise stated. Recurrent neural networks typically use activation functions with bounded range, such as sigmoid and tanh, since unbounded activation may cause exploding values. (Le, Jaitly, Hinton, 2015) suggested initializing weights in the recurrent parts of the network to identity and zero bias, similar to the idea of residual connections and LSTM with no forget gate. In most cases, the biases are initialized to zero, though some situations can use a nonzero initialization. For example, in multiplicative units, such as the forget gate of LSTM, the bias can be initialized to 1 to allow good gradient signal through the gate. For neurons with ReLU activation, one can initialize the bias to a small positive value like 0.1, so that the gradient is likely nonzero at initialization, avoiding the dying ReLU problem. == Random initialization == Random initialization means sampling the weights from a normal distribution or a uniform distribution, usually independently. === LeCun initialization === LeCun initialization, popularized in (LeCun et al., 1998), is designed to preserve the variance of neural activations during the forward pass. It samples each entry in W ( l ) {\displaystyle W^{(l)}} independently from a distribution with mean 0 and variance 1 / n l − 1 {\displaystyle 1/n_{l-1}} . For example, if the distribution is a continuous uniform distribution, then the distribution is U ( ± 3 / n l − 1 ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {3/n_{l-1}}})} . === Glorot initialization === Glorot initialization (or Xavier initialization) was proposed by Xavier Glorot and Yoshua Bengio. It was designed as a compromise between two goals: to preserve activation variance during the forward pass and to preserve gradient variance during the backward pass. For uniform initialization, it samples each entry in W ( l ) {\displaystyle W^{(l)}} independently and identically from U ( ± 6 / ( n l + 1 + n l − 1 ) ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {6/(n_{l+1}+n_{l-1})}})} . In the context, n l − 1 {\displaystyle n_{l-1}} is also called the "fan-in", and n l + 1 {\displaystyle n_{l+1}} the "fan-out". When the fan-in and fan-out are equal, then Glorot initialization is the same as LeCun initialization. === He initialization === As Glorot initialization performs poorly for ReLU activation, He initialization (or Kaiming initialization) was proposed by Kaiming He et al. for networks with ReLU activation. It samples each entry in W ( l ) {\displaystyle W^{(l)}} from N ( 0 , 2 / n l − 1 ) {\displaystyle {\mathcal {N}}(0,2/n_{l-1})} . === Orthogonal initialization === (Saxe et al. 2013) proposed orthogonal initialization: initializing weight matrices as uniformly random (according to the Haar measure) semi-orthogonal matrices, multiplied by a factor that depends on the activation function of the layer. It was designed so that if one initializes a deep linear network this way, then its training time until convergence is independent of depth. Sampling a uniformly random semi-orthogonal matrix can be done by initializing X {\displaystyle X} by IID sampling its entries from a standard normal distribution, then calculate ( X X ⊤ ) − 1 / 2 X {\displaystyle \left(XX^{\top }\right)^{-1/2}X} or its transpose, depending on whether X {\displaystyle X} is tall or wide. For CNN kernels with odd widths and heights, orthogonal initialization is done this way: initialize the central point by a semi-orthogonal matrix, and fill the other entries with zero. As an illustration, a kernel K {\displaystyle K} of shape 3 × 3 × c × c ′ {\displaystyle 3\times 3\times c\times c'} is initialized by filling K [ 2 , 2 , : , : ] {\displaystyle K[2,2,:,:]} with the entries of a random semi-orthogonal matrix of shape c × c ′ {\displaystyle c\times c'} , and the other entries with zero. (Balduzzi et al., 2017) used it with stride 1 and zero-padding. This is sometimes called the Orthogonal Delta initialization. Related to this approach, unitary initialization proposes to parameterize the weight matrices to be unitary matrices, with the result that at initialization they are random unitary matrices (and throughout training, they remain unitary). This is found to improve long-sequence modelling in LSTM. Orthogonal initialization has been generalized to layer-sequential unit-variance (LSUV) initialization. It is a data-dependent initialization method, and can be used in convolutional neural networks. It first initializes weights of each convolution or fully connected layer with orthonormal matrices. Then, proceeding from the first to the last layer, it runs a forward pass on a random minibatch, and divides the layer's weights by the standard deviation of its output, so that its output has variance approximately 1. === Fixup initialization === In 2015, the introduction of residual connections allowed very deep neural networks to be trained, much deeper than the ~20 layers of the previous state of the art (such as the VGG-19). Residual connections gave rise to their own weight initialization problems and strategies. These are sometimes called "normalization-free" methods, since using residual connection could stabilize the training of a deep neural network so much that normalizations become unnecessary. Fixup initialization is designed specifically for networks with residual connections and without batch normalization, as follows: Initialize the classification layer and the last layer of each residual branch to 0. Initialize every other layer using a standard method (such as He initialization), and scale only the weight layers inside residual branches by L − 1 2 m − 2 {\displaystyle L^{-{\frac {1}{2m-2}}}} . Add a scalar multiplier (initialized at 1) in every branch and a scalar bias (initialized at 0) before each convolution, linear, and element-wise activation layer. Similarly, T-Fixup initialization is designed for Transformers without layer normalization. === Others === Instead of initializing all weights with random values on the order of O ( 1 / n ) {\displaystyle O(1/{\sqrt {n}})} , sparse initialization initialized only a small subset of the weights with larger random values, and the other weights zero, so that the total variance is still on the order of O ( 1 ) {\displaystyle O(1)} . Random walk initialization was designed for MLP so that during backpropagation, the L2 norm of gradient at each layer performs an unbiased random walk as one moves from the last layer to the first. Looks linear initialization was designed to allow the neural network to behave like a deep linear network at initialization, since W R e L U ( x ) − W R e L U ( − x ) = W x {\displaystyle W\;\mathrm {ReLU} (x)-W\;\mathrm {ReLU} (-x)=Wx} . It initializes a matrix W {\displaystyle W} of shape R n 2 × m {\displaystyle \mathbb {R} ^{{\frac {n}{2}}\times m}} by any method, such as orthogonal initialization, t

    Read more →
  • Digistar

    Digistar

    Digistar is the first computer graphics-based planetarium projection and content system. It was designed by Evans & Sutherland and released in 1983. The technology originally focused on accurate and high quality display of stars, including for the first time showing stars from points of view other than Earth's surface, travelling through the stars, and accurately showing celestial bodies from different times in the past and future. Beginning with the Digistar 3 the system now projects full-dome video. == Projector == Unlike modern full-dome systems, which use LCD, DLP, SXRD, or laser projection technology, the Digistar projection system was designed for projecting bright pinpoints of light representing stars. This was accomplished using a calligraphic display, a form of vector graphics, rather than raster graphics. The heart of the Digistar projector is a large cathode-ray tube (CRT). A phosphor plate is mounted atop the tube, and light is then dispersed by a large lens with a 160 degree field of view to cover the planetarium dome. The original lens bore the inscription: "August 1979 mfg. by Lincoln Optical Corp., L.A., CA for Evans and Sutherland Computer Corp., SLC, UT, Digital planetarium CRT projection lens, 43mm, f2.8, 160 degree field of view". The coordinates of the stars and wire-frame models to be displayed by the projector were stored in computer RAM in a display list. The display would read each set of coordinates in turn and drive the CRT's electron beam directly to those coordinates. If the electron beam was enabled while being moved a line would be painted on the phosphor plate. Otherwise, the electron beam would be enabled once at its destination and a star would be painted. Once all coordinates in the display list had been processed, the display would repeat from the top of the display list. Thus, the shorter the display list the more frequently the electron beam would refresh the charge on a given point on the phosphor plate, making the projection of the points brighter. In this way, the stars projected by Digistar were substantially brighter than could be achieved using a raster display, which has to touch every point on the phosphor plate before repeating. Likewise, the calligraphic technology allowed Digistar to have a darker black-level than full-dome projectors, since the portions of the phosphor plate representing dark sky were never hit by the electron beam. As it is only one tube, with no pixelated color filter screen, the Digistar projector is monochromatic. The Digistar projects a bright, phosphorescent green, though many (including both visitors and planetarians) report they cannot distinguish between this green and white. Additionally, unlike a raster display, the calligraphic display is not discretized into pixels, so the displayed stars were a more realistic single spot of light, without the blocky or ropy artifacts that are hard to avoid with raster graphics. Due to the use of vector graphics, as opposed to raster imaging, the Digistar does not have the resolution issues that many full-dome systems have. Thanks to this, and the brightness of the CRT, only one projector is needed to project on the entire dome, whereas most full-dome systems require up to six raster projectors, depending on dome size. The projector in the original Digistar was housed in a square pyramid-shaped sheathing. When powered on, the four sides at the tip of the pyramid would recede into the housing, exposing the lens and appearing as a cut-off pyramid. As Digistar II was being developed, many planetaria were sold Digistar LEA projectors. The LEA, called Digistar 1.5 by many users, was effectively a prototype of the D2 projector, compatible with Digistar and upgradable to Digistar II. There are no significant differences in performance between the LEA and the true D2. == History == Digistar was the brainchild of Stephen McAllister and Brent Watson, both of whom were long-time amateur astronomers and computer graphics engineers. In 1977, E&S had been consulting with Johnson Space Center regarding training simulators for astronauts. McAllister had been writing proof-of-concept software for this consultation and in summer 1977 entered the data for 400 bright stars and wrote the software to display them. Steve and Brent both originally saw the system's purpose as celestial navigation training. Brent, who had until recently worked at Hansen planetarium, asked his planetarium coworkers what they thought of a potential digital planetarium system, and then Steve and Brent both targeted the system toward planetaria. The primary goal of the planetarium system was to use computer graphics to overcome the limitation of traditional star ball technology that only allowed display of star fields from the point of view of Earth's surface. By using computer graphics the stars could be displayed from viewpoints in space, including simulating the appearance of space flight. Likewise, planets and moons within the Solar System could be displayed accurately for any time in history, from any point of view. The system used the location of real stars from the Yale Bright Star Catalogue, as well as random stars. A laboratory prototype of Digistar was used to generate the star fields and tactical displays in the 1982 science fiction film Star Trek II: The Wrath of Khan. Filming was done directly from the Digistar display in the lab. ILM projected the effort would take two weeks, but in fact it took from late November 1981 until mid-February 1982. The last shot recorded was what became the first entirely computer generated feature film sequence. It was the opening scene of the film, a rotating forward translation through a star field that lasted 3.5 minutes. It was recorded in one take, at a rate of one frame every 3.5 seconds, taking four hours for the shoot. The Digistar team members are credited in the film. After prototyping in labs at Evans and Sutherland the team repeatedly used Salt Lake City's Hansen planetarium to beta test the system at the planetarium at night. The Digistar team performed one week of shows at the planetarium as a fund raiser to benefit the planetarium. The company also later gave the planetarium an improved prototype Digistar to replace "Jake", the planetarium's aging Spitz planetarium projector. The first customer installation was to the newly constructed Universe Planetarium at the Science Museum of Virginia in 1983, the largest planetarium dome in the world at the time, for $595,000. By September 1986 there were four installed Digistars. Even at this point the long-term success of the product was very much in doubt, but as of 2019 Digistar has an installed base of over 550 planetaria. === Versions === Digistar (1983) Digistar II (1995) Digistar 3 (2002) Digistar 4 (2010?) Digistar 5 (2012) Digistar 6 (2016) Digistar 7 (2021) == Hardware == Digistar was driven by a VAX-11/780 minicomputer, with custom graphics hardware related to the E&S Picture System 2. Later versions of Digistar 1 used a DEC MicroVAX 2, driving a custom version of a PS/300. The original Digistar and Digistar 2 had a physical control panel that was used for running the star shows. This control panel was approximately 3' x 4' and contained a keyboard, a 6 DOF joystick, and a large array of back-lit buttons. One button that was used for moving the viewpoint forward in space was labeled "Boldly Go". Later iterations of Digistar replaced the physical control panel with a common graphical user interface. Digistar 3 was the first Digistar system to offer full-dome video in 2002, using six projectors. Digistar 4 was able to cover the dome using only two projectors. == System limitations == Though technologically advanced in its day, and the closest system to true full-dome video at the time of its release, the original Digistar and Digistar 2 are limited to only projecting dots and lines—meaning only wireframe models can be projected. To compensate for this, the projector is capable of defocusing specific models, blurring lines and dots together. An example of this is in the Digistar 2's built-in Milky Way model. The model is a circle of parallel lines that, when defocused, appear as the continuous band of the Milky Way across the sky. On more complex models, especially three-dimensional ones, brightness and details may be lost in this process, so it is not useful in all situations. The Digistar and Digistar 2 also suffer focus limitations. Because they use a single lens to cover the entire dome, it is difficult to gain perfect focus across the dome. Coupled with this, stars greater than a certain brightness are "multihit" points, meaning the projector draws two dots at the given position to accommodate the brightness of the star. Errors in the projector can lead the second dot to be slightly out-of-place with the first one. These two issues together, along with other issues that can occur within the projector's focus system, give the stars a blobby look. Some p

    Read more →
  • Retrieval-augmented generation

    Retrieval-augmented generation

    Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources. RAG improves LLMs by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have caused chatbots to describe policies that don't exist, or recommend nonexistent legal cases to lawyers that are looking for citations to support their arguments. RAG also reduces the need to retrain LLMs with new data, saving on computational and financial costs. Beyond efficiency gains, RAG also allows LLMs to include sources in their responses, so users can verify the cited sources. This provides greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance. The term retrieval-augmented generation (RAG) was introduced in a 2020 paper that described combining a parametric language model with a non-parametric external memory accessed through retrieval at inference time. == RAG and LLM limitations == LLMs can provide incorrect information. For example, when Google first demonstrated its LLM tool "Google Bard" (later re-branded to Gemini), the LLM provided incorrect information about the James Webb Space Telescope. This error contributed to a $100 billion decline in Google's stock value. RAG is used to prevent these errors, but it does not solve all the problems. For example, LLMs can generate misinformation even when pulling from factually correct sources if they misinterpret the context. MIT Technology Review gives the example of an AI-generated response stating, "The United States has had one Muslim president, Barack Hussein Obama." The model retrieved this from an academic book rhetorically titled Barack Hussein Obama: America's First Muslim President? The LLM did not "know" or "understand" the context of the title, generating a false statement. LLMs with RAG are programmed to prioritize new information. This technique has been called "prompt stuffing." Without prompt stuffing, the LLM's input is generated by a user; with prompt stuffing, additional relevant context is added to this input to guide the model's response. This approach provides the LLM with key information early in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge. == Process == Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set. Ars Technica notes that "when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information" ("augmentation"). IBM states that "in the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize" an answer. === RAG key stages === Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of a large vector space. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval. Given a user query, a document retriever is first called to select the most relevant documents that will be used to augment the query. This comparison can be done using a variety of methods, which depend in part on the type of indexing used. The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query. Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals. Finally, the LLM can generate output based on both the query and the retrieved documents. Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning. == Applications == Retrieval-augmented generation is used in applications where generated responses need to be grounded in external or frequently updated information. Commonly cited use cases include search engines, question-answering systems, customer support chatbots, enterprise knowledge assistants, content generation, recommendation systems, retail and e-commerce, and industrial or manufacturing workflows. In healthcare, RAG has been studied as a way to ground large language model outputs in external medical knowledge sources, although reviews have noted continuing challenges around evaluation, ethics, and clinical reliability. == Improvements == Improvements to the basic process above can be applied at different stages in the RAG flow. === Encoder === These methods focus on the encoding of text as either dense or sparse vectors. Sparse vectors, which encode the identity of a word, are typically dictionary-length and contain mostly zeros. Dense vectors, which encode meaning, are more compact and contain fewer zeros. Various enhancements can improve the way similarities are calculated in the vector stores (databases). Performance improves by optimizing how vector similarities are calculated. Dot products enhance similarity scoring, while approximate nearest neighbor (ANN) searches improve retrieval efficiency over K-nearest neighbors (KNN) searches. Accuracy may be improved with Late Interactions, which allow the system to compare words more precisely after retrieval. This helps refine document ranking and improve search relevance. Hybrid vector approaches may be used to combine dense vector representations with sparse one-hot vectors, taking advantage of the computational efficiency of sparse dot products over dense vector operations. Other retrieval techniques focus on improving accuracy by refining how documents are selected. Some retrieval methods combine sparse representations, such as SPLADE, with query expansion strategies to improve search accuracy and recall. === Retriever-centric methods === These methods aim to enhance the quality of document retrieval in vector databases: Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text within documents. Supervised retriever optimization aligns retrieval probabilities with the generator model's likelihood distribution. This involves retrieving the top-k vectors for a given prompt, scoring the generated response's perplexity, and minimizing KL divergence between the retriever's selections and the model's likelihoods to refine retrieval. Reranking techniques can refine retriever performance by prioritizing the most relevant retrieved documents during training. === Language model === By redesigning the language model with the retriever in mind, a 25-time smaller network can get comparable perplexity as its much larger counterparts. Because it is trained from scratch, this method (Retro) incurs the high cost of training runs that the original RAG scheme avoided. The hypothesis is that by giving domain knowledge during training, Retro needs less focus on the domain and can devote its smaller weight resources only to language semantics. The redesigned language model is shown here. It has been reported that Retro is not reproducible, so modifications were made to make it so. The more reproducible version is called Retro++ and includes in-context RAG. === Chunking === Chunking involves various strategies for breaking up the data into vectors so the retriever can find details in it. Three types of chunking strategies are: Fixed length with overlap. This is fast and easy. Overlapping consecutive chunks helps to maintain semantic context across chunks. Syntax-based chunks can break the document up into sentences. Libraries such as spaCy or NLTK can also help. File format-based chunking. Certain file types have natural chunks built in, and it's best to respect them. For example, code files are best chunked and vectorized as whole functions or classes. HTML files should leave

    or base64 encoded elements

    Read more →
  • Seq2seq

    Seq2seq

    Seq2seq is a family of machine learning approaches used for natural language processing. Originally developed by Lê Viết Quốc, a Vietnamese computer scientist and a machine learning pioneer at Google Brain, this framework has become foundational in many modern AI systems. Applications include language translation, image captioning, conversational models, speech recognition, and text summarization. Seq2seq uses sequence transformation: it turns one sequence into another sequence. == History == One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. seq2seq is an approach to machine translation (or more generally, sequence transduction) with roots in information theory, where communication is understood as an encode-transmit-decode process, and machine translation can be studied as a special case of communication. This viewpoint was elaborated, for example, in the noisy channel model of machine translation. In practice, seq2seq maps an input sequence into a real-numerical vector by using a neural network (the encoder), and then maps it back to an output sequence using another neural network (the decoder). The idea of encoder-decoder sequence transduction had been developed in the early 2010s. The papers most commonly cited as the originators that produced seq2seq are two papers from 2014. In the seq2seq as proposed by them, both the encoder and the decoder were LSTMs. This had the "bottleneck" problem, since the encoding vector has a fixed size, so for long input sequences, information would tend to be lost, as they are difficult to fit into the fixed-length encoding vector. The attention mechanism, proposed in 2014, resolved the bottleneck problem. They called their model RNNsearch, as it "emulates searching through a source sentence during decoding a translation". A problem with seq2seq models at this point was that recurrent neural networks are difficult to parallelize. The 2017 publication of Transformers resolved the problem by replacing the encoding RNN with self-attention Transformer blocks ("encoder blocks"), and the decoding RNN with cross-attention causally-masked Transformer blocks ("decoder blocks"). === Priority dispute === One of the papers cited as the originator for seq2seq is (Sutskever et al 2014), published at Google Brain while they were on Google's machine translation project. The research allowed Google to overhaul Google Translate into Google Neural Machine Translation in 2016. Tomáš Mikolov claims to have developed the idea (before joining Google Brain) of using a "neural language model on pairs of sentences... and then [generating] translation after seeing the first sentence"—which he equates with seq2seq machine translation, and to have mentioned the idea to Ilya Sutskever and Quoc Le (while at Google Brain), who failed to acknowledge him in their paper. Mikolov had worked on RNNLM (using RNN for language modelling) for his PhD thesis, and is more notable for developing word2vec. == Architecture == The main reference for this section is. === Encoder === The encoder is responsible for processing the input sequence and capturing its essential information, which is stored as the hidden state of the network and, in a model with attention mechanism, a context vector. The context vector is the weighted sum of the input hidden states and is generated for every time instance in the output sequences. === Decoder === The decoder takes the context vector and hidden states from the encoder and generates the final output sequence. The decoder operates in an autoregressive manner, producing one element of the output sequence at a time. At each step, it considers the previously generated elements, the context vector, and the input sequence information to make predictions for the next element in the output sequence. Specifically, in a model with attention mechanism, the context vector and the hidden state are concatenated together to form an attention hidden vector, which is used as an input for the decoder. The seq2seq method developed in the early 2010s uses two neural networks: an encoder network converts an input sentence into numerical vectors, and a decoder network converts those vectors to sentences in the target language. The Attention mechanism was grafted onto this structure in 2014 and is shown below. Later it was refined into the encoder-decoder Transformer architecture of 2017. === Training vs prediction === There is a subtle difference between training and prediction. During training time, both the input and the output sequences are known. During prediction time, only the input sequence is known, and the output sequence must be decoded by the network itself. Specifically, consider an input sequence x 1 : n {\displaystyle x_{1:n}} and output sequence y 1 : m {\displaystyle y_{1:m}} . The encoder would process the input x 1 : n {\displaystyle x_{1:n}} step by step. After that, the decoder would take the output from the encoder, as well as the as input, and produce a prediction y ^ 1 {\displaystyle {\hat {y}}_{1}} . Now, the question is: what should be input to the decoder in the next step? A standard method for training is "teacher forcing". In teacher forcing, no matter what is output by the decoder, the next input to the decoder is always the reference. That is, even if y ^ 1 ≠ y 1 {\displaystyle {\hat {y}}_{1}\neq y_{1}} , the next input to the decoder is still y 1 {\displaystyle y_{1}} , and so on. During prediction time, the "teacher" y 1 : m {\displaystyle y_{1:m}} would be unavailable. Therefore, the input to the decoder must be y ^ 1 {\displaystyle {\hat {y}}_{1}} , then y ^ 2 {\displaystyle {\hat {y}}_{2}} , and so on. It is found that if a model is trained purely by teacher forcing, its performance would degrade during prediction time, since generation based on the model's own output is different from generation based on the teacher's output. This is called exposure bias or a train/test distribution shift. A 2015 paper recommends that, during training, randomly switch between teacher forcing and no teacher forcing. === Attention for seq2seq === The attention mechanism is an enhancement introduced by Bahdanau et al. in 2014 to address limitations in the basic Seq2Seq architecture where a longer input sequence results in the hidden state output of the encoder becoming irrelevant for the decoder. It enables the model to selectively focus on different parts of the input sequence during the decoding process. At each decoder step, an alignment model calculates the attention score using the current decoder state and all of the attention hidden vectors as input. An alignment model is another neural network model that is trained jointly with the seq2seq model used to calculate how well an input, represented by the hidden state, matches with the previous output, represented by attention hidden state. A softmax function is then applied to the attention score to get the attention weight. In some models, the encoder states are directly fed into an activation function, removing the need for alignment model. An activation function receives one decoder state and one encoder state and returns a scalar value of their relevance. Consider the seq2seq language English-to-French translation task. To be concrete, let us consider the translation of "the zone of international control ", which should translate to "la zone de contrôle international ". Here, we use the special token as a control character to delimit the end of input for both the encoder and the decoder. An input sequence of text x 0 , x 1 , … {\displaystyle x_{0},x_{1},\dots } is processed by a neural network (which can be an LSTM, a Transformer encoder, or some other network) into a sequence of real-valued vectors h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , where h {\displaystyle h} stands for "hidden vector". After the encoder has finished processing, the decoder starts operating over the hidden vectors, to produce an output sequence y 0 , y 1 , … {\displaystyle y_{0},y_{1},\dots } , autoregressively. That is, it always takes as input both the hidden vectors produced by the encoder, and what the decoder itself has produced before, to produce the next output word: ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , "") → "la" ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la") → "la zone" ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la zone") → "la zone de" ... ( h 0 , h 1 , … {\displaystyle h_{0},h_{1},\dots } , " la zone de contrôle international") → "la zone de contrôle international " Here, we use the special token as a control character to delimit the start of input for the decoder. The decoding terminates as soon as "" appears in the decoder output. ==

    Read more →