Rprop, short for resilient backpropagation, is a learning heuristic for supervised learning in feedforward artificial neural networks. It is a first-order optimization algorithm, created by Martin Riedmiller and Heinrich Braun in 1992.

Similarly to the Manhattan update rule, Rprop takes into account only the sign of the partial derivative over all patterns (not the magnitude), and acts independently on each weight. For each weight, if the sign of the partial derivative of the total error function has changed compared to the last iteration, the update value for that weight is multiplied by a factor η−, where η− < 1. If the last iteration produced the same sign, the update value is multiplied by a factor η+, where η+ > 1. The update values are calculated for each weight in this manner, and finally each weight is changed by its own update value, in the opposite direction of that weight's partial derivative, so as to minimise the total error function. η+ is empirically set to 1.2 and η− to 0.5.

Rprop is a batch update algorithm. It can result in very large weight increments or decrements if the gradients are large, which is a problem when using mini-batches as opposed to full batches. RMSprop addresses this problem by keeping a moving average of the squared gradients for each weight and dividing the gradient by the square root of that mean square.

Next to the cascade correlation algorithm and the Levenberg–Marquardt algorithm, Rprop is one of the fastest weight update mechanisms.

== Variations ==
Martin Riedmiller developed three algorithms, all named RPROP. Igel and Hüsken assigned names to them and added a new variant:
RPROP+ is defined in "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm".
RPROP− is defined in "Advanced Supervised Learning in Multi-layer Perceptrons – From Backpropagation to Adaptive Learning Algorithms". It is RPROP+ with backtracking removed.
iRPROP− is defined in "Rprop – Description and Implementation Details" and was reinvented by Igel and Hüsken. This variant is very popular and the simplest of the four.
iRPROP+ is defined in "Improving the Rprop Learning Algorithm" and is very robust and typically faster than the other three variants.
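The sign-based update rule described above can be sketched in a few lines of Python. This is a minimal illustration in the spirit of RPROP− (no backtracking), not a reference implementation: the function names, the initial step size `delta0`, and the bounds `delta_min` and `delta_max` are assumptions chosen for the example, while `eta_plus = 1.2` and `eta_minus = 0.5` follow the values quoted in the text.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimize(grad, w, steps=100, eta_plus=1.2, eta_minus=0.5,
                   delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Minimise a function given its full-batch gradient `grad`.

    delta0, delta_min and delta_max are illustrative assumptions.
    """
    w = list(w)
    delta = [delta0] * len(w)   # one update value per weight
    prev = [0.0] * len(w)       # gradient from the previous iteration
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            same = prev[i] * g[i]
            if same > 0:        # same sign as last time: grow the step
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif same < 0:      # sign flipped: shrink the step
                delta[i] = max(delta[i] * eta_minus, delta_min)
            # Only the sign of the derivative is used, never its magnitude.
            w[i] -= sign(g[i]) * delta[i]
            prev[i] = g[i]
    return w


def quad_grad(w):
    """Gradient of f(w) = (w0 - 3)^2 + 100 * (w1 + 1)^2."""
    return [2 * (w[0] - 3), 200 * (w[1] + 1)]


w_final = rprop_minimize(quad_grad, [0.0, 0.0], steps=200)
```

The two coordinates of the test function have gradients that differ in scale by a factor of 100, yet both weights follow the same step-size schedule, because each weight adapts its own update value from sign information alone.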
Grammar systems theory
Grammar systems theory is a field of theoretical computer science that studies systems of finite collections of formal grammars generating a formal language. Each grammar works on a string, a so-called sequential form that represents an environment. Grammar systems can thus be used as a formalization of decentralized or distributed systems of agents in artificial intelligence.

Let $\mathbb{A}$ be a simple reactive agent moving on a table and trying not to fall off it, with two reactions: t for turning and ƒ for moving forward. The set of possible behaviors of $\mathbb{A}$ can then be described as the formal language

$$\mathbb{L_A} = \{(f^{m}t^{n}f^{r})^{+} : 1 \leq m \leq k;\ 1 \leq n \leq \ell;\ 1 \leq r \leq k\},$$

where ƒ can be done at most k times and t at most ℓ times, considering the dimensions of the table. Let $\mathbb{G_A}$ be a formal grammar which generates the language $\mathbb{L_A}$. The behavior of $\mathbb{A}$ is then described by this grammar. Suppose $\mathbb{A}$ has a subsumption architecture; each component of this architecture can then be represented as a formal grammar too, and the final behavior of the agent is described by this system of grammars.

The schema on the right describes such a system of grammars, which share a common string representing an environment. The shared sequential form is sequentially rewritten by each grammar, which can represent either a component or, more generally, an agent. If grammars communicate with each other and work on a shared sequential form, the system is called a Cooperating Distributed (CD) grammar system. The shared sequential form is a similar concept to the blackboard approach in AI, which is inspired by the idea of experts solving a problem together while sharing their proposals and ideas on a common blackboard.

Each grammar in a grammar system can also work on its own string and communicate with the other grammars in the system by sending its sequential form on request. Such a grammar system is then called a Parallel Communicating (PC) grammar system. PC and CD grammar systems are inspired by distributed AI.

If there is no communication between grammars, the system is close to the decentralized approaches in AI. These kinds of grammar systems are sometimes called colonies or Eco-Grammar systems, depending (among other things) on whether the environment changes on its own (Eco-Grammar system) or not (colonies).
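As a small illustration of the agent example, the behavior language of the table-walking agent (one or more repetitions of a block of 1 to k forward moves, 1 to ℓ turns, and 1 to k forward moves) is regular, so membership can be checked with a regular expression. The sketch below is only an illustration under that observation; the function names and the parameter name `ell` are hypothetical and do not come from the grammar systems literature.

```python
import re


def behaviour_pattern(k, ell):
    """Compile a regex for (f^m t^n f^r)+ with 1<=m,r<=k and 1<=n<=ell."""
    block = "f{1,%d}t{1,%d}f{1,%d}" % (k, ell, k)
    return re.compile("(?:%s)+" % block)


def is_valid_behaviour(word, k, ell):
    """Return True if `word` (a string over 'f' and 't') is in the language."""
    return behaviour_pattern(k, ell).fullmatch(word) is not None


# With k = 2 and ell = 1: two blocks "ftff" + "fftf" share a run of four f's.
accepted = is_valid_behaviour("ftfffftf", 2, 1)
# A run of five f's cannot be split into r <= 2 and m <= 2.
rejected = is_valid_behaviour("ftffffftf", 2, 1)
```

Note that a run of ƒ symbols between two turns may belong partly to the end of one block and partly to the start of the next; the regex engine's backtracking takes care of finding a valid split.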
Vulnerabilities Equities Process
The Vulnerabilities Equities Process (VEP) is a process used by the U.S. federal government to determine on a case-by-case basis how it should treat zero-day computer security vulnerabilities: whether to disclose them to the public to help improve general computer security, or to keep them secret for offensive use against the government's adversaries.

The VEP was first developed during the period 2008–2009, but only became public in 2016, when the government released a redacted version of the VEP in response to a FOIA request by the Electronic Frontier Foundation. Following public pressure for greater transparency in the wake of the Shadow Brokers affair, the U.S. government made a more public disclosure of the VEP process in November 2017.

== Participants ==
According to the VEP plan published in 2017, the Equities Review Board (ERB) is the primary forum for interagency deliberation and determinations concerning the VEP. The ERB meets monthly, but may also be convened sooner if an immediate need arises. The ERB consists of representatives from the following agencies:
Office of Management and Budget
Office of the Director of National Intelligence (including the Intelligence Community-Security Coordination Center)
United States Department of the Treasury
United States Department of State
United States Department of Justice (including the Federal Bureau of Investigation and the National Cyber Investigative Joint Task Force)
Department of Homeland Security (including the National Cybersecurity and Communications Integration Center and the United States Secret Service)
United States Department of Energy
United States Department of Defense (including the National Security Agency, with its Information Assurance and Signals Intelligence elements; United States Cyber Command; and the DoD Cyber Crime Center)
United States Department of Commerce
Central Intelligence Agency

The National Security Agency serves as the executive secretariat for the VEP.
== Process ==
According to the November 2017 version of the VEP, the process is as follows:

=== Submission and notification ===
When an agency finds a vulnerability, it will notify the VEP secretariat as soon as possible. The notification will include a description of the vulnerability and the vulnerable products or systems, together with the agency's recommendation to either disseminate or restrict the vulnerability information. The secretariat will then notify all participants of the submission within one business day, requesting them to respond if they have a relevant interest.

=== Equity and discussions ===
An agency expressing an interest must indicate within five business days whether it concurs with the original recommendation to disseminate or restrict. If it does not, it will hold discussions with the submitting agency and the VEP secretariat within seven business days to attempt to reach consensus. If no consensus is reached, the participants will suggest options for the Equities Review Board.

=== Determination to disseminate or restrict ===
Decisions whether to disclose or restrict a vulnerability should be made quickly, in full consultation with all concerned agencies, and in the overall best interest of the competing interests of U.S. government missions. As far as possible, determinations should be based on rational, objective methodologies, taking into account factors such as prevalence, reliance, and severity. If the review board members cannot reach consensus, they will vote on a preliminary determination. An agency with an equity that disputes that decision may contest the preliminary determination by providing notice to the VEP secretariat. If no agency contests a preliminary determination, it will be treated as a final decision.

=== Handling and follow-on actions ===
If vulnerability information is released, this will be done as quickly as possible, preferably within seven business days.
Disclosure of vulnerabilities will be conducted according to guidelines agreed on by all members. The submitting agency is presumed to be most knowledgeable about the vulnerability and, as such, will be responsible for disseminating vulnerability information to the vendor. The submitting agency may elect to delegate dissemination responsibility to another agency on its behalf. The releasing agency will promptly provide a copy of the disclosed information to the VEP secretariat for record keeping. Additionally, the releasing agency is expected to follow up so the ERB can determine whether the vendor's action meets government requirements. If the vendor chooses not to address a vulnerability, or is not acting with urgency consistent with the risk of the vulnerability, the releasing agency will notify the secretariat, and the government may take other mitigation steps.

== Criticism ==
The VEP process has been criticized for a number of deficiencies, including restriction by non-disclosure agreements, lack of risk ratings, special treatment for the NSA, and less than whole-hearted commitment to disclosure as the default option.

== UK equivalent ==
British intelligence agencies, GCHQ in particular, follow a similar approach, also known as the Equities Process, to determine whether to disclose or retain security vulnerabilities. Details of the process were made public in 2018. The Investigatory Powers Act 2016 was amended in 2022 to bring oversight of the operation of the process within the remit of the Investigatory Powers Commissioner.
SMBGhost
SMBGhost (or SMBleedingGhost or CoronaBlue) is a type of security vulnerability, with wormlike features, that affects Windows 10 computers and was first reported publicly on 10 March 2020.

== Security vulnerability ==
Proof-of-concept (PoC) exploit code was published on GitHub by a security researcher on 1 June 2020. The code could possibly spread to millions of unpatched computers, resulting in as much as tens of billions of dollars in losses.

Microsoft recommends that all users of Windows 10 versions 1903 and 1909 and Windows Server versions 1903 and 1909 install patches, and states, "We recommend customers install updates as soon as possible as publicly disclosed vulnerabilities have the potential to be leveraged by bad actors ... An update for this vulnerability was released in March [2020], and customers who have installed the updates, or have automatic updates enabled, are already protected." According to Microsoft, workarounds such as disabling SMB compression and blocking port 445 may help but may not be sufficient.

According to the advisory division of Homeland Security, "Malicious cyber actors are targeting unpatched systems with the new [threat], ... [and] strongly recommends using a firewall to block server message block ports from the internet and to apply patches to critical- and high-severity vulnerabilities as soon as possible."
Line integral convolution
In scientific visualization, line integral convolution (LIC) is a method to visualize a vector field (such as fluid motion) at high spatial resolutions. The LIC technique was first proposed by Brian Cabral and Leith Casey Leedom in 1993.

In LIC, discrete numerical line integration is performed along the field lines (curves) of the vector field on a uniform grid. The integral operation is a convolution of a filter kernel and an input texture, often white noise. In signal processing, this process is known as a discrete convolution.

== Overview ==
Traditional visualizations of vector fields use small arrows or lines to represent vector direction and magnitude. This method has a low spatial resolution, which limits the density of presentable data and risks obscuring characteristic features in the data. More sophisticated methods, such as streamlines and particle tracing techniques, can be more revealing but are highly dependent on proper seed points. Texture-based methods, like LIC, avoid these problems since they depict the entire vector field at point-like (pixel) resolution.

Compared to other integration-based techniques that compute field lines of the input vector field, LIC has the advantage that all structural features of the vector field are displayed, without the need to adapt the start and end points of field lines to the specific vector field. In other words, it shows the topology of the vector field. In user testing, LIC was found to be particularly good for identifying critical points.

== Algorithm ==
=== Informal description ===
LIC causes output values to be strongly correlated along the field lines, but uncorrelated in orthogonal directions. As a result, the field lines contrast with each other and stand out visually from the background.

Intuitively, the process can be understood with the following example: the flow of a vector field can be visualized by overlaying a fixed, random pattern of dark and light paint.
As the flow passes by the paint, the fluid picks up some of the paint's color, averaging it with the color it has already acquired. The result is a randomly striped, smeared texture where points along the same streamline tend to have a similar color. Other physical examples include:
whorl patterns of paint, oil, or foam on a river
visualisation of magnetic field lines using randomly distributed iron filings
fine sand being blown by strong wind

=== Formal mathematical description ===
Although the input vector field and the result image are discretized, it pays to look at the method from a continuous viewpoint. Let $\mathbf{v}$ be the vector field given in some domain $\Omega$. Although the input vector field is typically discretized, we regard the field $\mathbf{v}$ as defined at every point of $\Omega$, i.e. we assume an interpolation. Streamlines, or more generally field lines, are tangent to the vector field at each point. They end either at the boundary of $\Omega$ or at critical points where $\mathbf{v} = \mathbf{0}$. For the sake of simplicity, critical points and boundaries are ignored in the following. A field line $\boldsymbol{\sigma}$, parametrized by arc length $s$, is defined by

$$\frac{d\boldsymbol{\sigma}(s)}{ds} = \frac{\mathbf{v}(\boldsymbol{\sigma}(s))}{|\mathbf{v}(\boldsymbol{\sigma}(s))|}.$$

Let $\boldsymbol{\sigma}_{\mathbf{r}}(s)$ be the field line that passes through the point $\mathbf{r}$ for $s = 0$.
Then the image gray value at $\mathbf{r}$ is set to

$$D(\mathbf{r}) = \int_{-L/2}^{L/2} k(s)\, N(\boldsymbol{\sigma}_{\mathbf{r}}(s))\, ds$$

where $k(s)$ is the convolution kernel, $N(\mathbf{r})$ is the noise image, and $L$ is the length of the field line segment that is followed.

$D(\mathbf{r})$ has to be computed for each pixel in the LIC image. If carried out naively, this is quite expensive. First, the field lines have to be computed using a numerical method for solving ordinary differential equations, like a Runge–Kutta method, and then for each pixel the convolution along a field line segment has to be calculated.

The final image will normally be colored in some way. Typically, some scalar field in $\Omega$ (like the vector length) is used to determine the hue, while the grayscale LIC output determines the brightness.

Different choices of convolution kernels and random noise produce different textures; for example, pink noise produces a cloudy pattern where areas of higher flow stand out as smearing, suitable for weather visualization. Further refinements in the convolution can improve the quality of the image.

=== Programming description ===
Algorithmically, LIC takes a vector field and a noise texture as input, and outputs a texture. The process starts by generating, in the domain of the vector field, a random gray level image at the desired output resolution. Then, for every pixel in this image, the forward and backward streamline of a fixed arc length is calculated. The value assigned to the current pixel is computed by a convolution of a suitable convolution kernel with the gray levels of all the noise pixels lying on a segment of this streamline. This creates a gray level LIC image.
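The programming description can be sketched as follows. This is a deliberately naive, pure-Python illustration rather than a production implementation, and it makes several simplifying assumptions that are not part of the description above: streamlines are traced with fixed-step Euler integration instead of a Runge–Kutta method, the vector field and noise are sampled by nearest-neighbour lookup, and the kernel is a box function (a plain average). The function name `lic` and its parameters `length` and `step` are hypothetical.

```python
import math
import random


def lic(vx, vy, noise, length=10, step=0.5):
    """Box-kernel LIC on a regular grid.

    vx, vy and noise are lists of rows of equal shape; `length` is the
    number of integration steps taken in each direction (an assumption).
    """
    h, w = len(noise), len(noise[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = noise[y][x], 1
            # Trace the streamline forward (+1) and backward (-1).
            for direction in (1.0, -1.0):
                px, py = x + 0.5, y + 0.5        # start at the pixel centre
                for _ in range(length):
                    u, v = vx[int(py)][int(px)], vy[int(py)][int(px)]
                    norm = math.hypot(u, v)
                    if norm == 0.0:
                        break                    # critical point: stop
                    # Euler step along the normalised field direction.
                    px += direction * step * u / norm
                    py += direction * step * v / norm
                    if not (0 <= px < w and 0 <= py < h):
                        break                    # left the domain
                    total += noise[int(py)][int(px)]
                    count += 1
            out[y][x] = total / count            # box kernel: plain average
    return out


# Example: white noise smeared along a uniform horizontal field.
random.seed(0)
size = 16
noise = [[random.random() for _ in range(size)] for _ in range(size)]
vx = [[1.0] * size for _ in range(size)]
vy = [[0.0] * size for _ in range(size)]
image = lic(vx, vy, noise)
```

Because every output pixel averages noise values found along its own streamline, neighbouring pixels on the same field line share most of their samples, which is what produces the correlation along field lines described in the informal description.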
== Versions ==
=== Basic ===
Basic LIC images are grayscale images, without color and animation. While such LIC images convey the direction of the field vectors, they do not indicate orientation; for stationary fields, this can be remedied by animation. Basic LIC images do not show the length of the vectors (or the strength of the field).

=== Color ===
The length of the vectors (or the strength of the field) is usually coded in color; alternatively, animation can be used.

=== Animation ===
LIC images can be animated by using a kernel that changes over time. Samples at a constant time from the streamline would still be used, but instead of averaging all pixels in a streamline with a static kernel, a ripple-like kernel constructed from a periodic function multiplied by a Hann function acting as a window (in order to prevent artifacts) is used. The periodic function is then shifted along the period to create an animation.

=== Fast LIC (FLIC) ===
The computation can be significantly accelerated by re-using parts of already computed field lines, specializing to a box function as the convolution kernel $k(s)$, and avoiding redundant computations during convolution. The resulting fast LIC method can be generalized to convolution kernels that are arbitrary polynomials.

=== Oriented Line Integral Convolution (OLIC) ===
Because LIC does not encode flow orientation, it cannot distinguish between streamlines of equal direction but opposite orientation. Oriented Line Integral Convolution (OLIC) solves this issue by using a ramp-like asymmetric kernel and a low-density noise texture. The kernel asymmetrically modulates the intensity along the streamline, producing a trace that encodes orientation; the low density of the noise texture prevents smeared traces from overlapping, aiding readability.
Fast Rendering of Oriented Line Integral Convolution (FROLIC) is a variation that approximates OLIC by rendering each trace in discrete steps instead of as a continuous smear.

=== Unsteady Flow LIC (UFLIC) ===
For time-dependent vector fields (unsteady flow), a variant called Unsteady Flow LIC has been designed that maintains the coherence of the flow animation. An interactive GPU-based implementation of UFLIC has been presented.

=== Parallel ===
Since the computation of an LIC image is expensive but inherently parallel, the process has been parallelized and, with the availability of GPU-based implementations, has become interactive on PCs.

=== Multidimensional ===
Note that the domain $\Omega$ does not have to be a 2D domain: the method is applicable to higher-dimensional domains using multidimensional noise fields. However, the visualization of the higher-dimensional LIC texture is problematic; one way is to use interactive exploration with 2D slices that are manually positioned and rotated. The domain $\Omega$ does not have to be flat either; the LIC texture can also be computed for arbitrarily shaped 2D surfaces in 3D space.

== Applications ==
This technique has been applied to a wide range of problems, both scientific and creative, since it was first published in 1993, including:
Representing vector fields:
visualization of steady (time-independent) flows (streamlines)
visual exploration of 2D autonomous dynamical systems
wind mapping
water flow mapping
Artistic effects for image generation and stylization:
pencil drawing (auto
Grokking (machine learning)
In machine learning, grokking, or delayed generalization, is a phenomenon observed in some settings where a model abruptly transitions from overfitting (performing well only on training data) to generalizing (performing well on both training and test data), after many training iterations with little or no improvement on the held-out data. This contrasts with what is typically observed in machine learning, where generalization occurs gradually alongside improved performance on training data.

== Origin ==
Grokking was introduced by OpenAI researcher Alethea Power and colleagues in the January 2022 paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets". The term is derived from the word grok, coined by Robert Heinlein in his novel Stranger in a Strange Land.

In ML research, "grokking" is not used as a synonym for "generalization"; rather, it names a sometimes-observed delayed-generalization training phenomenon in which training and held-out performance do not improve in tandem, and in which held-out performance rises abruptly later. Authors also analyze the "grokking time", the epoch or step at which this transition occurs in those scenarios.

== Interpretations ==
Grokking can be understood as a phase transition during the training process. In particular, recent work has shown that grokking may be due to a complexity phase transition in the model during training. While grokking has been thought of as largely a phenomenon of relatively shallow models, it has also been observed in deep neural networks and non-neural models, and is the subject of active research.

One potential explanation is that weight decay (a component of the loss function that penalizes higher values of the neural network parameters, also called regularization) slightly favors the general solution, which involves lower weight values but is also harder to find.
According to Neel Nanda, the process of learning the general solution may be gradual, even though the transition to the general solution occurs more suddenly later. Recent theories have hypothesized that grokking occurs when neural networks transition from a "lazy training" regime where the weights do not deviate far from initialization, to a "rich" regime where weights abruptly begin to move in task-relevant directions. Follow-up empirical and theoretical work has accumulated evidence in support of this perspective, and it offers a unifying view of earlier work as the transition from lazy to rich training dynamics is known to arise from properties of adaptive optimizers, weight decay, initial parameter weight norm, and more. This perspective is complementary to a unifying "pattern learning speeds" framework that links grokking and double descent; within this view, delayed generalization can arise across training time ("epoch‑wise") or across model size ("model‑wise"), and the authors report "model‑wise grokking".
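The "grokking time" mentioned in the Origin section can be made concrete with a small helper that compares when the training curve and the held-out curve each first reach a chosen accuracy. This is only an illustrative sketch: the threshold of 0.99, the function names, and the toy accuracy curves are assumptions made for the example, not a definition taken from the literature or measured data.

```python
def first_step_reaching(curve, threshold):
    """Return the first index at which `curve` reaches `threshold`, else None."""
    for step, value in enumerate(curve):
        if value >= threshold:
            return step
    return None


def grokking_delay(train_acc, test_acc, threshold=0.99):
    """Return (fit_step, grok_step, delay), or None if either is never reached.

    The threshold is an illustrative assumption, not a standard value.
    """
    fit_step = first_step_reaching(train_acc, threshold)
    grok_step = first_step_reaching(test_acc, threshold)
    if fit_step is None or grok_step is None:
        return None
    return fit_step, grok_step, grok_step - fit_step


# Toy curves (made up for illustration): the training set is fitted early,
# while held-out accuracy stays flat and then rises abruptly.
train_acc = [0.2, 0.7, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
test_acc = [0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.9, 1.0]
result = grokking_delay(train_acc, test_acc)
```

A large gap between the two steps, together with a flat held-out curve in between, is the signature that distinguishes grokking from ordinary training, where both curves improve in tandem.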
AI content watermarking
AI content watermarking is the process of embedding imperceptible yet detectable signals into content generated by artificial intelligence systems, such as text, images, audio, or video. The technique allows the content to be traced and identified as machine-generated without compromising its quality for the end user. AI watermarking has emerged as a key approach to address growing concerns about misinformation, deepfakes, copyright infringement, and the traceability of synthetic content in the context of the rapid development of generative artificial intelligence.

Unlike traditional visible watermarks used in photography, AI content watermarks are typically invisible to humans and can only be detected and deciphered algorithmically. The concept is distinct from the watermarking of AI models themselves (to prevent model theft) and from the watermarking of training data (to combat unauthorized data use).

Modern AI watermarking schemes are typically formalized as a pair of algorithms, an embedding (or generation) algorithm and a detection algorithm, sharing a secret key. Their performance is evaluated along three competing axes: quality (the watermark must not noticeably degrade outputs), detectability (the watermark must be statistically distinguishable from unwatermarked content), and robustness (the watermark must persist under adversarial or incidental modifications).

== Background ==
Digital watermarking has been used for decades to protect physical and digital media, from paper currency to photographs. Classical schemes typically embedded a fixed bit-string into a fixed cover signal, with robustness criteria defined against a small fixed set of distortions such as JPEG compression or additive Gaussian noise.
The rapid advancement of generative AI in the early 2020s, however, created a new and qualitatively different demand: rather than protecting a single artifact, watermarks for AI content must be embedded automatically across an open-ended distribution of generated outputs while remaining robust to a much wider class of adversarial transformations, including paraphrasing, image regeneration via diffusion models, and re-recording. Large image generation models such as DALL-E, Stable Diffusion, and Midjourney, along with large language models like ChatGPT, made it possible to produce highly realistic synthetic text, images, audio, and video at scale, raising significant ethical and security concerns.

In July 2023, the Biden administration secured voluntary commitments from leading AI companies, including OpenAI, Alphabet, Meta, and Amazon, to develop watermarking and other provenance technologies to help users identify AI-generated content.

== Formal definitions and design goals ==
Most modern AI watermarking schemes can be formalized as a pair of algorithms $(\mathsf{Wm}, \mathsf{Detect})$ parameterized by a secret key $k$. The embedding algorithm $\mathsf{Wm}$ takes a generative model $M$ (and optionally a prompt) and returns a watermarked output $x$; the detection algorithm $\mathsf{Detect}(x, k)$ outputs a real-valued score (typically a p-value or log-likelihood ratio) used to decide whether $x$ was produced by the watermarked generator. The literature evaluates such schemes along several largely conflicting criteria:
Imperceptibility, or quality preservation, is measured for text via perplexity and human preference judgments, and for images and audio via metrics such as PSNR, SSIM, LPIPS, or PESQ.
Detectability is typically expressed as the true positive rate at a fixed false positive rate (e.g. 1% or $10^{-6}$), or as the number of tokens or pixels needed to reach a given confidence level.
Robustness refers to the requirement that the watermark should survive expected modifications like JPEG or MP3 compression, cropping, noise, paraphrasing, or machine translation.
Distortion-freeness is a stronger property requiring that the marginal distribution of any single watermarked output be statistically identical to the unwatermarked model's distribution. Schemes due to Aaronson, Christ et al., and Kuditipudi et al. are distortion-free in this sense, while the original Kirchenbauer et al. scheme is not.
Forgery resistance, or unforgeability, means an adversary without the secret key should be unable to produce content that passes detection.

== Techniques ==
AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection. There are two primary methods for embedding: watermarking during content generation, which requires access to the AI model itself but is generally more robust, and post-generation watermarking, which can be applied to content from any source, including closed-source models.

Watermarks can be broadly classified as visible, including overt marks such as logos or text overlays, or imperceptible, which are detectable only by algorithms. They can also be classified by durability: robust watermarks are designed to withstand common transformations such as compression, cropping, and re-encoding, while fragile watermarks are easily destroyed by any alteration, making them useful for tamper detection.
A further axis distinguishes zero-bit watermarks, which only signal "this content was generated by model M," from multi-bit watermarks, which embed an arbitrary payload (such as a user identifier) that can be recovered at detection time.

=== Text ===
Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio. Modern approaches for large language models alter the autoregressive sampling process so that some statistical signature is left in the choice of tokens, while leaving the surface form of the text unchanged.

The literature distinguishes three main families of generation-time text watermarks. Logit-biasing schemes (e.g. KGW) add a fixed bias $\delta$ to a pseudorandomly selected subset of vocabulary logits before softmax sampling. Reweighting or sampling-based schemes (e.g. SynthID-Text) compose multiple pseudorandom tournaments over the model's full distribution. Distortion-free schemes based on the Gumbel-max trick or inverse transform sampling (Aaronson 2022; Kuditipudi et al. 2023; Christ et al. 2024) preserve the marginal output distribution of the model.

==== KGW: token-probability shifting ====
The pioneering "green list / red list" scheme of Kirchenbauer et al. (KGW), introduced at ICML 2023, is the foundation for most subsequent text watermarks. At each decoding step $t$, a pseudorandom function (PRF) keyed by a secret $k$ is applied to a context window of $h$ previous tokens to deterministically partition the vocabulary $V$ of size $N$ into a "green list" $G \subset V$ of size $\gamma N$ and its complement, the "red list" $R = V \setminus G$, where $\gamma \in (0, 1)$ (typically $\gamma = 1/2$) is the green fraction.
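The keyed partition step can be sketched as follows. This is a minimal illustration of the idea only, not the KGW reference implementation: using HMAC-SHA256 as the pseudorandom function, seeding Python's `random.Random` from its output, and the function name `green_list` are all assumptions made for the example.

```python
import hashlib
import hmac
import random


def green_list(context, key, vocab_size, gamma=0.5):
    """Partition token ids 0..vocab_size-1 and return the green set.

    `context` is the window of h previous token ids and `key` is the
    secret (bytes). HMAC-SHA256 stands in for the PRF (an assumption).
    """
    message = ",".join(str(token) for token in context).encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    token_ids = list(range(vocab_size))
    rng.shuffle(token_ids)          # keyed, context-dependent permutation
    return set(token_ids[: int(gamma * vocab_size)])


key = b"hypothetical-secret-key"
green_a = green_list([17, 42], key, vocab_size=1000)
green_b = green_list([17, 43], key, vocab_size=1000)
red_a = set(range(1000)) - green_a  # the red list is the complement
```

Because the partition depends only on the key and the preceding tokens, anyone holding the key can recompute it from the text alone, without access to the language model.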
A logits processor then increments every green-list logit by a fixed bias $\delta > 0$ before softmax:

$$\ell'_{v} = \ell_{v} + \delta \cdot \mathbf{1}[v \in G]$$

so that, after sampling, green tokens are over-represented but generation is not constrained to green tokens alone; high-entropy positions tolerate the bias gracefully, while low-entropy positions (where one token dominates the logits) override the watermark and preserve correctness on factual content.

Detection requires only the secret key and the candidate text, not the language model itself. The detector recomputes the partition $g(\cdot)$ for each token, counts the number of green hits $|G|_{\text{hits}}$ in a sequence of length $T$, and computes a one-proportion z-test statistic:

$$z = \frac{|G|_{\text{hits}} - \gamma T}{\sqrt{T\gamma(1-\gamma)}}$$

Under the null hypothesis that the text was written by an unwatermarked source (human or another model), the green-hit count is approximately binomially distributed with mean $\gamma T$; a large positive $z$ rejects the null hypothesis. The original paper reports that fewer than 25 watermarked tokens are sufficient to detect a watermark with a false positive rate below $10^{-5}$ on the OPT-1.3B model. A follow-up study by the same group documented robustness under temperature sampling, top-p (nucleus) sampling, and human paraphrasing, and proposed sliding-window