Neural scaling law

Neural scaling law

In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase. == Introduction == In general, a deep learning model can be characterized by four parameters: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these variables can be defined as a real number, usually written as N , D , C , L {\displaystyle N,D,C,L} (respectively: parameter count, dataset size, computing cost, and loss). A neural scaling law is a theoretical or empirical statistical law between these parameters. There are also other parameters with other scaling laws. === Size of the model === In most cases, the model's size is simply the number of parameters. However, one complication arises with the use of sparse models, such as mixture-of-expert models. With sparse models, during inference, only a fraction of their parameters are used. In comparison, most other kinds of neural networks, such as transformer models, always use all their parameters during inference. === Size of the training dataset === The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training. With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% the size of pretraining dataset. In some cases, a small amount of high quality data suffices for finetuning, and more data does not necessarily improve performance. Many scaling laws, due to their inherent diminishing returns nature, value data based on a submodular set function which was shown in a paper on this topic. === Cost of training === Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). It is important to note that the cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs. The cost of training a neural network model is a function of several factors, including model size, training dataset size, the training algorithm complexity, and the computational resources available. In particular, doubling the training dataset size does not necessarily double the cost of training, because one may train the model for several times over the same dataset (each being an "epoch"). === Performance === The performance of a neural network model is evaluated based on its ability to accurately predict the output given some input data. Common metrics for evaluating model performance include: Negative log-likelihood per token (logarithm of perplexity) for language modeling; Accuracy, precision, recall, and F1 score for classification tasks; Mean squared error (MSE) or mean absolute error (MAE) for regression tasks; Elo rating in a competition against other models, such as gameplay or preference by a human judge. Performance can be improved by using more data, larger models, different training algorithms, regularizing the model to prevent overfitting, and early stopping using a validation set. When the performance is a number bounded within the range of [ 0 , 1 ] {\displaystyle [0,1]} , such as accuracy, precision, etc., it often scales as a sigmoid function of cost, as seen in the figures. == Examples == === (Hestness, Narang, et al, 2017) === The 2017 paper is a common reference point for neural scaling laws fitted by statistical analysis on experimental data. Previous works before the 2000s, as cited in the paper, were either theoretical or orders of magnitude smaller in scale. Whereas previous works generally found the scaling exponent to scale like L ∝ D − α {\displaystyle L\propto D^{-\alpha }} , with α ∈ { 0.5 , 1 , 2 } {\displaystyle \alpha \in \{0.5,1,2\}} , the paper found that α ∈ [ 0.07 , 0.35 ] {\displaystyle \alpha \in [0.07,0.35]} . Of the factors they varied, only task can change the exponent α {\displaystyle \alpha } . Changing the architecture optimizers, regularizers, and loss functions, would only change the proportionality factor, not the exponent. For example, for the same task, one architecture might have L = 1000 D − 0.3 {\displaystyle L=1000D^{-0.3}} while another might have L = 500 D − 0.3 {\displaystyle L=500D^{-0.3}} . They also found that for a given architecture, the number of parameters necessary to reach lowest levels of loss, given a fixed dataset size, grows like N ∝ D β {\displaystyle N\propto D^{\beta }} for another exponent β {\displaystyle \beta } . They studied machine translation with LSTM ( α ∼ 0.13 {\displaystyle \alpha \sim 0.13} ), generative language modelling with LSTM ( α ∈ [ 0.06 , 0.09 ] , β ≈ 0.7 {\displaystyle \alpha \in [0.06,0.09],\beta \approx 0.7} ), ImageNet classification with ResNet ( α ∈ [ 0.3 , 0.5 ] , β ≈ 0.6 {\displaystyle \alpha \in [0.3,0.5],\beta \approx 0.6} ), and speech recognition with two hybrid (LSTMs complemented by either CNNs or an attention decoder) architectures ( α ≈ 0.3 {\displaystyle \alpha \approx 0.3} ). === (Henighan, Kaplan, et al, 2020) === A 2020 analysis studied statistical relations between C , N , D , L {\displaystyle C,N,D,L} over a wide range of values and found similar scaling laws, over the range of N ∈ [ 10 3 , 10 9 ] {\displaystyle N\in [10^{3},10^{9}]} , C ∈ [ 10 12 , 10 21 ] {\displaystyle C\in [10^{12},10^{21}]} , and over multiple modalities (text, video, image, text to image, etc.). In particular, the scaling laws it found are (Table 1 of ): For each modality, they fixed one of the two C , N {\displaystyle C,N} , and varying the other one ( D {\displaystyle D} is varied along using D = C / 6 N {\displaystyle D=C/6N} ), the achievable test loss satisfies L = L 0 + ( x 0 x ) α {\displaystyle L=L_{0}+\left({\frac {x_{0}}{x}}\right)^{\alpha }} where x {\displaystyle x} is the varied variable, and L 0 , x 0 , α {\displaystyle L_{0},x_{0},\alpha } are parameters to be found by statistical fitting. The parameter α {\displaystyle \alpha } is the most important one. When N {\displaystyle N} is the varied variable, α {\displaystyle \alpha } ranges from 0.037 {\displaystyle 0.037} to 0.24 {\displaystyle 0.24} depending on the model modality. This corresponds to the α = 0.34 {\displaystyle \alpha =0.34} from the Chinchilla scaling paper. When C {\displaystyle C} is the varied variable, α {\displaystyle \alpha } ranges from 0.048 {\displaystyle 0.048} to 0.19 {\displaystyle 0.19} depending on the model modality. This corresponds to the β = 0.28 {\displaystyle \beta =0.28} from the Chinchilla scaling paper. Given fixed computing budget, optimal model parameter count is consistently around N o p t ( C ) = ( C 5 × 10 − 12 petaFLOP-day ) 0.7 = 9.0 × 10 − 7 C 0.7 {\displaystyle N_{opt}(C)=\left({\frac {C}{5\times 10^{-12}{\text{petaFLOP-day}}}}\right)^{0.7}=9.0\times 10^{-7}C^{0.7}} The parameter 9.0 × 10 − 7 {\displaystyle 9.0\times 10^{-7}} varies by a factor of up to 10 for different modalities. The exponent parameter 0.7 {\displaystyle 0.7} varies from 0.64 {\displaystyle 0.64} to 0.75 {\displaystyle 0.75} for different modalities. This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. It's "strongly suggested" (but not statistically checked) that D o p t ( C ) ∝ N o p t ( C ) 0.4 ∝ C 0.28 {\displaystyle D_{opt}(C)\propto N_{opt}(C)^{0.4}\propto C^{0.28}} . This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. The scaling law of L = L 0 + ( C 0 / C ) 0.048 {\displaystyle L=L_{0}+(C_{0}/C)^{0.048}} was confirmed during the training of GPT-3 (Figure 3.1 ). === Chinchilla scaling (Hoffmann, et al, 2022) === One particular scaling law ("Chinchilla scaling") states that, for a large language model (LLM) autoregressively trained for one epoch, with a cosine learning rate schedule, we have: { C = C 0 N D L = A N α + B D β + L 0 {\displaystyle {\begin{cases}C=C_{0}ND\\L={\frac {A}{N^{\alpha }}}+{\frac {B}{D^{\beta }}}+L_{0}\end{cases}}} where the variables are C {\displaystyle C} is the cost o

Local-first software

Local-first software is a software engineering approach in which an application stores its data primarily on the user's own device rather than on remote servers. Users can read and write data without an Internet connection, and changes are synchronized across devices in the background when connectivity is available. The approach differs from conventional cloud-based applications, where the server holds the authoritative copy of user data and the client acts as a thin client. The term was coined in a 2019 paper published by researchers at Ink & Switch, an independent research lab, and presented at the Onward! conference at ACM SIGPLAN. The paper, sometimes referred to as a manifesto, was authored by Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan. == Background == Before the widespread adoption of Internet-connected software in the 2000s, most desktop applications stored data as files on the user's local disk. Users had direct access to their files and could copy, back up, or delete them at will. The rise of software as a service (SaaS) and cloud-based applications like Google Docs shifted data storage to centralized servers. While cloud applications made real-time collaboration across devices straightforward, they introduced a dependency on the service provider: if the provider discontinued the service or experienced an outage, users could lose access to their data. A related concept, "offline-first," emerged in the early 2010s and focused on making web applications resilient to network interruptions. The local-first approach built on these earlier efforts while placing greater emphasis on long-term data ownership and end-to-end encryption. == Origins == === Ink & Switch manifesto === Ink & Switch is an industrial research lab co-founded by Adam Wiggins, who had earlier co-founded Heroku. Martin Kleppmann, an associate professor in the Department of Computer Science and Technology at the University of Cambridge, was a co-author of the 2019 paper. The manifesto proposed seven "ideals" for local-first software: Fast — Operations respond without network round-trips. Multi-device — Data synchronizes across a user's devices. Offline — Users can read and write data without a network connection. Collaboration — Multiple users can work on the same data concurrently. Longevity — Data remains accessible even if the software vendor ceases operation. Privacy — End-to-end encryption protects user data. User control — The vendor cannot restrict how users access or use their data. The paper surveyed existing approaches to data storage and collaboration — ranging from email attachments and Dropbox-style file synchronization to web applications and mobile backends — and argued that none of them satisfied all seven ideals simultaneously. === Role of CRDTs === The manifesto identified conflict-free replicated data types (CRDTs) as a promising technical foundation for local-first applications. CRDTs are data structures that allow multiple replicas to be edited independently and then merged without conflicts, a property first formalized in research by Marc Shapiro and colleagues around 2011. Kleppmann and collaborators at Ink & Switch developed Automerge, an open-source CRDT library for JSON documents, to make these algorithms available to application developers. == Adoption and community == Developer interest in the local-first approach grew after the 2019 paper spread on Hacker News and at developer conferences In August 2023, Wired published a feature article on the movement, describing it as an effort to reduce reliance on large cloud providers. The first Local-First Conf took place on 30 May 2024 in Berlin, with talks by Kleppmann and developers from companies including Linear and Anytype. The community has continued to expand, with regular "LoFi" meetups, a podcast (localfirst.fm), and a third edition of the conference planned for Berlin in July 2026. == Criticisms and limitations == Developers and commentators have pointed out practical difficulties with the local-first approach. Synchronizing data between multiple devices that may be offline for extended periods introduces complexity that cloud-based architectures avoid. Conflict resolution, even with CRDTs, can produce results that are technically consistent but semantically unexpected to users. Schema migrations across thousands of client devices running different application versions pose another difficulty that does not arise with server-side databases. Web browsers impose storage limits and may evict locally stored data. Safari, for instance, has been reported to clear IndexedDB data after seven days of inactivity on a given site, which undermines the assumption that local data is persistent. There is also disagreement within the local-first community about whether a fully decentralized architecture is required. The original manifesto described decentralization as the "logical end goal," but a number of products that identify as local-first still depend on centralized servers for authentication, backup, or synchronization. In a talk at Local-First Conf 2024, Kleppmann said the seven ideals are better understood as a "gradient" rather than a strict checklist.

PropBank

PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced by Martha Palmer et al., the term propbank is also coming to be used as a common noun referring to any corpus that has been annotated with propositions and their arguments. The PropBank project has played a role in research in natural language processing, and has been used in semantic role labelling. == Comparison == PropBank differs from FrameNet, the resource to which it is most frequently compared, in several ways. PropBank is a verb-oriented resource, while FrameNet is centered on the more abstract notion of frames, which generalizes descriptions across similar verbs (e.g. "describe" and "characterize") as well as nouns and other words (e.g. "description"). PropBank does not annotate events or states of affairs described using nouns. PropBank commits to annotating all verbs in a corpus, whereas the FrameNet project chooses sets of example sentences from a large corpus and only in a few cases has annotated longer continuous stretches of text. PropBank-style annotations often remain close to the syntactic level, while FrameNet-style annotations are sometimes more semantically motivated. From the start, PropBank was developed with the idea of serving as training data for machine learning-based semantic role labeling systems in mind. It requires that all arguments to a verb be syntactic constituents and different senses of a word are only distinguished if the differences bear on the arguments. Due to such differences, semantic role labeling with respect to PropBank is often a somewhat easier task than producing FrameNet-style annotations.

Cleverbot

Cleverbot is a chatterbot web application. It was created by British AI scientist Rollo Carpenter and launched in October 2008. It was preceded by Jabberwacky, a chatbot project that began in 1988 and went online in 1997. In its first decade, Cleverbot held several thousand conversations with Carpenter and his associates. Since launching on the web, the number of conversations held has exceeded 150 million. Besides the web application, Cleverbot is also available as an iOS, Android, and Windows Phone app. == Operation == Cleverbot's responses are not pre-programmed because it learns from human input: Humans type into the box below the Cleverbot logo and the system finds all keywords or an exact phrase matching the input. After searching through its saved conversations, it responds to the input by finding how a human responded to that input when it was asked, in part or in full, by Cleverbot. Cleverbot participated in a formal Turing test at the 2011 Techniche festival at the Indian Institute of Technology Guwahati on 3 September 2011. Out of the 1334 votes cast, Cleverbot was judged to be 59.3% human, compared to the rating of 63.3% human achieved by human participants. A score of 50.05% or higher is often considered to be a passing grade. The software running for the event had to handle just 1 or 2 simultaneous requests, whereas online Cleverbot is usually talking to around 10,000 to 50,000 people at once. == Developments == Cleverbot is constantly growing in data size at the rate of 4 to 7 million interactions per day. Updates to the software have been mostly behind the scenes. In 2014, Cleverbot was upgraded to use GPU serving techniques. Unlike Eliza, the program does not respond in a fixed way, instead choosing its responses heuristically using fuzzy logic, the whole of the conversation being compared to the millions that have taken place before. Cleverbot now uses over 279 million interactions, about 3-4% of the data it has already accumulated. The developers of Cleverbot are attempting to build a new version using machine learning techniques. An app that uses the Cleverscript engine to play a game of 20 Questions has been launched under the name Clevernator. Unlike other such games, the player asks the questions and it is the role of the AI to understand, and answer factually. An app that allows owners to create and talk to their own small Cleverbot-like AI has been launched, called Cleverme! for Apple products. == In popular culture == Cleverbot received media attention after being featured in the popular 2010 creepypasta ARG web serial Ben Drowned by Alexander D. Hall. In early 2017, a Twitch stream of two Google Home devices modified to talk to each other using Cleverbot garnered over 700,000 visitors and over 30,000 peak concurrent viewers.

Video imprint (computer vision)

Proposed as an extension of image epitomes in the field of video content analysis, video imprint is obtained by recasting video contents into a fixed-sized tensor representation regardless of video resolution or duration. Specifically, statistical characteristics are retained to some degrees so that common video recognition tasks can be carried out directly on such imprints, e.g., event retrieval, temporal action localization. It is claimed that both spatio-temporal interdependences are accounted for and redundancies are mitigated during the computation of video imprints. The option of computing video imprints exploiting the epitome model has the advantage of more flexible input feature formats and more efficient training stage for video content analysis.

List of Java software and tools

This is a list of software and programming tools for the Java programming language, which includes frameworks, libraries, IDEs, build tools, application servers, and related projects. == Java frameworks == == Libraries == Apache Ant – build automation tool Apache Batik – SVG processing Apache Cayenne – object-relational mapping Apache Xerces – collection of software libraries for parsing, validating, serializing and manipulating XML. Applet – applet API Ardor3D – 3D graphics engine Bonita BPM – workflow engine Cassowary – constraint solving Checkstyle – static code analysis GNU Classpath – standard library implementation Colt – scientific computing and technical computing Commons Daemon – manages applications as daemons DESMO-J – discrete event simulation Diagrams.net – diagramming Disruptor – high-performance messaging Dom4j – XML processing Dynamic Languages Toolkit – support for dynamic programming languages on the JVM Echo – GUI Flying Saucer – XHTML/CSS rendering Formatting Objects Processor – XSL-FO to PDF H2 Database Engine – relational database IAIK-JCE – cryptography Internet Foundation Classes – legacy GUI JavaBeans – reusable component architecture for enabling encapsulation, events, and properties for software components JavaCC – open-source parser generator and lexical analyzer Java Class Library – standard library of Java and other JVM languages Java Native Access – provides Java programs easy access to native shared libraries without using the Java Native Interface Javolution – real-time computing Jblas – linear algebra JDBCFacade – simplifies JDBC use JExcel – Excel API JFugue – music programming JMusic – music programming Joget Workflow – workflow engine JOOQ Object Oriented Querying – fluent API for SQL JPOS – financial messaging JUNG – open-source graph modeling and visualization LanguageWare – language processing LibGDX – game development Modular Audio Recognition Framework – collection of voice, sound, speech, text and natural language processing algorithms. ASM – bytecode manipulation Open Inventor – 3D graphics OpenPDF – PDF Parallel Colt – parallel computing Parboiled – parser PlayN – game development QOCA – constraint solving QtJambi – Qt bindings SLF4J – logging StableUpdate – update management SWT – GUI SuanShu – numerical computing SwingLabs – GUI extensions UBY – natural language processing Undecimber – calendar XDoclet – attribute-oriented programming XINS – XML network services XStream – object serialization == Machine learning and AI == Apache Mahout – scalable machine learning library focused on clustering, classification, and collaborative filtering Apache MXNet – deep learning framework with Java API support Apache OpenNLP – machine learning based toolkit for natural language processing of text Deeplearning4j – distributed deep learning library Deep Java Library – open-source deep learning framework developed by Amazon Web Services Encog – framework for neural networks, genetic algorithms, Hidden Markov model, and Bayesian networks. LIBSVM – Support Vector Machine implementation Mallet – machine learning toolkit for classification, clustering, and topic modeling. MLlib – distributed machine-learning framework on top of Apache Spark Core Neuroph – lightweight neural network framework Weka – collection of machine learning algorithms for data mining Yooreeka – machine learning == Data mining == Java Data Mining (JDM) – standard Java API for data mining Massive Online Analysis (MOA) – data stream mining with concept drift == Math and scientific libraries == Apache Commons Math – general-purpose mathematics library including statistics, linear algebra, and optimization. Colt – high-performance scientific computing, including linear algebra and random numbers. Efficient Java Matrix Library (EJML) – dense and sparse matrix computations and linear algebra Easy Java Simulations – Open Source Physics project designed to create discrete computer simulations Exp4j – evaluates mathematical expressions at runtime GroovyLab – numerical computational environment Hipparchus – fork of Apache Commons Math with updated algorithms for statistics, linear algebra, and optimization. JAMA – numerical linear algebra library Jblas: Linear Algebra for Java (Jblas) – linear algebra library using native BLAS/LAPACK bindings Java Astrodynamics Toolkit – numerical library of software components for use in spaceflight applications for Java or MATLAB Matrix Toolkit Java (MTJ) – linear algebra library with BLAS and LAPACK support OjAlgo – optimization, linear algebra, and financial calculations. OptimJ – extension for mathematical optimization and constraint programming Parallel Colt – A parallel extension of Colt SuanShu – numerical analysis, linear algebra, statistics, and optimization. == Integrated development environments == See also: Java IDEs on Wikibooks Android Studio – IDE for Google's Android operating system BlueJ – educational IDE for teaching Java DrJava – lightweight Java IDE for beginners Eclipse IDE – open-source IDE with extensive plugin ecosystem Greenfoot – educational IDE IntelliJ IDEA – commercial and community editions from JetBrains JDeveloper – freeware IDE supplied by Oracle Corporation jGRASP – software visualizations MyEclipse – Java EE IDE NetBeans IDE – Apache NetBeans Visual Studio Code – general-purpose editor with Java extensions === Online IDEs === Eclipse Che GitHub Codespaces JDoodle Replit == Text editors with Java support == == Build tools and package managers == Apache Ant – automating software build Apache Ivy – subproject of Apache Ant Apache Maven – build automation and dependency management Boot – build automation for Clojure CMake – build tool with limited support for java Gradle – modern build automation tool Go continuous delivery (GoCD) – continuous delivery and build automation server Jenkins – automation server continuous delivery JitPack – package repository for Git projects Leiningen – build automation for Clojure Simple build tool (sbt) – open-source build tool Spring Roo – rapid application development of Java-based enterprise software WaveMaker – low-code development platform == Java runtimes, compilers and virtual machines == Android Runtime – runtime environment javac – Java programming language compiler Java Virtual Machine (JVM) – virtual machine that executes Java bytecode JD Decompiler JEB decompiler – disassembler and decompiler software for Android applications GraalVM – Just-in-time compilation HotSpot – JVM implementation included in OpenJDK == JVM languages and dialects == Clojure – Lisp dialect Groovy JRuby – Ruby implementation Jython – Python implementation Kotlin – popular for Android app development Renjin – R implementation Scala == Application servers and containers == Apache Geronimo – open source application server Apache MINA – event-driven asynchronous network application framework Apache Tomcat – web container and web server Apache TomEE – Apache Tomcat with Java EE features Borland Enterprise Server – discontinued application server by Borland ColdFusion – commercial application server by Adobe Systems GlassFish – application server for Jakarta EE IBM WebSphere Application Server – enterprise application server by IBM IBM WebSphere Application Server Community Edition – open source edition of WebSphere (discontinued) JBoss Enterprise Application Platform – Red Hat's supported distribution of JBoss/WildFly JEUS – commercial Java EE application server from TmaxSoft Jetty – HTTP server and web container Lucee (formerly Railo) – open source CFML application server Netty – non-blocking I/O client–server framework for network applications Oracle Containers for J2EE – discontinued application server by Oracle Oracle WebLogic Server – enterprise application server by Oracle Orion Application Server – early commercial Java EE server by IronFlare Payara Server – fork of GlassFish for production use Resin – Java application server by Caucho (open source and professional editions) SAP NetWeaver Application Server – enterprise application server by SAP WildFly – application server == Debugging and profiling tools == jdb – Java debugger bundled with the JDK JConsole – JMX-compliant monitoring tool JDK Flight Recorder – method profiling, allocation profiling, and garbage collection related events. JProfiler – commercial Java profiler VisualVM – visual tool integrating commandline JDK tools for profiling and monitoring == Testing and quality assurance == Apache JMeter – load testing tool JaCoCo – Java code coverage library JArchitect – analyzes code quality, architecture, and dependencies. Jtest – software testing and static analysis JUnit – unit testing framework Mockito – open-source testing framework for Java PMD – static program analysis source code analyzer Selenium – browser automation for web app testing Spock – test framework SpotBugs (formerly FindBugs) – static analysis tool TestNG – testing framework inspired by JUnit and NUnit == Other == Apache XMLBeans –

DreamLab

DreamLab was a volunteer computing Android and iOS app launched in 2015 by Imperial College London and the Vodafone Foundation. It was discontinued on 2nd April 2025. == Description == The app helped to research cancer, COVID-19, new drugs and tropical cyclones. To do this, DreamLab accessed part of the device's processing power, with the user's consent, while the owner charged their smartphone, to speed up the calculations of the algorithms from Imperial College London. The aim of the tropical cyclone project was to prepare for climate change risks. Other projects aimed to find existing drugs and food molecules that could help people with COVID-19 and other diseases. The performance of 100,000 smartphones would reach the annual output of all research computers at Imperial College in just three months, with a nightly runtime of six hours. The app was developed in 2015 by the Garvan Institute of Medical Research in Sydney and the Vodafone Foundation. In May 2020, the project had over 490,000 registered users.