AI Content Provenance

AI Content Provenance — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Neural scaling law

    Neural scaling law

    In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase. == Introduction == In general, a deep learning model can be characterized by four parameters: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these variables can be defined as a real number, usually written as N , D , C , L {\displaystyle N,D,C,L} (respectively: parameter count, dataset size, computing cost, and loss). A neural scaling law is a theoretical or empirical statistical law between these parameters. There are also other parameters with other scaling laws. === Size of the model === In most cases, the model's size is simply the number of parameters. However, one complication arises with the use of sparse models, such as mixture-of-expert models. With sparse models, during inference, only a fraction of their parameters are used. In comparison, most other kinds of neural networks, such as transformer models, always use all their parameters during inference. === Size of the training dataset === The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training. With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% the size of pretraining dataset. In some cases, a small amount of high quality data suffices for finetuning, and more data does not necessarily improve performance. Many scaling laws, due to their inherent diminishing returns nature, value data based on a submodular set function which was shown in a paper on this topic. === Cost of training === Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). It is important to note that the cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs. The cost of training a neural network model is a function of several factors, including model size, training dataset size, the training algorithm complexity, and the computational resources available. In particular, doubling the training dataset size does not necessarily double the cost of training, because one may train the model for several times over the same dataset (each being an "epoch"). === Performance === The performance of a neural network model is evaluated based on its ability to accurately predict the output given some input data. Common metrics for evaluating model performance include: Negative log-likelihood per token (logarithm of perplexity) for language modeling; Accuracy, precision, recall, and F1 score for classification tasks; Mean squared error (MSE) or mean absolute error (MAE) for regression tasks; Elo rating in a competition against other models, such as gameplay or preference by a human judge. Performance can be improved by using more data, larger models, different training algorithms, regularizing the model to prevent overfitting, and early stopping using a validation set. When the performance is a number bounded within the range of [ 0 , 1 ] {\displaystyle [0,1]} , such as accuracy, precision, etc., it often scales as a sigmoid function of cost, as seen in the figures. == Examples == === (Hestness, Narang, et al, 2017) === The 2017 paper is a common reference point for neural scaling laws fitted by statistical analysis on experimental data. Previous works before the 2000s, as cited in the paper, were either theoretical or orders of magnitude smaller in scale. Whereas previous works generally found the scaling exponent to scale like L ∝ D − α {\displaystyle L\propto D^{-\alpha }} , with α ∈ { 0.5 , 1 , 2 } {\displaystyle \alpha \in \{0.5,1,2\}} , the paper found that α ∈ [ 0.07 , 0.35 ] {\displaystyle \alpha \in [0.07,0.35]} . Of the factors they varied, only task can change the exponent α {\displaystyle \alpha } . Changing the architecture optimizers, regularizers, and loss functions, would only change the proportionality factor, not the exponent. For example, for the same task, one architecture might have L = 1000 D − 0.3 {\displaystyle L=1000D^{-0.3}} while another might have L = 500 D − 0.3 {\displaystyle L=500D^{-0.3}} . They also found that for a given architecture, the number of parameters necessary to reach lowest levels of loss, given a fixed dataset size, grows like N ∝ D β {\displaystyle N\propto D^{\beta }} for another exponent β {\displaystyle \beta } . They studied machine translation with LSTM ( α ∼ 0.13 {\displaystyle \alpha \sim 0.13} ), generative language modelling with LSTM ( α ∈ [ 0.06 , 0.09 ] , β ≈ 0.7 {\displaystyle \alpha \in [0.06,0.09],\beta \approx 0.7} ), ImageNet classification with ResNet ( α ∈ [ 0.3 , 0.5 ] , β ≈ 0.6 {\displaystyle \alpha \in [0.3,0.5],\beta \approx 0.6} ), and speech recognition with two hybrid (LSTMs complemented by either CNNs or an attention decoder) architectures ( α ≈ 0.3 {\displaystyle \alpha \approx 0.3} ). === (Henighan, Kaplan, et al, 2020) === A 2020 analysis studied statistical relations between C , N , D , L {\displaystyle C,N,D,L} over a wide range of values and found similar scaling laws, over the range of N ∈ [ 10 3 , 10 9 ] {\displaystyle N\in [10^{3},10^{9}]} , C ∈ [ 10 12 , 10 21 ] {\displaystyle C\in [10^{12},10^{21}]} , and over multiple modalities (text, video, image, text to image, etc.). In particular, the scaling laws it found are (Table 1 of ): For each modality, they fixed one of the two C , N {\displaystyle C,N} , and varying the other one ( D {\displaystyle D} is varied along using D = C / 6 N {\displaystyle D=C/6N} ), the achievable test loss satisfies L = L 0 + ( x 0 x ) α {\displaystyle L=L_{0}+\left({\frac {x_{0}}{x}}\right)^{\alpha }} where x {\displaystyle x} is the varied variable, and L 0 , x 0 , α {\displaystyle L_{0},x_{0},\alpha } are parameters to be found by statistical fitting. The parameter α {\displaystyle \alpha } is the most important one. When N {\displaystyle N} is the varied variable, α {\displaystyle \alpha } ranges from 0.037 {\displaystyle 0.037} to 0.24 {\displaystyle 0.24} depending on the model modality. This corresponds to the α = 0.34 {\displaystyle \alpha =0.34} from the Chinchilla scaling paper. When C {\displaystyle C} is the varied variable, α {\displaystyle \alpha } ranges from 0.048 {\displaystyle 0.048} to 0.19 {\displaystyle 0.19} depending on the model modality. This corresponds to the β = 0.28 {\displaystyle \beta =0.28} from the Chinchilla scaling paper. Given fixed computing budget, optimal model parameter count is consistently around N o p t ( C ) = ( C 5 × 10 − 12 petaFLOP-day ) 0.7 = 9.0 × 10 − 7 C 0.7 {\displaystyle N_{opt}(C)=\left({\frac {C}{5\times 10^{-12}{\text{petaFLOP-day}}}}\right)^{0.7}=9.0\times 10^{-7}C^{0.7}} The parameter 9.0 × 10 − 7 {\displaystyle 9.0\times 10^{-7}} varies by a factor of up to 10 for different modalities. The exponent parameter 0.7 {\displaystyle 0.7} varies from 0.64 {\displaystyle 0.64} to 0.75 {\displaystyle 0.75} for different modalities. This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. It's "strongly suggested" (but not statistically checked) that D o p t ( C ) ∝ N o p t ( C ) 0.4 ∝ C 0.28 {\displaystyle D_{opt}(C)\propto N_{opt}(C)^{0.4}\propto C^{0.28}} . This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. The scaling law of L = L 0 + ( C 0 / C ) 0.048 {\displaystyle L=L_{0}+(C_{0}/C)^{0.048}} was confirmed during the training of GPT-3 (Figure 3.1 ). === Chinchilla scaling (Hoffmann, et al, 2022) === One particular scaling law ("Chinchilla scaling") states that, for a large language model (LLM) autoregressively trained for one epoch, with a cosine learning rate schedule, we have: { C = C 0 N D L = A N α + B D β + L 0 {\displaystyle {\begin{cases}C=C_{0}ND\\L={\frac {A}{N^{\alpha }}}+{\frac {B}{D^{\beta }}}+L_{0}\end{cases}}} where the variables are C {\displaystyle C} is the cost o

    Read more →
  • Groundswell (book)

    Groundswell (book)

    Groundswell is a book by Forrester Research executives Charlene Li and Josh Bernoff that focuses on how companies can take advantage of emerging social technologies. It was published in 2008 by Harvard Business Press. A revised edition was published in 2011. The book attempts to explain a shift in the relationship between customers and companies, in which companies are no longer able to control customers' attitudes through market research, customer service, and advertising. Instead, customers are controlling the conversation by using new media to communicate about products and companies. == Synopsis == The groundswell is characterized by several tactics that guide companies into using social technologies strategically and effectively. Listening: Businesses should listen to their customers to understand what the market is looking for in their products. In order to do this, a company needs to find out if their customers are using social technologies and how they are using them. Talking: Instead of advertising to customers, marketing departments should find creative ways to connect with users about their experience with a product and their feelings about the brand. One common method is participation in social networks. Energizing: Enthusiastic customers are part of the groundswell, and companies can recognize and appreciate these customers by creating online communities and social platforms where they can connect with the brand and provide reviews. Supporting: Businesses can harness the support of their own employees by creating internal social applications for them to connect with the brand, also known as enterprise social software. == Groundswell in action == === Examples === Some companies distinguish their product through the use of social technologies. Tom Dickson successfully marketed his Blendtec line of blenders through the viral marketing campaign Will It Blend? The groundswell spread marketing messages through Digg and YouTube with a small budget and little marketing experience. Other companies have been able to listen to and talk with the groundswell by building their own online communities. Procter & Gamble created beinggirl.com Archived 2016-04-10 at the Wayback Machine to introduce girls to P&G feminine care products. The community approach worked because the company could reach girls with information that might seem embarrassing or sensitive in a traditional marketing campaign. === Risks === Features of particular industries or companies can make direct customer engagement more difficult. For instance, some companies must work within industry regulations, national or multinational corporations must balance corporate and local engagement, and other companies must find ways to engage with customers on time-sensitive issues. == Reception == Kevin Allison of the Financial Times praised the book for its focus on Web analytics: "[Groundswell] is not so much a manifesto or a dissection of online culture as it is a how-to manual for executives and mid-level managers trying to navigate this fast-changing and often confusing environment." The book won the American Marketing Association Foundation’s Berry-AMA Book Prize for best marketing book of 2009. It was also listed by: Amazon, as one of the Top 10 Business & Investing Books of 2008 CIO Insight, as one of the Top 10 Business-Tech Books of 2008 and one of 10 Insightful Web 2.0 Books Fortune as Magazine as one of the 3 best Web books of 2008 Advertising Age as number 3 of 10 Books You Should Have Read BusinessWeek as one of the Best Innovation & Design Books of 2008 "strategy+business" as one of the Best Business Books 2008 and “Top Shelf” in Marketing

    Read more →
  • European Information Technology Observatory

    European Information Technology Observatory

    The European Information Technology Observatory (EITO) gathers information on European and global markets for information technology, telecommunications and consumer electronics. The EITO is managed by Bitkom Research GmbH, a wholly owned subsidiary of BITKOM, the German Association for Information Technology, Telecommunications and New Media. EITO is sponsored by Deutsche Telekom, KPMG and Telecom Italia. The research activities of the EITO Task Force are supported by the European Commission and the OECD. The EITO exists thanks to an initiative of Enore Deotto from MIlan and the support of Luis-Alberto Petit Herrera (Madrid), Jörg Schomburg (Hanover) and Günther Möller (Frankfurt). Between 1993 and 2007, the market reports were published as printed annual reports ("EITO yearbook"). Since 2008 the market reports are available in electronic version and can be purchased on the EITO online portal. Currently, the ICT market reports are divided in following categories: International Reports International Reports include ICT market information of all EITO countries and all market segments or only specific segments. The newest ICT Market Report 2013/14, published in October 2013, includes market data of 36 countries: 28 European markets, BRIC countries, Japan, Turkey and the US as well as a deep analysis of ICT market developments in 9 European countries. The detailed market data and forecasts are available for the period 2010–2014. Country Reports This category includes EITO reports on a single country's ICT market. The Country ICT Market Reports are published biannually for France, Germany, Italy, Spain and the United Kingdom. Thematic Reports Thematic studies focusing on a specific topic. Customized Reports Market Reports made upon order.

    Read more →
  • The Morning After (web series)

    The Morning After (web series)

    The Morning After is a Hulu original web series that premiered on January 17, 2011, and ended April 24, 2014. It was produced by Hulu and Jace Hall's HDFilms, streaming Monday through Friday. The show originally featured Brian Kimmet and Ginger Gonzaga as hosts. Later shows used a rotation of hosts including Alison Haislip, Dave Holmes, Damien Fahey, Bradley Hasemeyer, Haley Mancini, Paul Nyhart, and Rachel Perry. The series advertises itself as "a smart, daily shot of pop culture to help Hulu users stay up to date" and typically highlights notable moments from television shows and current news in an entertaining fashion. In keeping with its focus on pop culture, The Morning After will sometimes stream an episode featuring past pop culture titled "From the Archives," such as its April Fools' Day episode. == History == While not the first original series to appear exclusively on Hulu, The Morning After is the company's first self-branded production. It was preceded by If I Can Dream, a reality series co-produced with 19 Entertainment and created by Simon Fuller. Hulu originated the idea in house, based on user feedback and observations from discussion boards hosted by the website. The concept was modeled after The Big Show with Olbermann and Patrick. The company sought out a production partner and ultimately chose Jace Hall and his team at HDFilms to executive produce. Initial stream of the series was held on January 17, 2011, and featured coverage of Piers Morgan, the Golden Globes, and The Bachelor. Senior VP of Content and Distribution Andy Forssell made the announcement for the show the same day. The show aired its last episode April 24, 2014. == Format == A typical episode usually begins with a cold open shared by the varying hosts listing the highlights to be covered. The topics focus on TV and Pop Culture Highlights from the previous night, with the intention of helping Hulu users digest hours of content in a matter of moments. The show has the hosts trade humorous remarks regarding the news and each other, taking turns reviewing the night's TV and injecting their own personality. The Morning After was named as an honoree by the Webbys on April 10, 2012, in the variety section of its online video category.

    Read more →
  • Israeli cybersecurity industry

    Israeli cybersecurity industry

    The Israeli cybersecurity industry is a rapidly growing sector within Israel's technology and innovation ecosystem. Israel is internationally recognized as a powerhouse in the cybersecurity domain, with numerous cybersecurity startups, established companies, research institutions, and government initiatives. Tel Aviv itself is being ranked 7th in annual list of best global tech ecosystems, as reported by the Jerusalem Post. == History == The roots of Israel's cybersecurity industry can be traced back to the country's strong focus on national security and intelligence. The establishment of elite military units such as Unit 8200, the Israeli Intelligence Corps unit responsible for signals intelligence and code decryption, played a significant role in the development of cybersecurity expertise in the country. Many former members of Unit 8200 have gone on to establish successful cybersecurity companies or join existing organizations, bringing their unique skill sets and experience to the private sector. == Market overview == As of 2024, Israel housed more than 450 cybersecurity startups and companies. In 2023, the value of exits by Israeli tech companies reached $7.5 billion. Israel's cybersecurity industry is characterized by a high concentration of startups develop new technologies in areas such as network security, endpoint protection, data security, cloud security, and threat intelligence. In recent years, the sector has attracted significant investment from both local and international venture capital firms, as well as major technology companies such as Microsoft, Google, and IBM. Several Israeli cybersecurity companies have gained global recognition and success, with some being acquired by major corporations or conducting successful initial public offerings (IPOs). === Key Israeli cybersecurity companies === Some key Israeli cybersecurity companies include: Check Point Software Technologies CyberArk Cato Networks Radware Wiz === Financial activity === Israel’s cybersecurity sector has seen significant financial activity. As of 2023, mergers and acquisitions in the cybersecurity sector totaled $2.8 billion. In the first quarter of 2024, the sector secured $846 million in private funding. == Background == The military experience helped much. Israel's mandatory military service, combined with the expertise developed within elite units such as Unit 8200, has fostered a strong talent pool with practical experience in cybersecurity. Israel's thriving startup ecosystem, often referred to as the "Startup Nation," has fostered an environment of innovation and collaboration that has contributed to the growth of the cybersecurity industry. Israeli cybersecurity companies often collaborate with international partners, both in the private and public sectors, to share knowledge and develop joint solutions. === Government Initiatives and Support === The government also supported well through various initiatives, such as the Israel National Cyber Directorate (INCD), which works to strengthen cybersecurity defenses and promote the development of the sector. === Academic institutions === Israeli universities and research centers are involved in cybersecurity research and education, contributing to the development of new technologies and training the next generation of cybersecurity professionals. Academic Tech transfer offices in Israel also facilitate the commercialization of cybersecurity technologies. Some academic institutions with cybersecurity laboratories include: Tel Aviv University Technion Ben-Gurion University

    Read more →
  • Ajax (programming)

    Ajax (programming)

    The Asynchronous JavaScript and XML, usually referred to as Ajax (or AJAX, ) is a set of web development techniques that uses various web technologies on the client-side to create asynchronous web applications. With Ajax, web applications can send and retrieve data from a server asynchronously (in the background) without interfering with the display and behaviour of the existing page. By decoupling the data interchange layer from the presentation layer, Ajax allows web pages and, by extension, web applications, to change content dynamically without the need to reload the entire page. In practice, modern implementations commonly utilize JSON instead of XML. Ajax is not a technology, but rather a programming pattern. HTML and CSS can be used in combination to mark up and style information. The webpage can be modified by JavaScript to dynamically display (and allow the user to interact with) the new information. The built-in XMLHttpRequest object is used to execute Ajax on webpages, allowing websites to load content onto the screen without refreshing the page. == History == In the early-to-mid 1990s, most Websites were based on complete HTML pages. Each user action required a complete new page to be loaded from the server. This process was inefficient, as reflected by the user experience: all page content disappeared, then the new page appeared. Each time the browser reloaded a page because of a partial change, all the content had to be re-sent, even though only some of the information had changed. This placed additional load on the server and made bandwidth a limiting factor in performance. The foundations of AJAX originate back in 1996 with the introduction of JavaScript 1. Developers quickly discovered that any HTML element which accepted a "src" attribute could be used to fetch remote data. By changing the src of a hidden frame, a developer could fetch remote data, process or display it without a page refresh. The remote data could be a string, JavaScript code, XML or a partial HTML page generated on the server. The same could be done with and tags, but many developers were alarmed at the concept of an executable GIF and preferred to use the hidden