DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, where it can allow the model to generate more fine-tuned and personalized outputs after training on three to five images of a subject. == Technology == Pretrained text-to-image diffusion models, while often capable of offering a diverse range of different image output types, lack the specificity required to generate images of lesser-known subjects, and are limited in their ability to render known subjects in different situations and contexts. The methodology used to run implementations of DreamBooth involves the fine-tuning the full UNet component of the diffusion model using a few images (usually 3--5) depicting a specific subject. Images are paired with text prompts that contain the name of the class the subject belongs to, plus a unique identifier. As an example, a photograph of a [Nissan R34 GTR] car, with car being the class); a class-specific prior preservation loss is applied to encourage the model to generate diverse instances of the subject based on what the model is already trained on for the original class. Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained. == Usage == DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming of Stable Diffusion not being able to adequately generate images of specific individual people. Such a use case is quite VRAM intensive, however, and thus cost-prohibitive for hobbyist users. The Stable Diffusion adaptation of DreamBooth in particular is released as a free and open-source project based on the technology outlined by the original paper published by Ruiz et. al. in 2022. Concerns have been raised regarding the ability for bad actors to utilise DreamBooth to generate misleading images for malicious purposes, and that its open-source nature allows anyone to utilise or even make improvements to the technology. In addition, artists have expressed their apprehension regarding the ethics of using DreamBooth to train model checkpoints that are specifically aimed at imitating specific art styles associated with human artists; one such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House who has had her art style trained into a checkpoint model via DreamBooth and shared online, without her consent.
Intel Management Engine
The Intel Management Engine (ME), also known as the Intel Manageability Engine, is an autonomous subsystem that has been incorporated in virtually all of Intel's processor chipsets since 2008. It is located in the Platform Controller Hub of modern Intel motherboards. The Intel Management Engine always runs as long as the motherboard is receiving power, even when the computer is turned off. This issue can be mitigated with the deployment of a hardware device which is able to disconnect all connections to mains power as well as all internal forms of energy storage. The Electronic Frontier Foundation and some security researchers have voiced concern that the Management Engine is a backdoor. Intel's main competitor, AMD, has incorporated the equivalent AMD Secure Technology (formally called Platform Security Processor) in virtually all of its post-2013 CPUs. == Difference from Intel AMT == The Management Engine is often confused with Intel AMT (Intel Active Management Technology). AMT runs on the ME, but is only available on processors with vPro. AMT gives device owners remote administration of their computer, such as powering it on or off, and reinstalling the operating system. However, the ME itself has been built into all Intel chipsets since 2008, not only those with AMT. While AMT can be unprovisioned by the owner, there is no official, documented way to disable the ME. == Design == The subsystem primarily consists of proprietary firmware running on a separate microprocessor that performs tasks during boot-up, while the computer is running, and while it is asleep. As long as the chipset or SoC is supplied with power (via battery or power supply), it continues to run even when the system is turned off. Intel claims the ME is required to provide full performance. Its exact workings are largely undocumented and its code is obfuscated using confidential Huffman tables stored directly in hardware, so the firmware does not contain the information necessary to decode its contents. === Hardware === Starting with ME 11 (introduced in Skylake CPUs), it is based on the Intel Quark x86-based 32-bit CPU and runs the MINIX 3 operating system. The ME firmware is stored in a partition of the SPI BIOS Flash, using the Embedded Flash File System (EFFS). Previous versions were based on an ARC core, with the Management Engine running the ThreadX RTOS. Versions 1.x to 5.x of the ME used the ARCTangent-A4 (32-bit only instructions) whereas versions 6.x to 8.x used the newer ARCompact (mixed 32- and 16-bit instruction set architecture). Starting with ME 7.1, the ARC processor could also execute signed Java applets. The ME has its own MAC and IP address for the out-of-band management interface, with direct access to the Ethernet controller; one portion of the Ethernet traffic is diverted to the ME even before reaching the host's operating system, for what support exists in various Ethernet controllers, exported and made configurable via Management Component Transport Protocol (MCTP). The ME also communicates with the host via PCI interface. Under Linux, communication between the host and the ME is done via /dev/mei or /dev/mei0. Until the release of Nehalem processors, the ME was usually embedded into the motherboard's northbridge, following the Memory Controller Hub (MCH) layout. With the newer Intel architectures (Intel 5 Series onwards), the ME is integrated into the Platform Controller Hub (PCH). === Firmware === By Intel's current terminology as of 2017, ME is one of several firmware sets for the Converged Security and Manageability Engine (CSME). Prior to AMT version 11, CSME was called Intel Management Engine BIOS Extension (Intel MEBx). Management Engine (ME) – mainstream chipsets Server Platform Services (SPS) – server chipsets and SoCs Trusted Execution Engine (TXE) – tablet/embedded/low power It was also found that the ME firmware version 11 runs MINIX 3. Management of the ME modules for provisioning inside the UEFI is done via a tool called Intel Flash Image Tool (FITC). ==== Modules ==== Active Management Technology (AMT) Intel Boot Guard (IBG) and Secure Boot Quiet System Technology (QST), formerly known as Advanced Fan Speed Control (AFSC), which provides support for acoustically optimized fan speed control, and monitoring of temperature, voltage, current and fan speed sensors that are provided in the chipset, CPU and other devices present on the motherboard. Communication with the QST firmware subsystem is documented and available through the official software development kit (SDK). Protected Audio Video Path, enforces HDCP Intel Anti-Theft Technology (AT), discontinued in 2015 Serial over LAN (SOL) Intel Platform Trust Technology (PTT), a firmware-based Trusted Platform Module (TPM) Near Field Communication, a middleware for NFC readers and vendors to access NFC cards and provide secure element access, found in later MEI versions. == The intricacies of working with Intel ME == It should also be noted that the ME region requires special cleaning and subsequent initialisation, for example, after replacing the platform hub on the motherboard. Usually, this requires an SPI programmer. There are known successful cases of this operation being performed. == Security vulnerabilities == Several weaknesses have been found in the ME. On May 1, 2017, Intel confirmed a Remote Elevation of Privilege bug (SA-00075) in its Management Technology. Every Intel platform with provisioned Intel Standard Manageability, Active Management Technology, or Small Business Technology, from Nehalem in 2008 to Kaby Lake in 2017 has a remotely exploitable security hole in the ME. Several ways to disable the ME without authorization that could allow ME's functions to be sabotaged have been found. Additional major security flaws in the ME affecting a very large number of computers incorporating ME, Trusted Execution Engine (TXE), and Server Platform Services (SPS) firmware, from Skylake in 2015 to Coffee Lake in 2017, were confirmed by Intel on November 20, 2017 (SA-00086). Unlike SA-00075, this bug is even present if AMT is absent, not provisioned or if the ME was "disabled" by any of the known unofficial methods. In July 2018, another set of vulnerabilities was disclosed (SA-00112). In September 2018, yet another vulnerability was published (SA-00125). === Ring −3 rootkit === A ring −3 rootkit was demonstrated by Invisible Things Lab for the Q35 chipset; it does not work for the later Q45 chipset as Intel implemented additional protections. The exploit worked by remapping the normally protected memory region (top 16 MB of RAM) reserved for the ME. The ME rootkit could be installed regardless of whether the AMT is present or enabled on the system, as the chipset always contains the ARC ME coprocessor. (The "−3" designation was chosen because the ME coprocessor works even when the system is in the S3 state. Thus, it was considered a layer below the System Management Mode rootkits.) For the vulnerable Q35 chipset, a keystroke logger ME-based rootkit was demonstrated by Patrick Stewin. === Zero-touch provisioning === Another security evaluation by Vassilios Ververis showed serious weaknesses in the GM45 chipset implementation. In particular, it criticized AMT for transmitting unencrypted passwords in the SMB provisioning mode when the IDE redirection and Serial over LAN features are used. It also found that the "zero touch" provisioning mode (ZTC) is still enabled even when the AMT appears to be disabled in BIOS. For about 60 euros, Ververis purchased from GoDaddy a certificate that is accepted by the ME firmware and allows remote "zero touch" provisioning of (possibly unsuspecting) machines, which broadcast their HELLO packets to would-be configuration servers. === SA-00075 (a.k.a. Silent Bob is Silent) === In May 2017, Intel confirmed that many computers with AMT have had an unpatched critical privilege escalation vulnerability (CVE-2017-5689). The vulnerability was nicknamed "Silent Bob is Silent" by the researchers who had reported it to Intel. It affects numerous laptops, desktops and servers sold by Dell, Fujitsu, Hewlett-Packard (later Hewlett Packard Enterprise and HP Inc.), Intel, Lenovo, and possibly others. Those researchers claimed that the bug affects systems made in 2010 or later. Other reports claimed the bug also affects systems made as long ago as 2008. The vulnerability was described as giving remote attackers: "full control of affected machines, including the ability to read and modify everything. It can be used to install persistent malware (possibly in firmware), and read and modify any data." === PLATINUM === In June 2017, the PLATINUM cybercrime group became notable for exploiting the serial over LAN (SOL) capabilities of AMT to perform data exfiltration of stolen documents. SOL is disabled by default and must be enabled to exploit this vulnerability. === SA-00086 === Some months after the previous bugs, and subsequent warnings from the EFF, securi
Digital signal
A digital signal is a signal that represents data as a sequence of discrete values; at any given time it can only take on, at most, one of a finite number of values. This contrasts with an analog signal, which represents continuous values; at any given time it represents a real number within an infinite set of values. Simple digital signals represent information in discrete bands of levels. All levels within a band of values represent the same information state. In most digital circuits, the signal can have two possible valid values; this is called a binary signal or logic signal. They are represented by two voltage bands: one near a reference value (typically termed as ground or zero volts), and the other a value near the supply voltage. These correspond to the two values zero and one (or false and true) of the Boolean domain, so at any given time a binary signal represents one binary digit (bit). Because of this discretization, relatively small changes to the signal levels do not leave the discrete envelope, and as a result are ignored by signal state sensing circuitry. As a result, digital signals have noise immunity; electronic noise, provided it is not too great, will not affect digital circuits, whereas noise always degrades the operation of analog signals to some degree. Digital signals having more than two states are occasionally used; circuitry using such signals is called multivalued logic. For example, signals that can assume three possible states are called three-valued logic. In a digital signal, the physical quantity representing the information may be a variable electric current or voltage, the intensity, phase or polarization of an optical or other electromagnetic field, acoustic pressure, the magnetization of a magnetic storage media, etcetera. Digital signals are used in all digital electronics, notably computing equipment and data transmission. == Definitions == The term digital signal has related definitions in different contexts. === In digital electronics === In digital electronics, a digital signal is a pulse amplitude modulated signal, i.e., a sequence of fixed-width electrical pulses or light pulses, each occupying one of a discrete number of levels of amplitude. A special case is a logic signal or a binary signal, which varies between a low and a high signal level. The pulse trains in digital circuits are typically generated by metal–oxide–semiconductor field-effect transistor (MOSFET) devices, due to their rapid on–off electronic switching speed and large-scale integration (LSI) capability. In contrast, bipolar junction transistors more slowly generate signals resembling sine waves. === In signal processing === In digital signal processing, a digital signal is a representation of a physical signal that is sampled and quantized. A digital signal is an abstraction that is discrete in time and amplitude. The signal's value only exists at regular time intervals, since only the values of the corresponding physical signal at those sampled moments are significant for further digital processing. The digital signal is a sequence of codes drawn from a finite set of values. The digital signal may be stored, processed or transmitted physically as a pulse-code modulation (PCM) signal. === In communications === In digital communications, a digital signal is a continuous-time physical signal, alternating between a discrete number of waveforms, representing a bitstream. The shape of the waveform depends on the transmission scheme, which may be either a line coding scheme allowing baseband transmission; or a digital modulation scheme, allowing passband transmission over long wires or over a limited radio frequency band. Such a carrier-modulated sine wave is considered a digital signal in literature on digital communications and data transmission, but considered as a bit stream converted to an analog signal in specific cases where the signal will be carried over a system meant for analog communication, such as an analog telephone line. In communications, sources of interference are usually present, and noise is frequently a significant problem. The effects of interference are typically minimized by filtering off interfering signals as much as possible and by using data redundancy. The main advantages of digital signals for communications are often considered to be noise immunity, and the ability, in many cases such as with audio and video data, to use data compression to greatly decrease the bandwidth that is required on the communication media. == Logic voltage levels == A waveform that switches representing the two states of a Boolean value (0 and 1, or low and high, or false and true) is referred to as a digital signal or logic signal or binary signal when it is interpreted in terms of only two possible digits. The two states are usually represented by some measurement of an electrical property: Voltage is the most common, but current is used in some logic families. Two ranges of voltages are typically defined for each logic family, which are frequently not directly adjacent. The signal is low when in the low range and high when in the high range, and in between the two ranges the behavior can vary between different types of gates. The clock signal is a special digital signal that is used to synchronize many digital circuits. The image shown can be considered the waveform of a clock signal. Logic changes are triggered either by the rising edge or the falling edge. The rising edge is the transition from a low voltage (level 1 in the diagram) to a high voltage (level 2). The falling edge is the transition from a high voltage to a low one. Although in a highly simplified and idealized model of a digital circuit, we may wish for these transitions to occur instantaneously, no real-world circuit is purely resistive, and therefore no circuit can instantly change voltage levels. This means that during a short, finite transition time, the output may not properly reflect the input, and will not correspond to either a logically high or low voltage. == Modulation == To create a digital signal, a signal must be modulated with a control signal to produce it. The simplest modulation, a type of unipolar encoding, is simply to switch on and off a DC signal so that high voltages represent a '1' and low voltages are '0'. In digital radio schemes, one or more carrier waves are amplitude, frequency or phase modulated by the control signal to produce a digital signal suitable for transmission. Asymmetric Digital Subscriber Line (ADSL) over telephone wires, does not primarily use binary logic; the digital signals for individual carriers are modulated with different-valued logics, depending on the Shannon capacity of the individual channel. == Clocking == Digital signals may be sampled by a clock signal at regular intervals by passing the signal through a flip-flop. When this is done, the input is measured at the clock edge and the signal from that time. The signal is then held steady until the next clock. This process is the basis of synchronous logic. Asynchronous logic also exists, which uses no single clock, and generally operates more quickly, and may use less power, but is significantly harder to design.
Feature detection (web development)
Feature detection (also feature testing) is a technique used in web development for handling differences between runtime environments (typically web browsers or user agents), by programmatically testing for clues that the environment may or may not offer certain functionality. This information is then used to make the application adapt in some way to suit the environment: to make use of certain APIs, or tailor for a better user experience. Its proponents claim it is more reliable and future-proof than other techniques like user agent sniffing and browser-specific CSS hacks. == Techniques == A feature test can take many forms. It is essentially any snippet of code which gives some level of confidence that a required feature is indeed supported. However, in contrast to other techniques, feature detection usually focuses on performing actions which directly relate to the feature to be detected, rather than heuristics. === JavaScript === JavaScript feature detection can inspect the DOM and the local JavaScript environment to test whether browser features or APIs are supported. The simplest technique is to check for the existence of a relevant object or property. For example, the Geolocation API (used for accessing the device's knowledge of its geographical location, possibly obtained from a GPS navigation device) exposes a geolocation property on the navigator object in the DOM; the presence of which implies the Geolocation API is supported: if ('geolocation' in navigator) { // Geolocation API is supported } For a higher level of confidence, some feature tests will attempt to invoke the feature then look for clues that it behaved properly. For example, a test for support for cookies might attempt to set a value as a cookie and then verify it can be read back. === CSS === In CSS, the at-rule @supports introduced in 2015 allows to test if a given feature is supported. For instance the following code activates the declarations only if the user agent supports display: flex: == Undetectables == Some browser features are considered undetectable, because no clues are known to give sufficient confidence that a feature is supported. These are often because of limited information available to the JavaScript environment in the browser; generally features must be exposed via the DOM in some way in order to be detectable using JavaScript. When undetectables are encountered, it is common to turn to user agent sniffing as an alternative mechanism, or to employ defensive coding to minimise the impact if the feature turns out not to be supported. The Modernizr project maintains a record of known undetectables on their wiki.
Mortimer Rogoff
Mortimer Alan Rogoff (May 2, 1921 – August 1, 2008) was an American inventor, businessman, and author as well as an amateur photographer and radio operator. He is recognized for his work in spread spectrum technology which is the technology that modern cell phones and GPS systems are based on. He is also considered the grandfather of the electronic navigation chart. == Early life == Rogoff was born in Brooklyn, New York. He earned his B.S.E.E. from Rensselaer Polytechnic Institute in 1943 and his M.S.E.E. from Columbia University in 1948. While at Rensselaer he was a member of Kappa Nu fraternity and the Features Editor for the student newspaper. During World War II, he enlisted in the United States Navy and worked on developing radio communication and aerial navigation systems. One of the techniques he developed was undetectable by Axis forces because its power was below that of the background noise and its frequency varied in random ways. This secure transmission was the beginning of spread spectrum technology which would become the basis for GPS and CDMA cellular telephone systems. Although he was never able to patent the technology because it was a military secret he did get some recognition for it almost forty years later when he received the Institute of Electrical and Electronics Engineers’ Pioneer Award in 1981. == Career == Rogoff worked for twenty-two years (1946 to 1968) for ITT Laboratories in New Jersey. In 1958, he became their deputy director of Engineering. He was Vice President of ITT Laboratories from 1962 to 1963. From 1963 to 1968, he was promoted to the corporate staff where he became head of European operations. In 1968 he left ITT to work for the Diebold Group where he became an Executive Vice President. After leaving the Diebold Group he founded several technology and automation businesses, including his own consulting firm, and Teletext Communications Corporation. Later in the 1970s, he was a Principal with Booz Allen Hamilton. In 1979, his book ‘’Calculator Navigation’’ was published. This book demonstrated practical methods for calculating precise ship locations using radio navigation with a consumer calculator. In 1981, he founded a new company, Navigation Sciences Inc., in Bethesda, Maryland. With this company he patented a method for marine navigation that combined radar maps with electronic charts in 1986. This was a major advancement in field. Today, this system is known as the Electronic Chart Display and Information System (ECDIS). Rogoff had seen the need for a new charting system in 1968 from his apartment at 180 East End Avenue in New York City. From there, he saw a boating accident where a life was lost and decided there had to be a way to automate navigation. Rogoff then became of member of the International Maritime Organization’s (IMO) sub-committee on Safety of Navigation, a representative to the International Electrotechnical Commission, and became the chairman of the Radio Technical Commission for Maritime Services Special Committee 109 on Electronic Charts. He was able to use his influence on these boards to push through a proposal of ECDIS standards in 1989 where none has been before. As his friend Giuseppe Carnevali said, “Although nobody could argue against the need for a standard, no one was ready to endorse one; however, nobody was brave enough to oppose it.” A Test Bed project on these proposals was conducted by the United States Coast Guard. The amended standards were accepted by the IMO in November, 1995. In 2000, he was named as a Fellow of the Institute of Navigation. He was also a Fellow of the Institute of Electrical and Electronics Engineers. During this time, he was also president of the Navigational Electronic Charts System Association. == Personal == In 1979, he moved to Washington, D.C. and bought a home in Nantucket, Massachusetts. He married Sheila Zunser in 1943 and they were together for sixty-five years. They had three daughters: Louisa Thompson, Alice Rogoff, and Julia Peach. His sister was sociologist Natalie Rogoff Ramsøy of the University of Oslo. He was a member of the Cosmos Club and President of The Navigational Electronic Chart System Association (NECSA). He was a very good amateur photographer and liked amateur radio (call sign W2EE). He died in Nantucket from bladder cancer. == Patents == Patent number: 4176316 – Secure Communication System – November 27, 1979 With Louis A. DeRosa Patent number: 4590569 – Electronic Navigation System – May 20, 1986 With Peter M. Winkler and John N. Ackley Patent number: RE34004 – Secure Communication System – July 21, 1992 With Louis A. DeRosa == Publications == Rogoff, Mortimer September 1957. Automatic Analysis of Infrared Spectra. Annals of the New York Academy of Sciences; vol. 69: no. 1: 27–37. Gen. P.C. Sandretto and Mortimer Rogoff. 1958 “A Novel Concept for Application to the Control of Airways Traffic.” NAVIGATION: Journal of The Institute of Navigation; vol. 6: no. 2: 102–107 Rogoff, Mortimer 1979. Calculator Navigation; ISBN 0-393-03192-6. Published by W.W. Norton & Company (New York and London). Rogoff, Mortimer December 1985. Electronic Charting. Yachting; vol. 158: no. 6: 54–57. Rogoff, Mortimer Winter 1990. Electronic Charts in the Nineties. NAVIGATION: Journal of The Institute of Navigation; vol. 37: no. 4: 305–318.
Quantification (machine learning)
In machine learning, quantification (variously called learning to quantify, or supervised prevalence estimation, or class prior estimation) is the task of using supervised learning in order to train models (quantifiers) that estimate the relative frequencies (also known as prevalence values) of the classes of interest in a sample of unlabelled data items. For instance, in a sample of 100,000 unlabelled tweets known to express opinions about a certain political candidate, a quantifier may be used to estimate the percentage of these tweets which belong to class `Positive' (i.e., which manifest a positive stance towards this candidate), and to do the same for classes `Neutral' and `Negative'. Quantification may also be viewed as the task of training predictors that estimate a (discrete) probability distribution, i.e., that generate a predicted distribution that approximates the unknown true distribution of the items across the classes of interest. Quantification is different from classification, since the goal of classification is to predict the class labels of individual data items, while the goal of quantification it to predict the class prevalence values of sets of data items. Quantification is also different from regression, since in regression the training data items have real-valued labels, while in quantification the training data items have class labels. It has been shown in multiple research works that performing quantification by classifying all unlabelled instances and then counting the instances that have been attributed to each class (the 'classify and count' method) usually leads to suboptimal quantification accuracy. This suboptimality may be seen as a direct consequence of 'Vapnik's principle', which states: If you possess a restricted amount of information for solving some problem, try to solve the problem directly and never solve a more general problem as an intermediate step. It is possible that the available information is sufficient for a direct solution but is insufficient for solving a more general intermediate problem. In our case, the problem to be solved directly is quantification, while the more general intermediate problem is classification. As a result of the suboptimality of the 'classify and count' method, quantification has evolved as a task in its own right, different (in goals, methods, techniques, and evaluation measures) from classification. == Quantification tasks == === Quantification tasks according to the set of classes === The main variants of quantification, according to the characteristics of the set of classes used, are: Binary quantification, corresponding to the case in which there are only n = 2 {\displaystyle n=2} classes and each data item belongs to exactly one of them; Single-label multiclass quantification, corresponding to the case in which there are n > 2 {\displaystyle n>2} classes and each data item belongs to exactly one of them; Multi-label multiclass quantification, corresponding to the case in which there are n ≥ 2 {\displaystyle n\geq 2} classes and each data item can belong to zero, one, or several classes at the same time; Ordinal quantification, corresponding to the single-label multiclass case in which a total order is defined on the set of classes. Regression quantification, a task which stands to 'standard' quantification as regression stands to classification. Strictly speaking, this task is not a quantification task as defined above (since the individual items do not have class labels but are labelled by real values), but has enough commonalities with other quantification tasks to be considered one of them. Most known quantification methods address the binary case or the single-label multiclass case, and only few of them address the multi-label, ordinal, and regression cases. Binary-only methods include the Mixture Model (MM) method, the HDy method, SVM(KLD), and SVM(Q). Methods that can deal with both the binary case and the single-label multiclass case include probabilistic classify and count (PCC), adjusted classify and count (ACC), probabilistic adjusted classify and count (PACC), the Saerens-Latinne-Decaestecker EM-based method (SLD), and KDEy. Methods for multi-label quantification include regression-based quantification (RQ) and label powerset-based quantification (LPQ). Methods for the ordinal case include ordinal versions of the above-mentioned ACC, PACC, and SLD methods, and ordinal versions of the above-mentioned HDy method. Methods for the regression case include Regress and splice and Adjusted regress and sum. === Quantification tasks according to the type of data === Several subtasks of quantification may be identified according to the type of data involved. Example such tasks are: Quantification of networked data. This task consists of performing quantification when the datapoints are members of a relation, i.e., are interlinked. As such, this task is a strict relative of collective classification. Quantification over time. This task consists of performing quantification on sets that become available in a temporal sequence, i.e., as a data stream, and finds application in contexts in which class prevalence values must be monitored over time. == Evaluation measures for quantification == Several evaluation measures can be used for evaluating the error of a quantification method. Since quantification consists of generating a predicted probability distribution that estimates a true probability distribution, these evaluation measures are ones that compare two probability distributions. Most evaluation measures for quantification belong to the class of divergences. Evaluation measures for binary quantification, single-label multiclass quantification, and multi-label quantification, are Absolute Error Squared Error Relative Absolute Error Kullback–Leibler divergence Pearson Divergence Evaluation measures for ordinal quantification are Normalized Match Distance (a particular case of the Earth Mover's Distance) Root Normalized Order-Aware Distance == Applications == Quantification is of special interest in fields such as the social sciences, epidemiology, market research, allocating resources, and ecological modelling, since these fields are inherently concerned with aggregate data. However, quantification is also useful as a building block for solving other downstream tasks, such as improving the accuracy of classifiers on out-of-distribution data, measuring classifier bias and ranker bias, and estimating the accuracy of classifiers on out-of-distribution data. == Resources == LQ 2021: the 1st International Workshop on Learning to Quantify LQ 2022: the 2nd International Workshop on Learning to Quantify LQ 2023: the 3rd International Workshop on Learning to Quantify LQ 2024: the 4th International Workshop on Learning to Quantify LQ 2025: the 5th International Workshop on Learning to Quantify LeQua 2022: the 1st Data Challenge on Learning to Quantify LeQua 2024: the 2nd Data Challenge on Learning to Quantify QuaPy: An open-source Python-based software library for quantification QuantificationLib: A Python library for quantification and prevalence estimation
Web performance
Web performance refers to the speed in which web pages are downloaded and displayed on the user's web browser. Web performance optimization (WPO), or website optimization is the field of knowledge about increasing web performance. Faster website download speeds have been shown to increase visitor retention and loyalty and user satisfaction, especially for users with slow internet connections and those on mobile devices. Web performance also leads to less data travelling across the web, which in turn lowers a website's power consumption and environmental impact. Some aspects which can affect the speed of page load include browser/server cache, image optimization, and encryption (for example SSL), which can affect the time it takes for pages to render. The performance of the web page can be improved through techniques such as multi-layered cache, light weight design of presentation layer components and asynchronous communication with server side components. == History == In the first decade or so of the web's existence, web performance improvement was focused mainly on optimizing website code and pushing hardware limitations. According to the 2002 book Web Performance Tuning by Patrick Killelea, some of the early techniques used were to use simple servlets or CGI, increase server memory, and look for packet loss and retransmission. Although these principles now comprise much of the optimized foundation of internet applications, they differ from current optimization theory in that there was much less of an attempt to improve the browser display speed. Steve Souders coined the term "web performance optimization" in 2004. At that time Souders made several predictions regarding the impact that WPO as an "emerging industry" would bring to the web, such as websites being fast by default, consolidation, web standards for performance, environmental impacts of optimization, and speed as a differentiator. One major point that Souders made in 2007 is that at least 80% of the time that it takes to download and view a website is controlled by the front-end structure. This lag time can be decreased through awareness of typical browser behavior, as well as of how HTTP works. == Optimization techniques == Web performance optimization improves user experience (UX) when visiting a website and therefore is highly desired by web designers and web developers. They employ several techniques that streamline web optimization tasks to decrease web page load times. This process is known as front end optimization (FEO) or content optimization. FEO concentrates on reducing file sizes and "minimizing the number of requests needed for a given page to load." In addition to the techniques listed below, the use of a content delivery network—a group of proxy servers spread across various locations around the globe—is an efficient delivery system that chooses a server for a specific user based on network proximity. Typically the server with the quickest response time is selected. The following techniques are commonly used web optimization tasks and are widely used by web developers: Web browsers open separate Transmission Control Protocol (TCP) connections for each Hypertext Transfer Protocol (HTTP) request submitted when downloading a web page. These requests total the number of page elements required for download. However, a browser is limited to opening only a certain number of simultaneous connections to a single host. To prevent bottlenecks, the number of individual page elements are reduced using resource consolidation whereby smaller files (such as images) are bundled together into one file. This reduces HTTP requests and the number of "round trips" required to load a web page. Web pages are constructed from code files such JavaScript and Hypertext Markup Language (HTML). As web pages grow in complexity, so do their code files and subsequently their load times. File compression can reduce code files by about 40 percent, thereby improving site responsiveness. Web Caching Optimization reduces server load, bandwidth usage and latency. CDNs use dedicated web caching software to store copies of documents passing through their system. Many website platforms, such as SiteGround, IONOS, Wix, and Hostinger, rely on global CDNs and caching technologies to deliver faster page loads across different geographical regions. Subsequent requests from the cache may be fulfilled should certain conditions apply. Web caches are located on either the client side (forward position) or web-server side (reverse position) of a CDN. Web browsers are also able to store content for re-use through the HTTP cache or web cache. Requests web browsers make are typically routed to the HTTP cache to validate if a cached response may be used to fulfill a request. If such a match is made, the response is fulfilled from the cache. This can be helpful for reducing network latency and costs associated with data-transfer. The HTTP cache is configured using request and response headers. Code minification distinguishes discrepancies between codes written by web developers and how network elements interpret code. Minification removes comments and extra spaces as well as crunch variable names in order to minimize code, decreasing files sizes by as much as 60%. In addition to caching and compression, lossy compression techniques (similar to those used with audio files) remove non-essential header information and lower original image quality on many high resolution images. These changes, such as pixel complexity or color gradations, are transparent to the end-user and do not noticeably affect perception of the image. Another technique is the replacement of raster graphics with resolution-independent vector graphics. Vector substitution is best suited for simple geometric images. Lazy loading of images and video reduces initial page load time, initial page weight, and system resource usage, all of which have positive impacts on website performance. It is used to defer initialization of an object right until the point at which it is needed. The browser loads the images in a page or post when they are needed such as when the user scrolls down the page and not all images at once, which is the default behavior, and naturally, takes more time. == HTTP/1.x and HTTP/2 == Since web browsers use multiple TCP connections for parallel user requests, congestion and browser monopolization of network resources may occur. Because HTTP/1 requests come with associated overhead, web performance is impacted by limited bandwidth and increased usage. Compared to HTTP/1, HTTP/2 is binary instead of textual is fully multiplexed instead of ordered and blocked can therefore use one connection for parallelism uses header compression to reduce overhead allows servers to "push" responses proactively into client caches Instead of a website's hosting server, CDNs are used in tandem with HTTP/2 in order to better serve the end-user with web resources such as images, JavaScript files and Cascading Style Sheet (CSS) files since a CDN's location is usually in closer proximity to the end-user. == Metrics == In recent years, several metrics have been introduced that help developers measure various aspects of the performance of their websites. In 2019, Google introduced metrics such as Time to First Byte (TTFB), First Contentful Paint (FCP), First Paint (FP), First Input Delay (FID), Cumulative Layout Shift (CLS) and Largest Contentful Paint (LCP) allow for website owner to gain insights into issues that might hurt the performance of their websites making it seem sluggish or slow to the user. Other metrics including Request Count (number of requests required to load a page), DOMContentLoaded (time when HTML document is completely loaded and parsed excluding CSS style sheets, images, etc.), Above The Fold Time (content that is visible without scrolling), Round Trip Time, number of Render Blocking Resources (such as scripts, stylesheets), Onload Time, Connection Time, Total Page Size help provide an accurate picture of latencies and slowdowns occurring at the networking level which might slow down a site. Modules to measure metrics such as TTFB, FCP, LCP, FP etc are provided with major frontend JavaScript libraries such as React, NuxtJS and Vue. Google publishes a library, the core-web-vitals library that allows for easy measurement of these metrics in frontend applications. In addition to this, Google also provides the Lighthouse, a Chrome dev-tools component and PageSpeed Insight a site that allows developers to measure and compare the performance of their website with Google's recommended minimums and maximums. In addition to this, tools such as the Network Monitor by Mozilla Firefox help provide insight into network-level slowdowns that might occur during transmission of data.