Unsupervised learning

Unsupervised learning is a framework in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. Other frameworks in the spectrum of supervisions include weak- or semi-supervision, where a small portion of the data is tagged, and self-supervision. Some researchers consider self-supervised learning a form of unsupervised learning. Conceptually, unsupervised learning divides into the aspects of data, training, algorithm, and downstream applications. Typically, the dataset is harvested cheaply "in the wild", such as massive text corpus obtained by web crawling, with only minor filtering (such as Common Crawl). This compares favorably to supervised learning, where the dataset (such as the ImageNet1000) is typically constructed manually, which is much more expensive. There are algorithms designed specifically for unsupervised learning, such as clustering algorithms like k-means, dimensionality reduction techniques like principal component analysis (PCA), Boltzmann machine learning, and autoencoders. After the rise of deep learning, most large-scale unsupervised learning has been done by training general-purpose neural network architectures by gradient descent, adapted to performing unsupervised learning by designing an appropriate training procedure. Sometimes a trained model can be used as-is, but more often they are modified for downstream applications. For example, the generative pretraining method trains a model to generate a textual dataset, before finetuning it for other applications, such as text classification. As another example, autoencoders are trained to produce good features, which can then be used as a module for other models, such as in a latent diffusion model. == Tasks == Tasks are often categorized as discriminative (recognition) or generative (imagination). Often but not always, discriminative tasks use supervised methods and generative tasks use unsupervised (see Venn diagram); however, the separation is very hazy. For example, object recognition favors supervised learning but unsupervised learning can also cluster objects into groups. Furthermore, as progress marches onward, some tasks employ both methods, and some tasks swing from one to another. For example, image recognition started off as heavily supervised, but became hybrid by employing unsupervised pre-training, and then moved towards supervision again with the advent of dropout, ReLU, and adaptive learning rates. A typical generative task is as follows. At each step, a datapoint is sampled from the dataset, and part of the data is removed, and the model must infer the removed part. This is particularly clear for the denoising autoencoders and BERT. == Neural network architectures == === Training === During the learning phase, an unsupervised network tries to mimic the data it is given and uses the error in its mimicked output to correct itself (i.e. correct its weights and biases). Sometimes the error is expressed as a low probability that the erroneous output occurs, or it might be expressed as an unstable high energy state in the network. In contrast to supervised methods' dominant use of backpropagation, unsupervised learning also employs other methods including: Hopfield learning rule, Boltzmann learning rule, Contrastive Divergence, Wake Sleep, Variational Inference, Maximum Likelihood, Maximum A Posteriori, Gibbs Sampling, and backpropagating reconstruction errors or hidden state reparameterizations. See the table below for more details. === Energy === An energy function is a macroscopic measure of a network's activation state. In Boltzmann machines, it plays the role of the Cost function. This analogy with physics is inspired by Ludwig Boltzmann's analysis of a gas' macroscopic energy from the microscopic probabilities of particle motion p ∝ e − E / k T {\displaystyle p\propto e^{-E/kT}} , where k is the Boltzmann constant and T is temperature. In the RBM network the relation is p = e − E / Z {\displaystyle p=e^{-E}/Z} , where p {\displaystyle p} and E {\displaystyle E} vary over every possible activation pattern and Z = ∑ All Patterns e − E ( pattern ) {\displaystyle \textstyle {Z=\sum _{\scriptscriptstyle {\text{All Patterns}}}e^{-E({\text{pattern}})}}} . To be more precise, p ( a ) = e − E ( a ) / Z {\displaystyle p(a)=e^{-E(a)}/Z} , where a {\displaystyle a} is an activation pattern of all neurons (visible and hidden). Hence, some early neural networks bear the name Boltzmann Machine. Paul Smolensky calls − E {\displaystyle -E\,} the Harmony. A network seeks low energy which is high Harmony. === Networks === This table shows connection diagrams of various unsupervised networks, the details of which will be given in the section Comparison of Networks. Circles are neurons and edges between them are connection weights. As network design changes, features are added on to enable new capabilities or removed to make learning faster. For instance, neurons change between deterministic (Hopfield) and stochastic (Boltzmann) to allow robust output, weights are removed within a layer (RBM) to hasten learning, or connections are allowed to become asymmetric (Helmholtz). Of the networks bearing people's names, only Hopfield worked directly with neural networks. Boltzmann and Helmholtz came before artificial neural networks, but their work in physics and physiology inspired the analytical methods that were used. === History === === Specific Networks === Here, we highlight some characteristics of select networks. The details of each are given in the comparison table below. Hopfield Network Ferromagnetism inspired Hopfield networks. A neuron corresponds to an iron domain with binary magnetic moments Up and Down, and neural connections correspond to the domain's influence on each other. Symmetric connections enable a global energy formulation. During inference the network updates each state using the standard activation step function. Symmetric weights and the right energy functions guarantees convergence to a stable activation pattern. Asymmetric weights are difficult to analyze. Hopfield nets are used as Content Addressable Memories (CAM). Boltzmann Machine These are stochastic Hopfield nets. Their state value is sampled from this pdf as follows: suppose a binary neuron fires with the Bernoulli probability p(1) = 1/3 and rests with p(0) = 2/3. One samples from it by taking a uniformly distributed random number y, and plugging it into the inverted cumulative distribution function, which in this case is the step function thresholded at 2/3. The inverse function = { 0 if x <= 2/3, 1 if x > 2/3 }. Sigmoid Belief Net Introduced by Radford Neal in 1992, this network applies ideas from probabilistic graphical models to neural networks. A key difference is that nodes in graphical models have pre-assigned meanings, whereas Belief Net neurons' features are determined after training. The network is a sparsely connected directed acyclic graph composed of binary stochastic neurons. The learning rule comes from Maximum Likelihood on p(X): Δwij ∝ {\displaystyle \propto } sj (si - pi), where pi = 1 / ( 1 + eweighted inputs into neuron i ). sj's are activations from an unbiased sample of the posterior distribution and this is problematic due to the Explaining Away problem raised by Judea Perl. Variational Bayesian methods uses a surrogate posterior and blatantly disregard this complexity. Deep Belief Network Introduced by Hinton, this network is a hybrid of RBM and Sigmoid Belief Network. The top 2 layers is an RBM and the second layer downwards form a sigmoid belief network. One trains it by the stacked RBM method and then throw away the recognition weights below the top RBM. As of 2009, 3-4 layers seems to be the optimal depth. Helmholtz machine These are early inspirations for the Variational Auto Encoders. Its 2 networks combined into one—forward weights operates recognition and backward weights implements imagination. It is perhaps the first network to do both. Helmholtz did not work in machine learning but he inspired the view of "statistical inference engine whose function is to infer probable causes of sensory input". the stochastic binary neuron outputs a probability that its state is 0 or 1. The data input is normally not considered a layer, but in the Helmholtz machine generation mode, the data layer receives input from the middle layer and has separate weights for this purpose, so it is considered a layer. Hence this network has 3 layers. Variational autoencoder These are inspired by Helmholtz machines and combines probability network with neural networks. An Autoencoder is a 3-layer CAM network, where the middle layer is supposed to be some internal representation of input patterns. The encoder neural network is a probability distribution qφ(z given x) and the decoder network is pθ(x given z). The weights are named phi & theta rather than W and V as in Helmholtz—a cosmetic difference. These 2 networks h

Kernel embedding of distributions

In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ⁡ ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega

Digital video recorder

A digital video recorder (DVR), also referred to as a personal video recorder (PVR) particularly in Canadian and British English, is an electronic device that records video in a digital format to a disk drive, USB flash drive, SD memory card, SSD or other local or networked mass storage device. The term includes set-top boxes (STB) with direct to disk recording, portable media players and TV gateways with recording capability, and digital camcorders. Personal computers can be connected to video capture devices and used as DVRs; in such cases the application software used to record video is an integral part of the DVR. Many DVRs are classified as consumer electronic devices. Similar small devices with built-in (~5 inch diagonal) displays and SSD support may be used for professional film or video production, as these recorders often do not have the limitations that built-in recorders in cameras have, offering wider codec support, the removal of recording time limitations and higher bitrates. == History == In the 1980s, prototype high-definition (HD) digital video recorders were developed by Fujitsu, Hitachi, Sanyo and Canon Inc. In 1985, Hitachi demonstrated a prototype digital video tape recorder (VTR) that used digital recording video tape as storage media to record digital HD video content. In 1987, the first commercial digital video recorder was the Sony DVR-1000, a digital video cassette recorder (VCR) that recorded digital video content on D-1 (Sony) digital video cassettes. === Hard-disk-based DVR === In early 1995, Tektronix introduced the "Profile" series PDR100 Video Disk Recorder, which recorded and played back video stored on hard disk as motion JPEG. In 1996, Sweden's TV4 used the PDR100 extensively in building a new facility in Stockholm, and NBC used PDR100s at the Olympic games in Atlanta Georgia. The Tektronix Profile disk recorder won an Engineering, Science & Technology Emmy Award for "Outstanding Achievement in Engineering Development" at the 1996 Primetime Emmy Awards. In 1997 the U.S. Patent Office granted Tektronix patent 5,642,497 for two claims key to Profile. In 1998, Tektronix introduced two Profile models which were combined VDRs and file servers: the PDR200 and PDR300. The PDR300 stored its compressed video as MPEG-2 (ISO/IEC 13818-2) A working disk-based DVR prototype was developed in 1998 at Stanford University Computer Science department. The DVR design was a chapter of Edward Y. Chang's PhD dissertation, supervised by Professors Hector Garcia-Molina and Jennifer Widom. Two design papers were published at the 1998 VLDB conference, and the 1999 ICDE conference. The prototype was developed in 1998 at Pat Hanrahan's CS488 class: Experiments in Digital Television, and the prototype was demoed to industrial partners including Sony, Intel, and Apple. Consumer digital video recorders ReplayTV and TiVo were launched at the 1999 Consumer Electronics Show in Las Vegas, Nevada. Microsoft also demonstrated a unit with DVR capability, but this did not become available until the end of 1999 for full DVR features in Dish Network's DISHplayer receivers. TiVo shipped their first units on March 31, 1999. ReplayTV won the "Best of Show" award in the video category with Netscape co-founder Marc Andreessen as an early investor and board member, but TiVo was more successful commercially. Ad Age cited Forrester Research as saying that market penetration by the end of 1999 was "less than 100,000". In 2001, Toshiba introduced a combination DVR that allows video recording on both DVD recordable and hard disk drive. Legal action by media companies forced ReplayTV to remove many features such as automatic commercial skip and the sharing of recordings over the Internet, but newer devices have steadily regained these functions while adding complementary abilities, such as recording onto DVDs and programming and remote control facilities using PDAs, networked PCs, and Web browsers. In contrast to VCRs, hard-disk based digital video recorders make "time shifting" more convenient and also allow for functions such as pausing live TV, instant replay, chasing playback (viewing a recording before it has been completed) and skipping over advertising during playback. Many DVRs use the MPEG format for compressing the digital video. Video recording capabilities have become an essential part of the modern set-top box, as TV viewers have wanted to take control of their viewing experiences. As consumers have been able to converge increasing amounts of video content on their set-tops, delivered by traditional 'broadcast' cable, satellite and terrestrial as well as IP networks, the ability to capture programming and view it whenever they want has become a must-have function for many consumers. === DVR tied to video service === At the 1999 CES, Dish Network demonstrated the hardware that would later have DVR capability with the assistance of Microsoft software, which also included access to the WebTV service. By the end of 1999 the Dishplayer had full DVR capabilities and within a year, over 200,000 units were sold. In the UK, digital video recorders are often referred to as "plus boxes" (such as BSKYB's Sky+ and Virgin Media's V+ which integrates an HD capability, and the subscription free Freesat+ and Freeview+). Freeview+ have been around in the UK since the late 2000s, although the platform's first DVR, the Pace Twin, dates to 2002. British Sky Broadcasting marketed a popular combined receiver and DVR as Sky+, now replaced by the Sky Q box. TiVo launched a UK model in 2000, and is no longer supported, except for third party services, and the continuation of TiVo through Virgin Media in 2010. South African based Africa Satellite TV beamer Multichoice recently launched their DVR which is available on their DStv platform. In addition to ReplayTV and TiVo, there are a number of other suppliers of digital terrestrial (DTT) DVRs, including Technicolor SA, Topfield, Fusion, Commscope, Humax, VBox Communications, AC Ryan Playon and Advanced Digital Broadcast (ADB). Many satellite, cable and IPTV companies are incorporating digital video recording functions into their set-top box, such as with DirecTiVo, DISHPlayer/DishDVR, Scientific Atlanta Explorer 8xxx from Time Warner, Total Home DVR from AT&T U-verse, Motorola DCT6412 from Comcast and others, Moxi Media Center by Digeo (available through Charter, Adelphia, Sunflower, Bend Broadband, and soon Comcast and other cable companies), or Sky+. Astro introduced their DVR system, called Astro MAX, which was the first PVR in Malaysia but was phased out two years after its introduction. In the case of digital television, there is no encoding necessary in the DVR since the signal is already a digitally encoded MPEG stream. The digital video recorder simply stores the digital stream directly to disk. Having the broadcaster involved with, and sometimes subsidizing, the design of the DVR can lead to features such as the ability to use interactive TV on recorded shows, pre-loading of programs, or directly recording encrypted digital streams. It can, however, also force the manufacturer to implement non-skippable advertisements and automatically expiring recordings. In the United States, the FCC has ruled that starting on July 1, 2007, consumers will be able to purchase a set-top box from a third-party company, rather than being forced to purchase or rent the set-top box from their cable company. This ruling only applies to "navigation devices", otherwise known as a cable television set-top box, and not to the security functions that control the user's access to the content of the cable operator. The overall net effect on digital video recorders and related technology is unlikely to be substantial as standalone DVRs are currently readily available on the open market. In Europe Free-To-Air and Pay TV TV gateways with multiple tuners have whole house recording capabilities allowing recording of TV programs to Network Attached Storage or attached USB storage, recorded programs are then shared across the home network to tablet, smartphone, PC, Mac, Smart TV. === Introduction of dual tuners === In 2003 many Satellite and Cable providers introduced dual-tuner digital video recorders. In the UK, BSkyB introduced their first PVR Sky+ with dual tuner support in 2001. These machines have two independent tuners within the same receiver. The main use for this feature is the capability to record a live program while watching another live program simultaneously or to record two programs at the same time, possibly while watching a previously recorded one. Kogan.com introduced a dual-tuner PVR in the Australian market allowing free-to-air television to be recorded on a removable hard drive. Some dual-tuner DVRs also have the ability to output to two separate television sets at the same time. The PVR manufactured by UEC (Durban, South Africa) and used by Multichoice and Scientific Atlanta 8300DVB PVR have the ability to view two

Web3D

Web3D, also called 3D Web, is a group of technologies to display and navigate websites using 3D computer graphics. These technologies enable applications such as online games, virtual reality experiences, interactive product demonstrations, and 3D data visualization directly within web browsers. The emergence of Web3D dates back to 1994, with the advent of VRML, a file format designed to store and display 3D graphical data on the World Wide Web. Modern Web3D is primarily powered by WebGL, a JavaScript API that enables hardware-accelerated 3D graphics rendering in web browsers without requiring plug-ins. == Pre-WebGL era == The emergence of Web3D dates back to 1994, with the advent of VRML, a file format designed to store and display 3D graphical data on the World Wide Web. In October 1995, at Internet World, Template Graphics Software demonstrated a 3D/VRML plug-in for the beta release of Netscape 2.0 by Netscape Communications. The Web3D Consortium was formed to further the collective development of the format. VRML and its successor, X3D, have been accepted as international standards by the International Organization for Standardization and the International Electrotechnical Commission. The main drawback of the technology was the requirement to use third-party browser plug-ins to perform 3D rendering, which slowed the adoption of the standard. Between 2000 and 2010, one of these plug-ins, Adobe Flash Player, was widely installed on desktop computers and was used to display interactive web pages and online games and to play video and audio content. Several Flash-based frameworks appeared that used software rendering and ActionScript 3 to perform 3D computations such as transformations, lighting, and texturing. Most notable among them were Papervision3D and Away3D. Eventually, Adobe developed Stage3D, an API for rendering interactive 3D graphics with GPU-acceleration for its Flash player and AIR products, which was adopted by software vendors. In 2009, an open-source 3D web technology called O3D was introduced by Google. It also required a browser plug-in, but contrary to Flash/Stage3D, was based on JavaScript API. O3D was geared not only for games but also for advertisements, 3D model viewers, product demos, simulations, engineering applications, control and monitoring systems. == WebGL and glTF == WebGL (short for "Web Graphics Library") evolved out of the Canvas 3D experiments started by Vladimir Vukićević at Mozilla Foundation. Vukićević first demonstrated a Canvas 3D prototype in 2006. By the end of 2007, both Mozilla and Opera had made their own separate implementations. In early 2009, the nonprofit technology consortium Khronos Group started the WebGL Working Group, with initial participation from Apple, Google, Mozilla, Opera, and others. Version 1.0 of the WebGL specification was released in March 2011. Major advantages of the new technology include conformity with web standards and near-native 3D performance without the use of any browser plug-ins. Since WebGL is based on OpenGL ES, it works on mobile devices without any additional abstraction layers. For other platforms, WebGL implementations leverage ANGLE to translate OpenGL ES calls to DirectX, OpenGL, or Vulkan API calls. Among notable WebGL frameworks are A-Frame, which uses HTML-based markup for building virtual reality experiences; PlayCanvas, an open-source engine alongside a proprietary cloud-hosted creation platform for building browser games; Three.js, an MIT-licensed framework used to create demoscene from the early 2000s; Unity, which obtained a WebGL back-end in version 5; and Verge3D, which integrates with Blender, 3ds Max, and Maya to create 3D web content. With the rapid adoption of WebGL, a new problem arose—the lack of a 3D file format optimized for the Web. This issue was addressed by glTF, a format that was conceived in 2012 by members of the COLLADA working group. At SIGGRAPH 2012, Khronos presented a demo of glTF, which was then called WebGL Transmissions Format (WebGL TF). On 19 October 2015, the glTF 1.0 specification was released. Version 2.0 glTF uses a physically based rendering material model, proposed by Fraunhofer. Other upgrades include sparse accessors and morph targets for techniques such as facial animation, and schema tweaks and breaking changes for corner cases or performance, such as replacing top-level glTF object properties with arrays for faster index-based access. == Future == "WebGPU" is the working name for a potential web standard and JavaScript API for accelerated graphics and computing, aiming to provide "modern 3D graphics and computation capabilities". It is developed by the W3C "GPU for the Web" Community Group, with engineers from Apple, Mozilla, Microsoft, and Google, among others. WebGPU will not be based on any existing 3D API and will use Rust-like syntax for shaders.

1tik

1tik, pronounced Antik (Arabic: أنتيك; lit. "Everything is going well") is a fully Algerian instant messaging, social media and mobile payment app. designed, developed and built locally by the Algerian start-up, INTAJ Digital, with backing from the state-owned company ATM Mobilis (who's the company's main sponsor). It is described as Algeria's first super-app that is entirely designed and built by local developers. == Etymology == The name "1tik" (Arabic: أنتيك) is drawn from the popular Algerian vernacular (Antik), the neologism, which appeared several years ago, means "everything is going well" or "it's all good". == History == 1tik was officially launched and announced the 20th December 2025 by INTAJ Digital's founder Youcef Toulaib and a team of 50 employees, making it the first ever Algerian instant messaging, social media and mobile payment app, rivaling with the growing influence of Yassir in Algeria. it grew in popularity after the presidency of Algeria and several other state-owned companies, medias, and ministries opened official accounts on the app.

Spreading activation

Spreading activation is a method for searching associative networks, biological and artificial neural networks, or semantic networks. The search process is initiated by labeling a set of source nodes (e.g. concepts in a semantic network) with weights or "activation" and then iteratively propagating or "spreading" that activation out to other nodes linked to the source nodes. Most often these "weights" are real values that decay as activation propagates through the network. When the weights are discrete this process is often referred to as marker passing. Activation may originate from alternate paths, identified by distinct markers, and terminate when two alternate paths reach the same node. However brain studies show that several different brain areas play an important role in semantic processing. Spreading activation in semantic networks as a model were invented in cognitive psychology to model the fan out effect. Spreading activation can also be applied in information retrieval, by means of a network of nodes representing documents and terms contained in those documents. == Cognitive psychology == As it relates to cognitive psychology, spreading activation is the theory of how the brain iterates through a network of associated ideas to retrieve specific information. The spreading activation theory presents the array of concepts within our memory as cognitive units, each consisting of a node and its associated elements or characteristics, all connected together by edges. A spreading activation network can be represented schematically, in a sort of web diagram with shorter lines between two nodes meaning the ideas are more closely related and will typically be associated more quickly to the original concept. In memory psychology, the spreading activation model holds that people organize their knowledge of the world based on their personal experiences, which in turn form the network of ideas that is the person's knowledge of the world. When a word (the target) is preceded by an associated word (the prime) in word recognition tasks, participants seem to perform better in the amount of time that it takes them to respond. For instance, subjects respond faster to the word "doctor" when it is preceded by "nurse" than when it is preceded by an unrelated word like "carrot". This semantic priming effect with words that are close in meaning within the cognitive network has been seen in a wide range of tasks given by experimenters, ranging from sentence verification to lexical decision and naming. As another example, if the original concept is "red" and the concept "vehicles" is primed, they are much more likely to say "fire engine" instead of something unrelated to vehicles, such as "cherries". If instead "fruits" was primed, they would likely name "cherries" and continue on from there. The activation of pathways in the network has everything to do with how closely linked two concepts are by meaning, as well as how a subject is primed. == Algorithm == A directed graph is populated by Nodes[ 1...N ] each having an associated activation value A [ i ] which is a real number in the range [0.0 ... 1.0]. A Link[ i, j ] connects source node[ i ] with target node[ j ]. Each edge has an associated weight W [ i, j ] usually a real number in the range [0.0 ... 1.0]. Parameters: Firing threshold F, a real number in the range [0.0 ... 1.0] Decay factor D, a real number in the range [0.0 ... 1.0] Steps: Initialize the graph setting all activation values A [ i ] to zero. Set one or more origin nodes to an initial activation value greater than the firing threshold F. A typical initial value is 1.0. For each unfired node [ i ] in the graph having an activation value A [ i ] greater than the node firing threshold F: For each Link [ i, j ] connecting the source node [ i ] with target node [ j ], adjust A [ j ] = A [ j ] + (A [ i ] W [ i, j ] D) where D is the decay factor. If a target node receives an adjustment to its activation value so that it would exceed 1.0, then set its new activation value to 1.0. Likewise maintain 0.0 as a lower bound on the target node's activation value should it receive an adjustment to below 0.0. Once a node has fired it may not fire again, although variations of the basic algorithm permit repeated firings and loops through the graph. Nodes receiving a new activation value that exceeds the firing threshold F are marked for firing on the next spreading activation cycle. If activation originates from more than one node, a variation of the algorithm permits marker passing to distinguish the paths by which activation is spread over the graph The procedure terminates when either there are no more nodes to fire or in the case of marker passing from multiple origins, when a node is reached from more than one path. Variations of the algorithm that permit repeated node firings and activation loops in the graph, terminate after a steady activation state, with respect to some delta, is reached, or when a maximum number of iterations is exceeded. == Examples ==

Hyperscale computing

In computing, hyperscale is the ability of an architecture to scale appropriately as increased demand is added to the system. This typically involves the ability to seamlessly provide and add computing, memory, networking, and storage resources to a given node or set of nodes that make up a larger computing, distributed computing, or grid computing environment. Hyperscale computing is necessary in order to build a robust and scalable cloud, big data, map reduce, or distributed storage system and is often associated with the infrastructure required to run large distributed sites such as Google, Facebook, Twitter, Amazon, Microsoft, IBM Cloud, Oracle Cloud, or Cloudflare. Companies like Ericsson, AMD, and Intel provide hyperscale infrastructure kits for IT service providers. Companies like Scaleway, Switch, Alibaba, IBM, QTS, Neysa, Digital Realty Trust, Equinix, Oracle, Meta, Amazon Web Services, SAP, Microsoft, Google, and Cloudflare build data centers for hyperscale computing. Such companies are sometimes called "hyperscalers". They are recognized for their massive scale in cloud computing and data management, operating in environments that require extensive infrastructure to accommodate large-scale data processing and storage.