OpenWebRTC (OWR) is a free software stack that implements the WebRTC standard, a set of protocols and application programming interfaces defined by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). It is an alternative to the reference implementation that is based on software from Global IP Solutions (GIPS). It is published under the terms of the Simplified (2-clause) BSD license and officially supports the iOS, Linux, OS X, and Android operating systems. It is also meant to work outside web browsers, e.g. to power native mobile apps. It is mostly written in C and based largely on the multimedia framework GStreamer and a number of other, smaller external libraries. It officially supports both VP8 and H.264 as video formats. For H.264 it uses OpenH264, for which Cisco pays the patent licensing fees. Development of OpenWebRTC started at Ericsson Research under the lead of Stefan Ålund. Ericsson released it as free software in September 2014, together with the proof-of-concept web browser "Bowser" that is based on the stack. Among other things, this initial version did not yet support data channels and was said to be less mature than Google's reference implementation.
NAPLPS
NAPLPS (North American Presentation Layer Protocol Syntax) is a graphics language for use originally with videotex and teletext services. NAPLPS was developed from the Telidon system developed in Canada, with a small number of additions from AT&T Corporation. The basics of NAPLPS were later used as the basis for several other microcomputer-based graphics systems. == History == The Canadian Communications Research Centre (CRC), based in Ottawa, had been working on various graphics systems since the late 1960s, much of it led by Herb Bown. Through the 1970s they turned their attention to building out a system of "picture description instructions", which encoded graphics commands as a text stream. Graphics were encoded as a series of instructions (graphics primitives) each represented by a single ASCII character. Graphic coordinates were encoded in multiple 6-bit strings of XY coordinate data, flagged to place them in the printable ASCII range so that they could be transmitted with conventional text transmission techniques. ASCII SI/SO characters were used to differentiate the text from graphic portions of a transmitted "page". These instructions were decoded by separate programs to produce graphics output, on a plotter for instance. Other work produced a fully interactive version. In 1975, the CRC gave a contract to Norpak to develop an interactive graphics terminal that could decode the instructions and display them on a color display. During this period, a number of companies were developing the first teletext systems, notably the BBC's Ceefax system. Ceefax encoded character data into the lines in the vertical blanking interval of normal television signals where they could not be seen on-screen, and then used a buffer and decoder in the user's television to convert these into "pages" of text on the display. 
The Independent Broadcasting Authority quickly introduced their own ORACLE system, and the two organizations subsequently agreed to use a single standard, the "Broadcast Teletext Specification". This later became World System Teletext. At about the same time, other organizations were developing videotex systems, similar to teletext except they used modems to transmit their data instead of television signals. This was potentially slower and used up a telephone line, but had the major advantage of allowing the user to transmit data back to the sender. The UK's General Post Office developed a system using the Ceefax/ORACLE standard, launching it as Prestel, while France prepared the first steps for its ultimately very successful Minitel system, using a rival display standard called Antiope. By 1977, the Norpak system was running, and from this work the CRC decided to create their own teletext/videotext system. Unlike the systems being rolled out in Europe, the CRC decided from the start that the system should be able to run on any combination of communications links. For instance, it could use the vertical blanking interval to send data to the user, and a modem to return selections to the servers. It could be used in a one-way or two-way system. In teletext mode, character codes were sent to users' televisions by encoding them as dot patterns in the vertical blanking interval of the video signal. Various technical "tweaks" and details of the NTSC signals used by North American televisions allowed the downstream videotex channel to increase to 600 bit/s, about twice that used in the European systems. In videotext mode, Bell 202 modems were typical, offering a 1,200 bit/s download rate. A set top box attached to the TV decoded these signals back into text and graphics pages, which the user could select among. The system was publicly launched as Telidon on August 15, 1978. 
Compared to the European standards, the CRC system was faster, bi-directional, and offered real graphics as opposed to simple character graphics. The downside of the system was that it required much more advanced decoders, typically featuring Zilog Z80 or Motorola 6809 processors with RGB and/or RF output. The Department of Communications (now Innovation, Science and Economic Development Canada) launched a four-year plan to fund public roll-outs of the technology in an effort to spur the development of a commercial Telidon system. AT&T Corporation was so impressed by Telidon that they decided to join the project. They added a number of useful extensions, notably the ability to define original graphics commands (macro) and character sets (DRCS). They also tabled algorithms for proportionally spaced text, which greatly improved the quality of the displayed pages. A joint CSA/ANSI working group (X3L2.1) revised the specifications, which were submitted for standardization. In 1983, they became CSA T500 and ANSI X3.110, or NAPLPS. The data encoding system was also standardized as the NABTS (North American Broadcast Teletext Specification) protocol. Business models for Telidon services were poorly developed. Unlike the UK, where teletext was supported by one of only two large companies whose whole revenue model was based on a read-only medium (television), in North America Telidon was being offered by companies who worked on a subscriber basis.

== One-way systems ==

Telidon-based teletext was tested in a few North American trials in the early 1980s, including CBC IRIS, TVOntario, and the MTS-sponsored Project IDA. NAPLPS was also part of the NABTS teletext standard, for the encoding and display of teletext pages. In the late 1980s and early 1990s, affiliates of the regional sports network group SportsChannel ran a service called Sports Plus Network, which ran sports news and scores while SportsChannel was not otherwise on the air.
The screens, which frequently featured team logos or likenesses of players in addition to text, were drawn entirely with NAPLPS graphics and resembled the loading of Prodigy pages over a modem, though slightly faster. == Two-way systems == Various two-way systems using NAPLPS appeared in North America in the early 1980s. The biggest North American examples were Knight Ridder's Viewtron (based in Miami) and the Los Angeles Times' Gateway service (based in Orange County). Both used the Sceptre NAPLPS terminal from AT&T. The Sceptre contained a slow modem that connected over the consumer's telephone line to host computers. The Sceptre was expensive whether purchased or rented. Despite huge investments by their parent companies, neither Viewtron nor Gateway lasted into the second half of the decade. Another system, Keyfax, was developed by Keycom Electronic Publishing, a joint venture of Honeywell, Centel (since acquired by Sprint) and Field Enterprises, then-owner of the Chicago Sun-Times newspaper. Keyfax had originally been a WST teletext service, broadcast overnights on Field's Chicago television station WFLD-32 and through the VBI of both WFLD and national superstation WTBS; the decision was made to convert Keyfax into a subscription service, using a proprietary NAPLPS terminal device in a last-ditch effort to save the service. It did not work and Keyfax had ceased operations by the end of 1986. Other early-1980s NAPLPS technology was deployed in Canada, both as a way for rural Canadians to get news and weather information and as the platform for touchscreen information kiosks. In Vancouver these were featured at Expo 86. The kiosks became ubiquitous in Toronto under the name Teleguide, and were deployed in many shopping centres and at major tourist attractions. The latter city was the North American nexus of NAPLPS and the home of Norpak, the most successful of NAPLPS-oriented developers. 
Norpak created and sold hardware and software for NAPLPS development and display. TVOntario also developed NAPLPS content creation software. London, Ontario - based Cableshare used NAPLPS as the basis of touch-screen information kiosks for shopping malls, the flagship of which was deployed at Toronto's Eaton Centre. The system relied on an 8085-based microcomputer which drove several NAPLPS terminals fitted with touch screens, all communicating via Datapac to a back end database. The system offered news, weather and sports information along with shopping mall guides and coupons. Cableshare also developed and sold a leading NAPLPS page creation utility called the "Picture Painter." In the late 1980s, Tribune Media Services (TMS) and the Associated Press operated a cable television channel called AP News Plus that provided NAPLPS-based news screens to cable television subscribers in many U.S. cities. The news pages were created and edited by TMS staffers working on an Atex editing system in Orlando, Florida, and sent by satellite to NAPLPS decoder devices located at the local cable television companies. Among the firms providing technology to TMS and the Associated Press for the AP News Plus channel was Minneapolis-based Electronic Publishers Inc. (1985–1988). In 1981, two amateur radio operators (VE3FTT and VE3GQW) received special permission from the Canad
Residuated Boolean algebra
In mathematics, a residuated Boolean algebra is a residuated lattice whose lattice structure is that of a Boolean algebra. Examples include Boolean algebras with the monoid taken to be conjunction, the set of all formal languages over a given alphabet $\Sigma$ under concatenation, the set of all binary relations on a given set $X$ under relational composition, and more generally the power set of any equivalence relation, again under relational composition. The original application was to relation algebras as a finitely axiomatized generalization of the binary relation example, but there exist interesting examples of residuated Boolean algebras that are not relation algebras, such as the language example.

== Definition ==

A residuated Boolean algebra is an algebraic structure $(L,\wedge,\vee,\neg,0,1,\bullet,\mathbf{I},/,\backslash)$ such that $(L,\wedge,\vee,\neg,0,1)$ is a Boolean algebra, $(L,\bullet,\mathbf{I})$ is a monoid, and the residuals satisfy the residuation axioms $y\leq x\backslash z\ \Leftrightarrow\ x\bullet y\leq z\ \Leftrightarrow\ x\leq z/y$.

An equivalent signature better suited to the relation algebra application is $(L,\wedge,\vee,\neg,0,1,\bullet,\mathbf{I},\triangleright,\triangleleft)$, where the unary operations $x\backslash$ and $x\triangleright$ are intertranslatable in the manner of De Morgan's laws via

$$x\backslash y=\neg(x\triangleright\neg y),\qquad x\triangleright y=\neg(x\backslash\neg y),$$

and dually $/y$ and $\triangleleft y$ as

$$x/y=\neg(\neg x\triangleleft y),\qquad x\triangleleft y=\neg(\neg x/y),$$

with the residuation axioms in the residuated lattice article reorganized accordingly (replacing $z$ by $\neg z$) to read

$$(x\triangleright z)\wedge y=0\ \Leftrightarrow\ (x\bullet y)\wedge z=0\ \Leftrightarrow\ (z\triangleleft y)\wedge x=0$$

This De Morgan dual reformulation is motivated and discussed in more detail in the section below on conjugacy. Since residuated lattices and Boolean algebras are each definable with finitely many equations, so are residuated Boolean algebras, whence they form a finitely axiomatizable variety.

== Examples ==

1. Any Boolean algebra, with the monoid multiplication $\bullet$ taken to be conjunction and both residuals taken to be material implication $x\to y$. Of the remaining 15 binary Boolean operations that might be considered in place of conjunction for the monoid multiplication, only five meet the monotonicity requirement, namely $0$, $1$, $x$, $y$ and $x\vee y$. Setting $y=z=0$ in the residuation axiom $y\leq x\backslash z\ \Leftrightarrow\ x\bullet y\leq z$, we have $0\leq x\backslash 0\ \Leftrightarrow\ x\bullet 0\leq 0$, which is falsified by taking $x=1$ when $x\bullet y=1$, $x$, or $x\vee y$. The dual argument for $z/y$ rules out $x\bullet y=y$. This just leaves $x\bullet y=0$ (a constant binary operation independent of $x$ and $y$), which satisfies almost all the axioms when the residuals are both taken to be the constant operation $x/y=x\backslash y=1$. The axiom it fails is $x\bullet\mathbf{I}=x=\mathbf{I}\bullet x$, for want of a suitable value for $\mathbf{I}$. Hence conjunction is the only binary Boolean operation making the monoid multiplication that of a residuated Boolean algebra.

2. The power set $2^{X^{2}}$ made a Boolean algebra as usual with $\cap$, $\cup$ and complement relative to $X^{2}$, and made a monoid with relational composition. The monoid unit $\mathbf{I}$ is the identity relation $\{(x,x)\mid x\in X\}$. The right residual $R\backslash S$ is defined by $x(R\backslash S)y\ \Leftrightarrow\ \forall z\in X,\ zRx\Rightarrow zSy$. Dually the left residual $S/R$ is defined by $y(S/R)x\ \Leftrightarrow\ \forall z\in X,\ xRz\Rightarrow ySz$.

3. The power set $2^{\Sigma^{*}}$ made a Boolean algebra as for Example 2, but with language concatenation for the monoid. Here the set $\Sigma$ is used as an alphabet while $\Sigma^{*}$ denotes the set of all finite (including empty) words over that alphabet. The concatenation $LM$ of languages $L$ and $M$ consists of all words $uv$ such that $u\in L$ and $v\in M$. The monoid unit is the language $\{\varepsilon\}$ consisting of just the empty word $\varepsilon$. The right residual $M\backslash L$ consists of all words $w$ over $\Sigma$ such that $Mw\subseteq L$. The left residual $L/M$ is the same with $wM$ in place of $Mw$.

== Conjugacy ==

The De Morgan duals $\triangleright$ and $\triangleleft$ of residuation arise as follows. Among residuated lattices, Boolean algebras are special by virtue of having a complementation operation $\neg$.
This permits an alternative expression of the three inequalities

$$y\leq x\backslash z\ \Leftrightarrow\ x\bullet y\leq z\ \Leftrightarrow\ x\leq z/y$$

in the axiomatization of the two residuals in terms of disjointness, via the equivalence $x\leq y\ \Leftrightarrow\ x\wedge\neg y=0$. Abbreviating $x\wedge y=0$ to $x\#y$ as the expression of their disjointness, and substituting $\neg z$ for $z$ in the axioms, they become with a little Boolean manipulation

$$\neg(x\backslash\neg z)\#y\ \Leftrightarrow\ x\bullet y\#z\ \Leftrightarrow\ \neg(\neg z/y)\#x$$

Now $\neg(x\backslash\neg z)$ is reminiscent of De Morgan duality, suggesting that $x\backslash$ be thought of as a unary operation $f$, defined by $f(y)=x\backslash y$, that has a De Morgan dual $\neg f(\neg y)$, analogous to $\forall x\,\phi(x)=\neg\exists x\,\neg\phi(x)$. Denoting this dual operation as $x\triangleright$, we define $x\triangleright z$ as $\neg(x\backslash\neg z)$. Similarly we define another operation $z\triangleleft y$ as $\neg(\neg z/y)$. By analogy with $x\backslash$ as the residual operation associated with the operation $x\bullet$, we refer to $x\triangleright$ as the conjugate operation, or simply conjugate, of $x\bullet$. Likewise $\triangleleft y$ is the conjugate of $\bullet y$.
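The residuals and conjugates defined so far can be checked mechanically on the binary-relation example. The following is a minimal sketch, assuming the two-element carrier set `X = (0, 1)` (chosen only to keep the exhaustive search small); all function names are illustrative and not taken from any library. It builds every relation on `X`, computes the residuals from their pointwise definitions and the conjugates as their De Morgan duals, and tests both forms of the residuation axioms.

```python
from itertools import combinations, product

X = (0, 1)                      # assumed tiny carrier set
PAIRS = tuple(product(X, X))    # X^2, the top element of the Boolean algebra
TOP = frozenset(PAIRS)

def all_relations():
    # Every subset of X^2, i.e. every element of the algebra.
    return [frozenset(c) for k in range(len(PAIRS) + 1)
            for c in combinations(PAIRS, k)]

def compose(r, s):
    # Relational composition r ; s.
    return frozenset((a, c) for (a, b) in r for (b2, c) in s if b == b2)

def complement(r):
    return TOP - r

def right_residual(r, s):
    # x (r \ s) y  iff  for all z: z r x implies z s y.
    return frozenset((x, y) for (x, y) in PAIRS
                     if all((z, y) in s for z in X if (z, x) in r))

def left_residual(s, r):
    # y (s / r) x  iff  for all z: x r z implies y s z.
    return frozenset((y, x) for (y, x) in PAIRS
                     if all((y, z) in s for z in X if (x, z) in r))

def right_conjugate(x, z):
    # De Morgan dual of the right residual: not (x \ not z).
    return complement(right_residual(x, complement(z)))

def left_conjugate(z, y):
    # De Morgan dual of the left residual: not (not z / y).
    return complement(left_residual(complement(z), y))

def axioms_hold():
    rels = all_relations()
    for x, y, z in product(rels, repeat=3):
        xy = compose(x, y)
        # Residuation: y <= x\z  iff  x;y <= z  iff  x <= z/y.
        if not ((y <= right_residual(x, z)) == (xy <= z)
                == (x <= left_residual(z, y))):
            return False
        # Conjugate form: the three disjointness statements agree.
        if not ((not (right_conjugate(x, z) & y)) == (not (xy & z))
                == (not (left_conjugate(z, y) & x))):
            return False
    return True

if __name__ == "__main__":
    print(axioms_hold())
```

The search covers all 16 relations on the two-element set, so 4096 triples; a larger carrier set would work the same way but the number of triples grows very quickly.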
Unlike residuals, conjugacy is an equivalence relation between operations: if $f$ is the conjugate of $g$ then $g$ is also the conjugate of $f$, i.e. the conjugate of the conjugate of $f$ is $f$. Another advantage of conjugacy is that it becomes unnecessary to speak of right and left conjugates, that distinction now being inherited from the difference between $x\bullet$ and $\bullet x$, which have as their respective conjugates $x\triangleright$ and $\triangleleft x$. (But this advantage accrues also to residuals when $x\backslash$ is taken to be the residual operation to $x\bullet$.) All this yields (along with the Boolean algebra and monoid axioms) the following equivalent axiomatization of a residuated Boolean algebra.

$$y\#x\triangleright z\ \Leftrightarrow\ x\bullet y\#z\ \Leftrightarrow\ x\#z\triangleleft y$$

With this signature it remains the case that this axiomatization can be expressed as
Deep learning speech synthesis
Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text.

== Formulation ==

Given an input text or some sequence of linguistic units $Y$, the target speech $X$ can be derived by

$$X=\arg\max P(X\mid Y,\theta)$$

where $\theta$ is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Laplacian (for L1) or Gaussian (for L2). In practice, since the human voice band ranges from approximately 300 to 4000 Hz, the loss function will be designed to have more penalty on this range:

$$\text{loss}=\alpha\,\text{loss}_{\text{human}}+(1-\alpha)\,\text{loss}_{\text{other}}$$

where $\text{loss}_{\text{human}}$ is the loss from the human voice band, $\text{loss}_{\text{other}}$ is the loss from the remaining frequencies, and $\alpha$ is a scalar, typically around 0.5. The acoustic feature is typically a spectrogram or a Mel-scale spectrogram. These features capture the time-frequency relation of the speech signal, and thus are sufficient to generate intelligible outputs. The Mel-frequency cepstrum feature used in the speech recognition task is not suitable for speech synthesis, as it discards too much information.
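As an illustration of the band-weighted loss above, here is a minimal sketch for a single spectrogram frame. It assumes that each term is a mean absolute error (L1) over the bins inside or outside the 300 to 4000 Hz band; the function name `band_weighted_l1`, its parameters, and the per-band averaging are assumptions made for this example, not a description of any particular system.

```python
def band_weighted_l1(pred, target, freqs_hz, alpha=0.5, band=(300.0, 4000.0)):
    """Hypothetical band-weighted L1 loss for one spectrogram frame.

    pred, target: per-bin magnitudes; freqs_hz: centre frequency of each bin.
    """
    in_band, out_band = [], []
    for p, t, f in zip(pred, target, freqs_hz):
        err = abs(p - t)
        if band[0] <= f <= band[1]:
            in_band.append(err)      # contributes to loss_human
        else:
            out_band.append(err)     # contributes to loss_other

    def mean(xs):
        # An empty band contributes zero rather than raising.
        return sum(xs) / len(xs) if xs else 0.0

    return alpha * mean(in_band) + (1 - alpha) * mean(out_band)


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0]
    target = [0.0, 1.0, 1.0, 4.0]
    freqs = [100.0, 500.0, 2000.0, 6000.0]
    print(band_weighted_l1(pred, target, freqs, alpha=0.5))
    print(band_weighted_l1(pred, target, freqs, alpha=0.8))
```

Because the voice band usually covers fewer bins than the rest of the spectrum, averaging within each band means that even $\alpha = 0.5$ gives each in-band bin more weight than each out-of-band bin.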
== History ==

In September 2016, DeepMind released WaveNet, which demonstrated that deep learning-based models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms. Although WaveNet was initially considered too computationally expensive and slow to be used in consumer products, a year after its release DeepMind unveiled a modified version of WaveNet known as "Parallel WaveNet", a production model 1,000 times faster than the original. This was followed by Google AI's Tacotron 2 in 2018, which demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data, typically tens of hours of audio, to achieve acceptable quality. Tacotron 2 used an autoencoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded but the speech remained intelligible; with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron 2. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. The same year saw the release of HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech.
In 2020, the release of Glow-TTS introduced a flow-based approach that allowed for fast inference and voice style transfer capabilities. In March 2020, the free text-to-speech website 15.ai was launched. 15.ai gained widespread international attention in early 2021 for its ability to synthesize emotionally expressive speech of fictional characters from popular media with a minimal amount of data. The creator of 15.ai (known pseudonymously as 15) stated that 15 seconds of training data is sufficient to perfectly clone a person's voice (hence its name, "15.ai"), a significant reduction from the previously known data requirement of tens of hours. 15.ai is credited as the first platform to popularize AI voice cloning in memes and content creation. 15.ai used a multi-speaker model that enabled simultaneous training of multiple voices and emotions, implemented sentiment analysis using DeepMoji, and supported precise pronunciation control via ARPABET. The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024.

== Semi-supervised learning ==

Currently, self-supervised learning has gained much attention through better use of unlabelled data. Research has shown that, with the aid of self-supervised loss, the need for paired data decreases.

== Zero-shot speaker adaptation ==

Zero-shot speaker adaptation is promising because a single model can generate speech with various speaker styles and characteristics. In June 2018, Google proposed to use pre-trained speaker verification models as speaker encoders to extract speaker embeddings. The speaker encoders then become part of the neural text-to-speech models, so that the models can determine the style and characteristics of the output speech. This procedure has shown the community that it is possible to use only a single model to generate speech with multiple styles.
== Neural vocoder ==

In deep learning-based speech synthesis, neural vocoders play an important role in generating high-quality speech from acoustic features. The WaveNet model proposed in 2016 achieves excellent performance on speech quality. WaveNet factorised the joint probability of a waveform $\mathbf{x}=\{x_{1},\dots,x_{T}\}$ as a product of conditional probabilities as follows

$$p_{\theta}(\mathbf{x})=\prod_{t=1}^{T}p(x_{t}\mid x_{1},\dots,x_{t-1})$$

where $\theta$ is the model parameter including many dilated convolution layers. Thus, each audio sample $x_{t}$ is conditioned on the samples at all previous timesteps. However, the auto-regressive nature of WaveNet makes the inference process dramatically slow. To solve this problem, Parallel WaveNet was proposed. Parallel WaveNet is an inverse autoregressive flow-based model which is trained by knowledge distillation with a pre-trained teacher WaveNet model. Since such inverse autoregressive flow-based models are non-auto-regressive when performing inference, the inference speed is faster than real-time. Meanwhile, Nvidia proposed a flow-based WaveGlow model, which can also generate speech faster than real-time. However, despite the high inference speed, Parallel WaveNet has the limitation of needing a pre-trained WaveNet model, and WaveGlow takes many weeks to converge with limited computing devices. This issue has been solved by Parallel WaveGAN, which learns to produce speech through multi-resolution spectral loss and GAN learning strategies.
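The autoregressive factorisation used by WaveNet, and the reason it makes inference slow, can be illustrated with a toy model. The sketch below is not WaveNet: the function `conditional` is a hypothetical stand-in for the network, the four quantisation levels in `LEVELS` are an assumption chosen to keep the example small, and all names are illustrative. It shows that the log-probability of a waveform is a sum of conditional log-probabilities, and that sampling must proceed one step at a time because each step needs all earlier samples.

```python
import math
import random
from itertools import product

LEVELS = 4  # assumed toy quantisation of the audio amplitude


def conditional(history):
    """Toy stand-in for the network: p(x_t | x_1, ..., x_{t-1})."""
    # Favour values close to the running mean of the earlier samples.
    centre = sum(history) / len(history) if history else (LEVELS - 1) / 2
    weights = [math.exp(-abs(k - centre)) for k in range(LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def log_prob(x):
    # Chain rule: log p(x) = sum over t of log p(x_t | x_<t).
    return sum(math.log(conditional(x[:t])[x[t]]) for t in range(len(x)))


def sample(length, rng):
    # Sequential generation: step t cannot start before step t-1 is done.
    x = []
    for _ in range(length):
        probs = conditional(x)
        x.append(rng.choices(range(LEVELS), weights=probs)[0])
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    wave = sample(8, rng)
    print(wave, log_prob(wave))
```

Because every conditional is a proper distribution, the product defines a valid joint distribution over whole sequences, which is what allows the model to be trained by maximum likelihood on raw waveforms.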
A Very Fatal Murder
A Very Fatal Murder is a podcast produced by the satirical publication The Onion. A parody of true crime podcasts, A Very Fatal Murder is hosted by fictional New York City reporter David Pascall, who travels to the small town Bluff Springs, Nebraska to investigate the murder of prom queen Hayley Price. Pascall is voiced by David Sidorov, who also wrote for the podcast. The podcast premiered on January 23, 2018, and consists of 7 episodes. Season 2 was released in its entirety on May 11, 2019. == Production == A Very Fatal Murder satirizes popular true crime podcasts such as Serial, S-Town, and My Favorite Murder. According to head writer Katy Yeiser, the podcast is not meant as a take down of any particular podcast, but rather an ode to the genre. == Synopsis == The podcast follows fictional investigative reporter David Pascall (voiced by David Sidorov) who is searching for the perfect murder to create an award-winning podcast about. He is assisted by ETHL (the Extremely Timely Homicide Locator), an MIT-created computer programmed to find "the most interesting, violent, culturally relevant murder cases in America". == Episodes == === Season 1 === === Season 2 === == Reception == The podcast received mostly positive reviews, and was largely praised for attacking true-crime tropes such as the "hot dead girl" and the romanticization of small-town America. === Awards ===
Toggl Track
Toggl Track (formerly Toggl) is time tracking software developed by Toggl OÜ, which is headquartered in Tallinn, Estonia. The company offers online time tracking and reporting services through its website along with mobile and desktop applications. Time can be tracked through a start/stop button, manual entry, or dragging and resizing time blocks in a calendar view.

== History ==

According to Alari Aho, Toggl's CEO and founder, the application has been fully self-funded from the start. The name was created using a random name generator.
Model collapse
Model collapse, also known by other names such as "AI inbreeding", "AI cannibalism", "Habsburg AI", and "model autophagy disorder" or "MAD", is a phenomenon noted in artificial intelligence studies, where machine learning models gradually degrade due to errors coming from uncurated synthetic data, or due to training on the outputs of another model such as prior versions of itself. It is unclear to what extent the phenomenon threatens the long-term development of such models, and some techniques have been proposed to mitigate the effect.

== Characteristics ==

Shumailov et al. coined the term to describe two specific stages in the degradation of machine learning models, early model collapse and late model collapse. In early model collapse, the model begins losing information about the tails of the distribution, mostly affecting minority data. Later work highlighted that early model collapse is hard to notice, since overall performance may appear to improve while the model loses performance on minority data. In late model collapse, the model loses a significant proportion of its performance, confusing concepts and losing most of its variance.

== Mechanism ==

Using synthetic data as training data can lead to issues with the quality and reliability of the trained model. Model collapse occurs for three main reasons: functional approximation errors, sampling errors, and learning errors. Importantly, it happens in even the simplest of models, where not all of the error sources are present. In more complex models the errors often compound, leading to faster collapse.

== Disagreement over real-world impact ==

Some researchers and commentators on model collapse warn that the phenomenon could fundamentally threaten future generative AI development: As AI-generated data is shared on the Internet, it will inevitably end up in future training datasets, which are often crawled from the Internet.
If training on "slop" (large quantities of unlabeled synthetic data) inevitably leads to model collapse, this could pose a difficult problem. More recently, however, other researchers have disagreed with this argument, showing that if synthetic data accumulates alongside human-generated data, model collapse is avoided. The researchers argue that data accumulating over time is a more realistic description of reality than deleting all existing data every year, and that the real-world impact of model collapse may not be as catastrophic as feared. An alternative branch of the literature investigates the use of machine learning detectors and watermarking to identify model-generated data and filter it out.

== Mathematical models of the phenomenon ==
=== 1D Gaussian model ===
In 2024, a first attempt was made at illustrating collapse for the simplest possible model: a one-dimensional normal distribution fit using unbiased estimators of mean and variance, computed on samples from the previous generation. To make this more precise, we say that the original data follow a normal distribution $X^{0}\sim\mathcal{N}(\mu,\sigma^{2})$, and we possess $M_{0}$ samples $X_{j}^{0}$ for $j\in\{1,\dots,M_{0}\}$. Denoting by $X_{j}^{i}$ sample $j\in\{1,\dots,M_{i}\}$ at generation $i$, the next-generation model is estimated using the sample mean and variance:

$$\mu_{i+1}=\frac{1}{M_{i}}\sum_{j}X_{j}^{i};\quad \sigma_{i+1}^{2}=\frac{1}{M_{i}-1}\sum_{j}\left(X_{j}^{i}-\mu_{i+1}\right)^{2}.$$

This leads to a conditionally normal next-generation model $X_{j}^{i+1}\mid\mu_{i+1},\,\sigma_{i+1}\sim\mathcal{N}(\mu_{i+1},\sigma_{i+1}^{2})$. In theory, this is enough to calculate the full distribution of $X_{j}^{i}$. However, even after the first generation, the full distribution is no longer normal: it follows a variance-gamma distribution. To continue the analysis, instead of writing the probability density function at each generation, it is possible to construct the samples explicitly in terms of independent random variables using Cochran's theorem. To be precise, $\mu_{1}$ and $\sigma_{1}$ are independent, with $\mu_{1}\sim\mathcal{N}\left(\mu,\frac{\sigma^{2}}{M_{0}}\right)$ and $(M_{0}-1)\,\sigma_{1}^{2}\sim\sigma^{2}\,\Gamma\left(\frac{M_{0}-1}{2},\frac{1}{2}\right)$, following a Gamma distribution. Denoting by $Z$ Gaussian random variables distributed according to $\mathcal{N}(0,1)$ and by $S^{i}$ random variables distributed as $\frac{1}{M_{i-1}-1}\Gamma\left(\frac{M_{i-1}-1}{2},\frac{1}{2}\right)$, it turns out to be possible to write samples at each generation as

$$X_{j}^{0}=\mu+\sigma Z_{j}^{0},$$

$$X_{j}^{1}=\mu+\frac{\sigma}{\sqrt{M_{0}}}Z^{1}+\sigma\sqrt{S^{1}}\,Z_{j}^{1},$$

and more generally

$$X_{j}^{n}=\mu+\frac{\sigma}{\sqrt{M_{0}}}Z^{1}+\frac{\sigma}{\sqrt{M_{1}}}\sqrt{S^{1}}\,Z^{2}+\dots+\frac{\sigma}{\sqrt{M_{n-1}}}\sqrt{S^{1}\times\dots\times S^{n-1}}\,Z^{n}+\sigma\sqrt{S^{1}\times\dots\times S^{n}}\,Z_{j}^{n}.$$

Note that these are not joint distributions, as $Z^{n}$ and $S^{n}$ depend directly on $Z_{j}^{n-1}$, but when considering $X_{j}^{n}$ on its own the formula above provides all the information about the full distribution. To analyse the model collapse, we can first calculate the variance and mean of samples at generation $n$. This tells us what kind of distributions we expect to arrive at after $n$ generations. It is possible to find the exact values in closed form, but the mean and variance of the square root of a gamma distribution are expressed in terms of gamma functions, making the result quite clunky. Instead, it is possible to expand all results to second order in each of $1/M_{i}$, assuming each sample size to be large. It is then possible to show that

$$\frac{1}{\sigma^{2}}\operatorname{Var}(X_{j}^{n})=\frac{1}{M_{0}}+\frac{1}{M_{1}}+\dots+\frac{1}{M_{n-1}}+1+\mathcal{O}\left(M_{i}^{-2}\right).$$

If all sample sizes are constant, $M_{i}=M$, this diverges linearly as $n\to\infty$:

$$\operatorname{Var}(X_{j}^{n})=\sigma^{2}\left(1+\frac{n}{M}\right);\quad \mathbb{E}(X_{j}^{n})=\mu.$$

This is the same scaling as for a one-dimensional Gaussian random walk.
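The linear growth of the variance for constant sample size can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from the cited work: it runs many independent chains of the refitting procedure described above (sample $M$ points, refit the mean and the unbiased variance, repeat) and returns one draw $X^{n}$ per chain. The function name `simulate_final_samples` and the particular values of $M$, $n$ and the number of chains are illustrative assumptions.

```python
import math
import random


def simulate_final_samples(mu, sigma, sample_size, generations, chains, seed=0):
    """Return one draw X^n from each of `chains` independent refitting chains.

    Hypothetical helper for illustration: every chain starts from N(mu, sigma^2)
    and is refitted `generations` times on `sample_size` fresh samples.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(chains):
        m, s = mu, sigma
        for _ in range(generations):
            # Sample from the current generation's fitted model.
            data = [rng.gauss(m, s) for _ in range(sample_size)]
            # Refit: sample mean and unbiased variance (divide by M - 1).
            m = sum(data) / sample_size
            var = sum((x - m) ** 2 for x in data) / (sample_size - 1)
            s = math.sqrt(var)
        # One sample X^n from the model obtained after n refits.
        finals.append(rng.gauss(m, s))
    return finals


def mean_and_variance(xs):
    """Empirical mean and unbiased variance of a list of floats."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return mean, var


if __name__ == "__main__":
    M, n = 25, 10  # assumed illustrative values
    xs = simulate_final_samples(0.0, 1.0, M, n, chains=5000, seed=1)
    mean, var = mean_and_variance(xs)
    print(f"empirical mean {mean:.3f}, empirical variance {var:.3f}")
    print(f"predicted variance sigma^2 * (1 + n/M) = {1 + n / M:.3f}")
```

With $\mu=0$, $\sigma=1$, $M=25$ and $n=10$ the formula predicts a variance of $1+10/25=1.4$; the empirical value across chains should land near it, up to Monte Carlo noise. Averaging over independent chains is a design choice of this sketch: the variance in the formula is taken over the whole refitting process, so a single chain would not estimate it.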
However, divergence of the variance of $X_{j}^{n}$ does not directly provide any information about the corresponding estimates of $\mu_{n+1}$ and $\sigma_{n+1}$, particularly how different they are from the original $\mu$ and $\sigma$. It turns out to be possible to calculate the distance between the true distribution and the approximated distribution at step $n+1$, using the Wasserstein-2 distance (which is also sometimes referred to as risk):

$$\mathbb{E}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu,\sigma^{2}),\mathcal{N}(\mu_{n+1},\sigma_{n+1}^{2})\right)\right]=\frac{3}{2}\sigma^{2}\left(\frac{1}{M_{0}}+\frac{1}{M_{1}}+\dots+\frac{1}{M_{n}}\right)+\mathcal{O}\left(M_{i}^{-2}\right),$$

$$\operatorname{Var}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu,\sigma^{2}),\mathcal{N}(\mu_{n+1},\sigma_{n+1}^{2})\right)\right]=\frac{1}{2}\sigma^{4}\left(\frac{3}{M_{0}^{2}}+\frac{3}{M_{1}^{2}}+\dots+\frac{3}{M_{n}^{2}}+\sum_{i\neq j}\frac{4}{M_{i}M_{j}}\right)+\mathcal{O}\left(M_{i}^{-3}\right).$$

This directly shows why model collapse occurs in this simple model. Due to errors from re-sampling the approximated distribution, each generation ends up corresponding to a