GeForce RTX 50 series

The GeForce RTX 50 series of consumer graphics cards is the successor of Nvidia's GeForce 40 series. Announced at CES 2025, it debuted with the release of the RTX 5070, RTX 5080 and RTX 5090 in January 2025. It is based on Nvidia's Blackwell architecture featuring Nvidia RTX's fourth-generation RT cores for hardware-accelerated real-time ray tracing, and fifth-generation deep learning–focused Tensor Cores. The GPUs are manufactured by TSMC on a custom 4N process node. == Background == In March 2024, Nvidia announced the Blackwell architecture for its datacenter products. Like Ampere, the architecture is shared by consumer and datacenter products rather than having distinct architectures released simultaneously like Ada Lovelace for consumers and Hopper for datacenter. At the Game Awards in December 2024, a cinematic trailer for The Witcher IV was shown that had been pre-rendered on an "unannounced Nvidia GeForce RTX GPU". This was assumed to be an upcoming GeForce RTX 50 series GPU. Following the RTX 50 series announcement, Nvidia confirmed that the trailer was "pre-rendered in Unreal Engine 5 on a GeForce RTX 5090". Later in the same month, it was reported that Nvidia had begun stockpiling GeForce RTX 50 series units in U.S. warehouses due to a threatened 10% import tariff and 60% tariff on Chinese imports that Donald Trump promised in his re-election campaign. === Announcement === On January 6, 2025, the GeForce RTX 50 series was officially announced for desktop and mobile devices during Nvidia's CES keynote in Las Vegas. The pricing announcement was met with surprise as the RTX 5080 at $999 was the same price that the RTX 4080 Super released at a year earlier despite the anticipated price increases. Nvidia CEO Jensen Huang falsely claimed that the RTX 5070 could reach "RTX 4090 performance at $549", a figure that relies on the use of DLSS 4 upscaling and Multi Frame generation, and is not an indication of raw performance. == Features == === Blackwell architecture === The GeForce RTX 50 series is powered by the Blackwell microarchitecture, which continues Ada Lovelace's emphasis on high graphics frequencies and large L2 caches. The Blackwell architecture introduces Nvidia RTX's fourth-generation RT cores for hardware-accelerated real-time ray tracing and fifth-generation Tensor Cores for AI compute and performing floating-point calculations. === GDDR7 === RTX 50 series GPUs are the first consumer GPUs to feature GDDR7 video memory for greater memory bandwidth over the same bus width compared to the GDDR6 and GDDR6X memory used in the GeForce 40 series. RTX 50 series desktop GPUs use GDDR7 modules from Samsung due to them being available for validation earlier than modules from SK Hynix and Micron. === 12V-2×6 connector === The GeForce RTX 50 series uses the 16-pin 12V-2×6 connector, which is a revision of the 12VHPWR connector featured on the GeForce 40 series. There were problems with the 12VHPWR connector melting on some RTX 4090 GPUs due to the connector not being fully seated and connector design flaws that did not implement a high enough safety and error tolerance. The 12V-2×6 connector revision, published by PCI-SIG in July 2023, addressed this by shortening the four sense pins so the connector will not push any power if it has not been fully seated. The 12VHPWR design would still draw up to 150W of power even if the sense pins were not making full contact. 12V-2×6 is backwards compatible with existing 12VHPWR cables and adapters. Nvidia has mandated to its AIB partners that the 16-pin 12V-2×6 connector be used on all RTX 50 series designs. With the GeForce 40 series, the 12VHPWR connector was only mandated on higher power cards such as the RTX 4070 Super, RTX 4070 Ti, RTX 4070 Ti Super, RTX 4080, RTX 4080 Super and RTX 4090 while RTX 4060, RTX 4060 Ti and RTX 4070 AIB designs had the option of using 8-pin PCIe connectors. The 600W-capable 12VHPWR connector would not have been necessary on sub-200W cards. === DLSS 4 === The fourth generation of Deep Learning Super Sampling (DLSS) was unveiled alongside the RTX 50 series. DLSS 4 upscaling uses a new vision transformer-based model for enhanced image quality with reduced ghosting and greater image stability in motion compared to the previous convolutional neural network (CNN) model. DLSS 4 also allows a greater number of frames to be generated and interpolated based on a single traditionally rendered frame. This form of frame generation called Multi Frame Generation is exclusive to the RTX 50 series while the GeForce 40 series is limited to one interpolated frame per traditionally rendered frame. Nvidia claims that DLSS 4's frame generation model uses 30% less video memory with the example of Warhammer 40,000: Darktide using 400 MB less memory at 4K resolution with frame generation enabled. Nvidia claims that 75 titles will integrate DLSS 4 Multi Frame Generation at launch, including Alan Wake 2, Cyberpunk 2077, Indiana Jones and the Great Circle, and Star Wars Outlaws. === Media Engine and I/O === The RTX 50 series includes DisplayPort 2.1b UHBR20 (80Gbps) with higher display output data rates to support high resolution and high refresh rate displays. The GeForce 40 series received criticism for only including DisplayPort 1.4a (32Gbps) while the competing Radeon RX 7000 series included DisplayPort 2.1 UHBR13.5 (54Gbps). At CES 2025, VESA announced a collaboration with Nvidia on the new DP80LL ("low loss") UHBR20 active cable standard. DP80LL allows for 80Gbps DisplayPort 2.1 cables up to 3 meters long as passive DP80 cables are limited in length due to signal integrity concerns. The RTX 50 series introduces the ninth-generation NVENC encoder and sixth-generation NVDEC video decoder. For the first time in a consumer GeForce GPU, encoding and decoding video in the 4:2:2 color format for professional-grade higher color depth is supported. == List of GPUs == === Desktop === GeForce RTX 50 series desktop GPUs are the second consumer GPUs to utilize a PCIe 5.0 interface and the first to feature GDDR7 video memory (except for the entry level RTX 5050 that still uses GDDR6). They are fabricated by TSMC using a custom 5 nm process dubbed 4N. === Mobile === Laptops featuring GeForce RTX 50 series laptop GPUs were shown at CES 2025. Laptops with RTX 50 series GPUs were paired with Intel's Arrow Lake-HX and AMD's Strix Point and Fire Range CPUs. Nvidia claims that Blackwell architecture's new Max-Q features can increase battery life by up to 40% over GeForce 40 series laptops. For example, Advanced Power Gating saves power by turning off areas of the GPU that are unused and the paired GDDR7 memory can run in an "ultra" low-voltage state. Initial RTX 50 series laptops will become available in March 2025 starting at $1,299. == Controversies == === 12V-2x6 power connector issue === The 12V-2x6 connector used by multiple 5090 cards faces criticism due to a design flaw that can potentially cause the connector to melt. The flaw primarily affect Nvidia's own RTX 5090 FE and RTX 5080 FE cards and are similar to the failures seen on the RTX 40 series but models by third party OEMs have been affected as well. === Availability and pricing === The releases of the RTX 5090, 5080 and 5070 Ti were marked by severe availability issues and pricing well above MSRP. Pricing became an issue again at the end of 2025 due to an ongoing memory supply shortage. Nvidia has been rumored to cut production of 16GB VRAM cards, affecting the availability of the RTX 5060 Ti 16GB and RTX 5070 Ti SKUs. === 32-bit support removal for CUDA, OpenCL, and GPU PhysX === Support for 32-bit OpenCL, and CUDA applications (and as a result 32-bit GPU-accelerated PhysX), was dropped for the GeForce RTX 50 series, which resulted in several applications encountering performance issues with GPU PhysX options or not being able to run at all, causing negative reactions from numerous gaming communities. On December 4, 2025, with the release of driver version 591.44, 32-bit GPU-accelerated PhysX support was restored for certain games. Support for more games was promised in the future. === Incomplete dies and missing ROPs === The dies of certain RTX 5090/5090D, 5080, and 5070 Ti cards were missing eight render output units (ROPs), resulting in slower graphics while pure compute and AI workloads are unaffected. Nvidia claimed that less than 0.5% of cards are affected and that the "production anomaly" has been rectified. === Black screen issues === Some RTX 5080 and 5090 users reported an issue where the system would boot into a black screen after installing Nvidia drivers. Nvidia confirmed the issue and said that a new driver update would fix it for people who hadn't received a VBIOS update yet. Released on February 27, 2025 Nvidia drivers version 572.60 claim to have fixed the issue. Nvidia has since released multiple hotfix and Game Ready drivers that contain additional fixes for the issue. === Windows driver branch quality and stabilit

Inverse depth parametrization

In computer vision, the inverse depth parametrization is a parametrization used in methods for 3D reconstruction from multiple images such as simultaneous localization and mapping (SLAM). Given a point p {\displaystyle \mathbf {p} } in 3D space observed by a monocular pinhole camera from multiple views, the inverse depth parametrization of the point's position is a 6D vector that encodes the optical centre of the camera c 0 {\displaystyle \mathbf {c} _{0}} when in first observed the point, and the position of the point along the ray passing through p {\displaystyle \mathbf {p} } and c 0 {\displaystyle \mathbf {c} _{0}} . Inverse depth parametrization generally improves numerical stability and allows to represent points with zero parallax. Moreover, the error associated to the observation of the point's position can be modelled with a Gaussian distribution when expressed in inverse depth. This is an important property required to apply methods, such as Kalman filters, that assume normality of the measurement error distribution. The major drawback is the larger memory consumption, since the dimensionality of the point's representation is doubled. == Definition == Given 3D point p = ( x , y , z ) {\displaystyle \mathbf {p} =(x,y,z)} with world coordinates in a reference frame ( e 1 , e 2 , e 3 ) {\displaystyle (e_{1},e_{2},e_{3})} , observed from different views, the inverse depth parametrization y {\displaystyle \mathbf {y} } of p {\displaystyle \mathbf {p} } is given by: y = ( x 0 , y 0 , z 0 , θ , ϕ , ρ ) {\displaystyle \mathbf {y} =(x_{0},y_{0},z_{0},\theta ,\phi ,\rho )} where the first five components encode the camera pose in the first observation of the point, being c 0 = ( x 0 , y 0 , z 0 ) {\displaystyle \mathbf {c_{0}} =(x_{0},y_{0},z_{0})} the optical centre, ϕ {\displaystyle \phi } the azimuth, θ {\displaystyle \theta } the elevation angle, and ρ = 1 ‖ p − c 0 ‖ {\displaystyle \rho ={\frac {1}{\left\Vert \mathbf {p} -\mathbf {c} _{0}\right\Vert }}} the inverse depth of p {\displaystyle p} at the first observation.

Language Computer Corporation

Language Computer Corporation (LCC) is a natural language processing research company based in Richardson, Texas. The company develops a variety of natural language processing products, including software for question answering, information extraction, and automatic summarization. Since its founding in 1995, the low-profile company has landed significant United States Government contracts, with $8,353,476 in contracts in 2006-2008. While the company has focused primarily on the government software market, LCC has also used its technology to spin off three start-up companies. The first spin-off, known as Lymba Corporation, markets the PowerAnswer question answering product originally developed at LCC. In 2010, LCC's CEO, Andrew Hickl, co-founded two start-ups which made use of the company's technology. These included Swingly, an automatic question answering start-up, and Extractiv, an information extraction service that was founded in partnership with Houston, Texas-based 80legs.

WaveNet

WaveNet is a deep neural network for generating raw audio. It was created by researchers at London-based AI firm DeepMind. The technique, outlined in a paper in September 2016, is able to generate relatively realistic-sounding human-like voices by directly modelling waveforms using a neural network method trained with recordings of real speech. Tests with US English and Mandarin reportedly showed that the system outperforms Google's best existing text-to-speech (TTS) systems, although as of 2016 its text-to-speech synthesis still was less convincing than actual human speech. WaveNet's ability to generate raw waveforms means that it can model any kind of audio, including music. == History == Generating speech from text is an increasingly common task thanks to the popularity of software such as Apple's Siri, Microsoft's Cortana, Amazon Alexa and the Google Assistant. Most such systems use a variation of a technique that involves concatenated sound fragments together to form recognisable sounds and words. The most common of these is called concatenative TTS. It consists of large library of speech fragments, recorded from a single speaker that are then concatenated to produce complete words and sounds. The result sounds unnatural, with an odd cadence and tone. The reliance on a recorded library also makes it difficult to modify or change the voice. Another technique, known as parametric TTS, uses mathematical models to recreate sounds that are then assembled into words and sentences. The information required to generate the sounds is stored in the parameters of the model. The characteristics of the output speech are controlled via the inputs to the model, while the speech is typically created using a voice synthesiser known as a vocoder. This can also result in unnatural sounding audio. == Design and ongoing research == === Background === WaveNet is a type of feedforward neural network known as a deep convolutional neural network (CNN). In WaveNet, the CNN takes a raw signal as an input and synthesises an output one sample at a time. It does so by sampling from a softmax (i.e. categorical) distribution of a signal value that is encoded using μ-law companding transformation and quantized to 256 possible values. === Initial concept and results === According to the original September 2016 DeepMind research paper WaveNet: A Generative Model for Raw Audio, the network was fed real waveforms of speech in English and Mandarin. As these pass through the network, it learns a set of rules to describe how the audio waveform evolves over time. The trained network can then be used to create new speech-like waveforms at 16,000 samples per second. These waveforms include realistic breaths and lip smacks – but do not conform to any language. WaveNet is able to accurately model different voices, with the accent and tone of the input correlating with the output. For example, if it is trained with German, it produces German speech. The capability also means that if the WaveNet is fed other inputs – such as music – its output will be musical. At the time of its release, DeepMind showed that WaveNet could produce waveforms that sound like classical music. === Content (voice) swapping === According to the June 2018 paper Disentangled Sequential Autoencoder, DeepMind has successfully used WaveNet for audio and voice "content swapping": the network can swap the voice on an audio recording for another, pre-existing voice while maintaining the text and other features from the original recording. "We also experiment on audio sequence data. Our disentangled representation allows us to convert speaker identities into each other while conditioning on the content of the speech." (p. 5) "For audio, this allows us to convert a male speaker into a female speaker and vice versa [...]." (p. 1) According to the paper, a two-digit minimum amount of hours (c. 50 hours) of pre-existing speech recordings of both source and target voice are required to be fed into WaveNet for the program to learn their individual features before it is able to perform the conversion from one voice to another at a satisfying quality. The authors stress that "[a]n advantage of the model is that it separates dynamical from static features [...]." (p. 8), i. e. WaveNet is capable of distinguishing between the spoken text and modes of delivery (modulation, speed, pitch, mood, etc.) to maintain during the conversion from one voice to another on the one hand, and the basic features of both source and target voices that it is required to swap on the other. The January 2019 follow-up paper Unsupervised speech representation learning using WaveNet autoencoders details a method to successfully enhance the proper automatic recognition and discrimination between dynamical and static features for "content swapping", notably including swapping voices on existing audio recordings, in order to make it more reliable. Another follow-up paper, Sample Efficient Adaptive Text-to-Speech, dated September 2018 (latest revision January 2019), states that DeepMind has successfully reduced the minimum amount of real-life recordings required to sample an existing voice via WaveNet to "merely a few minutes of audio data" while maintaining high-quality results. Its ability to clone voices has raised ethical concerns about WaveNet's ability to mimic the voices of living and dead persons. According to a 2016 BBC article, companies working on similar voice-cloning technologies (such as Adobe Voco) intend to insert watermarking inaudible to humans to prevent counterfeiting, while maintaining that voice cloning satisfying, for instance, the needs of entertainment-industry purposes would be of a far lower complexity and use different methods than required to fool forensic evidencing methods and electronic ID devices, so that natural voices and voices cloned for entertainment-industry purposes could still be easily told apart by technological analysis. == Applications == At the time of its release, DeepMind said that WaveNet required too much computational processing power to be used in real world applications. As of October 2017, Google announced a 1,000-fold performance improvement along with better voice quality. WaveNet was then used to generate Google Assistant voices for US English and Japanese across all Google platforms. In November 2017, DeepMind researchers released a research paper detailing a proposed method of "generating high-fidelity speech samples at more than 20 times faster than real-time", called "Probability Density Distillation". At the annual I/O developer conference in May 2018, it was announced that new Google Assistant voices were available and made possible by WaveNet; WaveNet greatly reduced the number of audio recordings that were required to create a voice model by modeling the raw audio of the voice actor samples.

Convolutional layer

In artificial neural networks, a convolutional layer is a type of network layer that applies a convolution operation to the input. Convolutional layers are some of the primary building blocks of convolutional neural networks (CNNs), a class of neural network most commonly applied to images, video, audio, and other data that have the property of uniform translational symmetry. The convolution operation in a convolutional layer involves sliding a small window (called a kernel or filter) across the input data and computing the dot product between the values in the kernel and the input at each position. This process creates a feature map that represents detected features in the input. == Concepts == === Kernel === Kernels, also known as filters, are small matrices of weights that are learned during the training process. Each kernel is responsible for detecting a specific feature in the input data. The size of the kernel is a hyperparameter that affects the network's behavior. === Convolution === For a 2D input x {\displaystyle x} and a 2D kernel w {\displaystyle w} , the 2D convolution operation can be expressed as: y [ i , j ] = ∑ m = 0 k h − 1 ∑ n = 0 k w − 1 x [ i + m , j + n ] ⋅ w [ m , n ] {\displaystyle y[i,j]=\sum _{m=0}^{k_{h}-1}\sum _{n=0}^{k_{w}-1}x[i+m,j+n]\cdot w[m,n]} where k h {\displaystyle k_{h}} and k w {\displaystyle k_{w}} are the height and width of the kernel, respectively. This generalizes immediately to nD convolutions. Commonly used convolutions are 1D (for audio and text), 2D (for images), and 3D (for spatial objects, and videos). === Stride === Stride determines how the kernel moves across the input data. A stride of 1 means the kernel shifts by one pixel at a time, while a larger stride (e.g., 2 or 3) results in less overlap between convolutions and produces smaller output feature maps. === Padding === Padding involves adding extra pixels around the edges of the input data. It serves two main purposes: Preserving spatial dimensions: Without padding, each convolution reduces the size of the feature map. Handling border pixels: Padding ensures that border pixels are given equal importance in the convolution process. Common padding strategies include: No padding/valid padding. This strategy typically causes the output to shrink. Same padding: Any method that ensures the output size same as input size is a same padding strategy. Full padding: Any method that ensures each input entry is convolved over for the same number of times is a full padding strategy. Common padding algorithms include: Zero padding: Add zero entries to the borders of input. Mirror/reflect/symmetric padding: Reflect the input array on the border. Circular padding: Cycle the input array back to the opposite border, like a torus. The exact numbers used in convolutions is complicated, for which we refer to (Dumoulin and Visin, 2018) for details. == Variants == === Standard === The basic form of convolution as described above, where each kernel is applied to the entire input volume. === Depthwise separable === Depthwise separable convolution separates the standard convolution into two steps: depthwise convolution and pointwise convolution. The depthwise separable convolution decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently and a pointwise convolution ( 1 × 1 {\displaystyle 1\times 1} convolution) that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost. It was first developed by Laurent Sifre during an internship at Google Brain in 2013 as an architectural variation on AlexNet to improve convergence speed and model size. === Dilated === Dilated convolution, or atrous convolution, introduces gaps between kernel elements, allowing the network to capture a larger receptive field without increasing the kernel size. === Transposed === Transposed convolution, also known as deconvolution, fractionally strided convolution, and upsampling convolution, is a convolution where the output tensor is larger than its input tensor. It's often used in encoder-decoder architectures for upsampling. It's used in image generation, semantic segmentation, and super-resolution tasks. == History == The concept of convolution in neural networks was inspired by the visual cortex in biological brains. Early work by Hubel and Wiesel in the 1960s on the cat's visual system laid the groundwork for artificial convolution networks. An early convolution neural network was developed by Kunihiko Fukushima in 1969. It had mostly hand-designed kernels inspired by convolutions in mammalian vision. In 1979 he improved it to the Neocognitron, which learns all convolutional kernels by unsupervised learning (in his terminology, "self-organized by 'learning without a teacher'"). During the 1988 to 1998 period, a series of CNN were introduced by Yann LeCun et al., ending with LeNet-5 in 1998. It was an early influential CNN architecture for handwritten digit recognition, trained on the MNIST dataset, and was used in ATM. (Olshausen & Field, 1996) discovered that simple cells in the mammalian primary visual cortex implement localized, oriented, bandpass receptive fields, which could be recreated by fitting sparse linear codes for natural scenes. This was later found to also occur in the lowest-level kernels of trained CNNs. The field saw a resurgence in the 2010s with the development of deeper architectures and the availability of large datasets and powerful GPUs. AlexNet, developed by Alex Krizhevsky et al. in 2012, was a catalytic event in modern deep learning. In that year’s ImageNet competition, the AlexNet model achieved a 16% top-five error rate, significantly outperforming the next best entry, which had a 26% error rate. The network used eight trainable layers, approximately 650,000 neurons, and around 60 million parameters, highlighting the impact of deeper architectures and GPU acceleration on image recognition performance. From the 2013 ImageNet competition, most entries adopted deep convolutional neural networks, building on the success of AlexNet. Over the following years, performance steadily improved, with the top-five error rate falling from 16% in 2012 and 12% in 2013 to below 3% by 2017, as networks grew increasingly deep.

Alias Eclipse

Eclipse was a professional 2D image editing program available on Silicon Graphics and Windows workstations. Designed to manipulate high-resolution images like digitized movie frames and photographs for print, it offered color correction tools, image processing effects, rudimentary paint features, and spline-based drawing and masking. == History == Eclipse was originally developed in the late 1980s by Full Color Computing, an early provider of photo retouch and color prepress software for Silicon Graphics workstations. Alias Research (later Alias Systems Corporation), a developer of professional 3D graphics applications for the SGI platform, purchased the rights to Eclipse in fall 1990. Alias developed Eclipse through the early to mid-1990s, releasing version 2.5 in 1995 with improvements to the speed of color correction, effects, and rendering. Xyvision's Contex Prepress division purchased exclusive rights to Eclipse from Alias in 1996, and released version 3.0 the following year. Eclipse was subsequently sold to German developer Form & Vision GmbH, which continued development and ported it to the Windows platform. In 1999, Form & Vision released a demo of Eclipse 3.1.3 on the SGI platform which was limited to 1600 x 1600 pixel images, then ceased development of Eclipse on the SGI platform. Eclipse was thereafter developed exclusively for the Windows platform, culminating with version 3.1.4 in 2001. In the same year the firm went bankrupt. == Features == Eclipse was designed to work with very large images that could not be manipulated in real time on contemporary computer systems due to memory limitations, and thus allowed the user to make modifications to a lower-resolution copy of the original image in "proxy mode." Brush strokes, color corrections, and other edits were saved in proxy mode, then applied to the full-size image in post processing. This method also allowed for batch processing of a high-resolution image sequence using the edits applied to the original proxy image. Other features included color correction and separation, warping, special effects, text, and shape masking. Wavelet image compression created by LuraTech was added to Eclipse 3.1.4

CMU Pronouncing Dictionary

The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research. CMUdict provides a mapping orthographic/phonetic for English words in their North American pronunciations. It is commonly used to generate representations for speech recognition (ASR), e.g. the CMU Sphinx system, and speech synthesis (TTS), e.g. the Festival system. CMUdict can be used as a training corpus for building statistical grapheme-to-phoneme (g2p) models that will generate pronunciations for words not yet included in the dictionary. The most recent release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available. == Database format == The database is distributed as a plain text file with one entry to a line in the format "WORD " with a two-space separator between the parts. If multiple pronunciations are available for a word, variants are identified using numbered versions (e.g. WORD(1)). The pronunciation is encoded using a modified form of the ARPABET system, with the addition of stress marks on vowels of levels 0, 1, and 2. A line-initial ;;; token indicates a comment. A derived format, directly suitable for speech recognition engines is also available as part of the distribution; this format collapses stress distinctions (typically not used in ASR). The following is a table of phonemes used by CMU Pronouncing Dictionary. == History == == Applications == The Unifon converter is based on the CMU Pronouncing Dictionary. The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary. The Carnegie Mellon Logios tool incorporates the CMU Pronouncing Dictionary. PronunDict, a pronunciation dictionary of American English, uses the CMU Pronouncing Dictionary as its data source. Pronunciation is transcribed in IPA symbols. This dictionary also supports searching by pronunciation. Some singing voice synthesizer software like CeVIO Creative Studio and Synthesizer V uses modified version of CMU Pronouncing Dictionary for synthesizing English singing voices. Transcriber, a tool for the full text phonetic transcription, uses the CMU Pronouncing Dictionary 15.ai, a real-time text-to-speech tool using artificial intelligence, uses the CMU Pronouncing Dictionary