Electronics is a peer-reviewed, scientific journal that covers the study of electronics, including the design, development, and application of electronic devices, systems, and circuits. The journal is published by MDPI and was established in 2012. The editor-in-chief is Flavio Canavero 'Politecnico di Torino). The journal covers a wide range of topics related to electronics, including: electronic devices, electronic materials, electronic circuits, electronic systems, communication electronics, power electronics, and biomedical electronics. The journal also includes articles on the application of electronics in various fields, such as consumer electronics, industrial electronics, automotive electronics, and military electronics. The journal publishes original research articles, review articles, and short communications. == Abstracting and indexing == EBSCO databases ProQuest databases Scopus According to the Journal Citation Reports, the journal has a 2021 impact factor of 2.690.
Sample (graphics)
In computer graphics, a sample is an intersection of a channel and a pixel. The diagram below depicts a 24-bit pixel, consisting of 3 samples for Red, Green, and Blue. In this particular diagram, the Red sample occupies 9 bits, the Green sample occupies 7 bits and the Blue sample occupies 8 bits, totaling 24 bits per pixel. Note that the samples do not have to be equal size and not all samples are mandatory in a pixel. Also, a pixel can consist of more than 3 samples (e.g. 4 samples of the RGBA color space). A sample is related to a subpixel on a physical display.
Oversampled binary image sensor
An oversampled binary image sensor is an image sensor with non-linear response capabilities reminiscent of traditional photographic film. Each pixel in the sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. The response function of the image sensor is non-linear and similar to a logarithmic function, which makes the sensor suitable for high dynamic range imaging. == Working principle == Before the advent of digital image sensors, photography, for the most part of its history, used film to record light information. At the heart of every photographic film are a large number of light-sensitive grains of silver-halide crystals. During exposure, each micron-sized grain has a binary fate: Either it is struck by some incident photons and becomes "exposed", or it is missed by the photon bombardment and remains "unexposed". In the subsequent film development process, exposed grains, due to their altered chemical properties, are converted to silver metal, contributing to opaque spots on the film; unexposed grains are washed away in a chemical bath, leaving behind the transparent regions on the film. Thus, in essence, photographic film is a binary imaging medium, using local densities of opaque silver grains to encode the original light intensity information. Thanks to the small size and large number of these grains, one hardly notices this quantized nature of film when viewing it at a distance, observing only a continuous gray tone. The oversampled binary image sensor is reminiscent of photographic film. Each pixel in the sensor has a binary response, giving only a one-bit quantized measurement of the local light intensity. At the start of the exposure period, all pixels are set to 0. A pixel is then set to 1 if the number of photons reaching it during the exposure is at least equal to a given threshold q. One way to build such binary sensors is to modify standard memory chip technology, where each memory bit cell is designed to be sensitive to visible light. With current CMOS technology, the level of integration of such systems can exceed 109~1010 (i.e., 1 giga to 10 giga) pixels per chip. In this case, the corresponding pixel sizes (around 50~nm ) are far below the diffraction limit of light, and thus the image sensor is oversampling the optical resolution of the light field. Intuitively, one can exploit this spatial redundancy to compensate for the information loss due to one-bit quantizations, as is classic in oversampling delta-sigma converters. Building a binary sensor that emulates the photographic film process was first envisioned by Fossum, who coined the name digital film sensor (now referred to as a quanta image sensor). The original motivation was mainly out of technical necessity. The miniaturization of camera systems calls for the continuous shrinking of pixel sizes. At a certain point, however, the limited full-well capacity (i.e., the maximum photon-electrons a pixel can hold) of small pixels becomes a bottleneck, yielding very low signal-to-noise ratios (SNRs) and poor dynamic ranges. In contrast, a binary sensor whose pixels need to detect only a few photon-electrons around a small threshold q has much less requirement for full-well capacities, allowing pixel sizes to shrink further. == Imaging model == === Lens === Consider a simplified camera model shown in Fig.1. The λ 0 ( x ) {\displaystyle \lambda _{0}(x)} is the incoming light intensity field. By assuming that light intensities remain constant within a short exposure period, the field can be modeled as only a function of the spatial variable x {\displaystyle x} . After passing through the optical system, the original light field λ 0 ( x ) {\displaystyle \lambda _{0}(x)} gets filtered by the lens, which acts like a linear system with a given impulse response. Due to imperfections (e.g., aberrations) in the lens, the impulse response, a.k.a. the point spread function (PSF) of the optical system, cannot be a Dirac delta, thus, imposing a limit on the resolution of the observable light field. However, a more fundamental physical limit is due to light diffraction. As a result, even if the lens is ideal, the PSF is still unavoidably a small blurry spot. In optics, such diffraction-limited spot is often called the Airy disk, whose radius R a {\displaystyle R_{a}} can be computed as R a = 1.22 w f , {\displaystyle R_{a}=1.22\,wf,} where w {\displaystyle w} is the wavelength of the light and f {\displaystyle f} is the F-number of the optical system. Due to the lowpass (smoothing) nature of the PSF, the resulting λ ( x ) {\displaystyle \lambda (x)} has a finite spatial-resolution, i.e., it has a finite number of degrees of freedom per unit space. === Sensor === Fig.2 illustrates the binary sensor model. The s m {\displaystyle s_{m}} denote the exposure values accumulated by the sensor pixels. Depending on the local values of s m {\displaystyle s_{m}} , each pixel (depicted as "buckets" in the figure) collects a different number of photons hitting on its surface. y m {\displaystyle y_{m}} is the number of photons impinging on the surface of the m {\displaystyle m} th pixel during an exposure period. The relation between s m {\displaystyle s_{m}} and the photon count y m {\displaystyle y_{m}} is stochastic. More specifically, y m {\displaystyle y_{m}} can be modeled as realizations of a Poisson random variable, whose intensity parameter is equal to s m {\displaystyle s_{m}} , As a photosensitive device, each pixel in the image sensor converts photons to electrical signals, whose amplitude is proportional to the number of photons impinging on that pixel. In a conventional sensor design, the analog electrical signals are then quantized by an A/D converter into 8 to 14 bits (usually the more bits the better). But in the binary sensor, the quantizer is 1 bit. In Fig.2, b m {\displaystyle b_{m}} is the quantized output of the m {\displaystyle m} th pixel. Since the photon counts y m {\displaystyle y_{m}} are drawn from random variables, so are the binary sensor output b m {\displaystyle b_{m}} . === Spatial and temporal oversampling === If it is allowed to have temporal oversampling, i.e., taking multiple consecutive and independent frames without changing the total exposure time τ {\displaystyle \tau } , the performance of the binary sensor is equivalent to the sensor with same number of spatial oversampling under certain condition. It means that people can make trade off between spatial oversampling and temporal oversampling. This is quite important, since technology usually gives limitation on the size of the pixels and the exposure time. == Advantages over traditional sensors == Due to the limited full-well capacity of conventional image pixel, the pixel will saturate when the light intensity is too strong. This is the reason that the dynamic range of the pixel is low. For the oversampled binary image sensor, the dynamic range is not defined for a single pixel, but a group of pixels, which makes the dynamic range high. == Reconstruction == One of the most important challenges with the use of an oversampled binary image sensor is the reconstruction of the light intensity λ ( x ) {\displaystyle \lambda (x)} from the binary measurement b m {\displaystyle b_{m}} . Maximum likelihood estimation can be used for solving this problem. Fig. 4 shows the results of reconstructing the light intensity from 4096 binary images taken by single photon avalanche diodes (SPADs) camera. A better reconstruction quality with fewer temporal measurements and faster, hardware friendly implementation, can be achieved by more sophisticated algorithms.
Companion robot
A companion robot is a robot created to create real or apparent companionship for human beings. Target markets for companion robots include the elderly and single children. Companions robots are expected to communicate with non-experts in a natural and intuitive way. They offer a variety of functions, such as monitoring the home remotely, communicating with people, or waking people up in the morning. Their aim is to perform a wide array of tasks including educational functions, home security, diary duties, entertainment and message delivery services, etc. The idea of companionship with robots has already existed on science fictions of 1970s, like R2-D2. Starting from the late 20th century, companion robots became a reality, mostly as robotic pets. Besides entertainment purposes, interactive robots were also introduced as a personal service robot for elderly care around 2000. == Characteristics == Companion robots try to interact with users. They gather information about users based on their interactions and yield feedback. This procedure varies slightly based on their specific roles. For example, social-companion robots make simple conversations, while pet-companion robots mimic being real pets. == Types == Companion robots can perform a variety of tasks and they are produced in a specialized manner according to their purpose or target audience in order to increase convenience and end user satisfaction. === Social companion robots === Social companion robots are designed to provide companionship and be a solution for unwanted solitude. They often mimic adult human, child or pet behaviours appealing to the user base. Robots which are specifically devised for simple conversations, conveying emotions and respond to user feelings fall under this category. === Assistive companion robots === Assistive companion robots are aimed at people who require constant care because of age, disability or rehabilitation purposes. Such robots can help disadvantaged users with their daily tasks, act as reminders (e.g., for regular medication) and facilitate mobility in everyday actions. Assistive companion robots reduce the intensity of labour that should be performed by caretakers, nurses and legal guardians. === Educational companion robots === Educational companion robots perform tutorship for students, regardless of their ages, and can teach desired subjects with activities tailored for the user such as interactive assignments and games. Rather than replacing teachers and instructors, educational companion robots are aides to them. === Therapeutic companion robots === Designed for individuals coping with stress (PTSD in severe cases), anxiety and loneliness; therapeutic companion robots support users' emotional and mental wellbeing. Such robots can be utilized in hospitals and care facilities as well as dwellings where the distressed user may need the most help. Therapeutic companion robots bear a vast resemblance to assistive companion robots to the extent of being a branch of them; the nuance between these two types of companion robots is that the former is for long-term/lifetime usage while the latter is mostly for the duration of the therapy received by the user. === Pet companion robots === Pet companion robots are for individuals who seek an alternative to live pets as live animals demand a considerable amount of care and may not be eligible for people with allergies. These robots aim to be perfect imitations of a pet while diminishing the chore aspect of having one. === Entertainment companion robots === Entertainment companion robots are designed solely for entertainment and can provide numerous ways of entertainment, ranging from dancing to playing games with the user. People who would appreciate an individual to have fun with are the main audience of such products. === Personal assistant robots === Personal assistant robots help people with daily tasks, management, scheduling, reminding etc. Their area of activity can be offices as well as homes and public spaces. === Sex robots === Sex robots are anthropomorphic robotic sex dolls that have human-like movement or behavior, and some degree of artificial intelligence. As of 2026, although elaborately instrumented sex dolls have been created by a number of inventors, no fully animated sex robots yet exist. Simple devices have been created which can speak, make facial expressions, or respond to touch. There is controversy as to whether developing them would be morally justifiable. In 2015, robot ethicist Kathleen Richardson called for a ban on the creation of anthropomorphic sex robots with concerns about normalizing relationships with machines and reinforcing female dehumanization. Questions about their ethics, effects, and possible legal regulations have been discussed since then. == Examples == There are several companion robot prototypes, and these include Paro, CompanionAble, and EmotiRob, among others. === Paro === Paro is a pet-type robot system developed by Japan's National Institute of Advanced Industrial Science and Technology (AIST). The robot, which looked like a small harp seal, was designed as a therapeutic tool for use in hospitals and nursing homes. The robot is programmed to cry for attention and respond to its name. Experiments showed that Paro facilitated elderly residents to communicate with each other, which led to psychological improvements. === CompanionAble === This robot is classified as an FP 7 EU project. It is built to "cooperate with Ambient Assistive Living environment". The autonomous device, which is also built to support the elderly, helps its owner interact with smart home environment as well as caregivers. The robot functions as a mobile friend, by which natural interaction is possible via speech and the touchscreen to detect and track people at home. === EmotiRob === EmotiRob is developed in a robotics project which is the continuity of the MAPH (Active Media For the Handicap) project in emotion synthesis. The aim of the project was to maintain emotional interaction with children. EmotiRob designed in a way that a child can hold it in a his/her arms and with which he/she could interact by talking to it, and then the robot would express itself through body postures or facial expressions. It has cognitive capabilities, which are further extended so that the robot can have a natural linguistic interaction with its owner through the DRAGON speech-recognition software developed by a company called NUANCE. Such interaction is expected to facilitate a child's cognitive development and develop new learning patterns. === LOVOT === Lovot is a Japanese company robot whose only purpose is "to make you happy". It features over 50 sensors that mimic the behavior of a human baby or small pet, a 360° camera with a microphone, the ability to distinguish humans from objects, neoteny eyes, and an internal warmth of 30° celsius. An interactive Lovot Café was opened in Japan October 3, 2020. === NICOBO === Nicobo was developed by Panasonic and was influenced by the loneliness of lockdowns created as a measure of the COVID-19 pandemic. It was designed to appear vulnerable, which creates empathy in its owners. Nicobo's name derives from the Japanese word for "smile". It wags its tail, engages in baby talk, and stays as a housemate. === Hyodol === Hyodol is an advanced care robot designed to support the elderly by reminding them to take their medications and monitoring their movements to keep their guardians informed. Additionally, this innovative robot can detect and respond to the emotional states of its elderly users, adding a layer of personalized care. Hyodol is designed with the appearance and speech style of a 7-year-old Korean grandchild, featuring a soft fabric exterior and user interaction methods such as striking the head or patting the back. It is equipped with various sensors and wireless communication technologies to collect and process data, supporting mobile apps and PC web monitoring systems for remote monitoring from anywhere. In South Korea, approximately 10,000 Hyodol robots are deployed to the homes of elderly individuals living alone, providing essential support and companionship. Local governments, including provincial and county offices, have embraced Hyodol as a solution to address social challenges stemming from the country's rapidly aging society.Furthermore, the robot is widely utilized in the treatment of dementia patients at a university hospital in Gangwon province. Hyodol was honored with the Mobile World Congress (MWC) Global Mobile Awards (GLOMO) in the "Best Mobile Innovation for Connected Health and Wellbeing" category on February 29, 2024. === Moxie === Moxie was a companion robot for autistic children developed by a company called Embodied. Although it had limited motion, it presented itself as a lifelike avatar. It was designed to help the children learn emotional cognition, using remotely hosted large language models to direct its respons
Luma (video)
In video, luma ( Y ′ {\displaystyle Y'} ) represents the brightness in an image (the "black-and-white" or achromatic portion of the image). Luma is typically paired with chroma. Luma represents the achromatic image, while the chroma components represent the color information. Converting R′G′B′ sources (such as the output of a three-CCD camera) into luma and chroma allows for chroma subsampling: because human vision has finer spatial sensitivity to luminance ("black and white") differences than chromatic differences, video systems can store and transmit chromatic information at lower resolution, optimizing perceived detail at a particular bandwidth. == Luma versus relative luminance == Luma is the weighted sum of gamma-compressed R′G′B′ components of a color video—the prime symbols ′ denote gamma compression. The word was proposed to prevent confusion between luma as implemented in video engineering and relative luminance as used in color science (i.e. as defined by CIE). Relative luminance is formed as a weighted sum of linear RGB components, not gamma-compressed ones. Even so, luma is sometimes erroneously called luminance. SMPTE EG 28 recommends the symbol Y ′ {\displaystyle Y'} to denote luma and the symbol Y {\displaystyle Y} to denote relative luminance. === Use of relative luminance === While luma is more often encountered, relative luminance is sometimes used in video engineering when referring to the brightness of a monitor. The formula used to calculate relative luminance uses coefficients based on the CIE color matching functions and the relevant standard chromaticities of red, green, and blue (e.g., the original NTSC primaries, SMPTE C, or Rec. 709). For the Rec. 709 (and sRGB) primaries, the linear combination, based on pure colorimetric considerations and the definition of relative luminance is: Y = 0.2126 R + 0.7152 G + 0.0722 B {\displaystyle Y=0.2126R+0.7152G+0.0722B} The formula used to calculate luma in the Rec. 709 spec arbitrarily also uses these same coefficients, but with gamma-compressed components: Y ′ = 0.2126 R ′ + 0.7152 G ′ + 0.0722 B ′ , {\displaystyle Y'=0.2126R'+0.7152G'+0.0722B',} where the prime symbol ′ denotes gamma compression. == Rec. 601 luma versus Rec. 709 luma coefficients == For digital formats following CCIR 601 (i.e. most digital standard definition formats), luma is calculated with this formula: Y 601 ′ = 0.299 R ′ + 0.587 G ′ + 0.114 B ′ {\displaystyle Y'_{\text{601}}=0.299R'+0.587G'+0.114B'} Formats following ITU-R Recommendation BT. 709 (i.e. most digital high definition formats) use a different formula: Y 709 ′ = 0.2126 R ′ + 0.7152 G ′ + 0.0722 B ′ {\displaystyle Y'_{\text{709}}=0.2126R'+0.7152G'+0.0722B'} Modern HDTV systems use the 709 coefficients, while transitional 1035i HDTV (MUSE) formats may use the SMPTE 240M coefficients: Y 240 ′ = 0.212 R ′ + 0.701 G ′ + 0.087 B ′ = Y 145 ′ {\displaystyle Y'_{\text{240}}=0.212R'+0.701G'+0.087B'=Y'_{\text{145}}} These coefficients correspond to the SMPTE RP 145 primaries (also known as "SMPTE C") in use at the time the standard was created. The change in the luma coefficients is to provide the "theoretically correct" coefficients that reflect the corresponding standard chromaticities ('colors') of the primaries red, green, and blue. However, there is some controversy regarding this decision. The difference in luma coefficients requires that component signals must be converted between Rec. 601 and Rec. 709 to provide accurate colors. In consumer equipment, the matrix required to perform this conversion may be omitted (to reduce cost), resulting in inaccurate color. == Luma and luminance errors == As well, the Rec. 709 luma coefficients may not necessarily provide better performance. Because of the difference between luma and relative luminance, luma does not exactly represent the luminance in an image. As a result, errors in chroma can affect luminance. Luma alone does not perfectly represent luminance; accurate luminance requires both accurate luma and chroma. Hence, errors in chroma "bleed" into the luminance of an image. Note the bleeding in lightness near the borders. Due to the widespread usage of chroma subsampling, errors in chroma typically occur when it is lowered in resolution/bandwidth. This lowered bandwidth, coupled with high frequency chroma components, can cause visible errors in luminance. An example of a high frequency chroma component would be the line between the green and magenta bars of the SMPTE color bars test pattern. Error in luminance can be seen as a dark band that occurs in this area.
Evaluation of binary classifiers
Evaluation of a binary classifier typically assigns a numerical value, or values, to a classifier that represent its accuracy. An example is error rate, which measures how frequently the classifier makes a mistake. There are many metrics that can be used; different fields have different preferences. For example, in medicine sensitivity and specificity are often used, while in computer science precision and recall are preferred. An important distinction is between metrics that are independent of the prevalence or skew (how often each class occurs in the population), and metrics that depend on the prevalence – both types are useful, but they have very different properties. Often, evaluation is used to compare two methods of classification, so that one can be adopted and the other discarded. Such comparisons are more directly achieved by a form of evaluation that results in a single unitary metric rather than a pair of metrics. == Contingency table == Given a data set, a classification (the output of a classifier on that set) gives two numbers: the number of positives and the number of negatives, which add up to the total size of the set. To evaluate a classifier, one compares its output to another reference classification – ideally a perfect classification, but in practice the output of another gold standard test – and cross tabulates the data into a 2×2 contingency table, comparing the two classifications. One then evaluates the classifier relative to the gold standard by computing summary statistics of these 4 numbers. Generally these statistics will be scale invariant (scaling all the numbers by the same factor does not change the output), to make them independent of population size, which is achieved by using ratios of homogeneous functions, most simply homogeneous linear or homogeneous quadratic functions. Say we test some people for the presence of a disease. Some of these people have the disease, and our test correctly says they are positive. They are called true positives (TP). Some have the disease, but the test incorrectly claims they don't. They are called false negatives (FN). Some don't have the disease, and the test says they don't – true negatives (TN). Finally, there might be healthy people who have a positive test result – false positives (FP). These can be arranged into a 2×2 contingency table (confusion matrix), conventionally with the test result on the vertical axis and the actual condition on the horizontal axis. These numbers can then be totaled, yielding both a grand total and marginal totals. Totaling the entire table, the number of true positives, false negatives, true negatives, and false positives add up to 100% of the set. Totaling the columns (adding vertically) the number of true positives and false positives add up to 100% of the test positives, and likewise for negatives. Totaling the rows (adding horizontally), the number of true positives and false negatives add up to 100% of the condition positives (conversely for negatives). The basic marginal ratio statistics are obtained by dividing the 2×2=4 values in the table by the marginal totals (either rows or columns), yielding 2 auxiliary 2×2 tables, for a total of 8 ratios. These ratios come in 4 complementary pairs, each pair summing to 1, and so each of these derived 2×2 tables can be summarized as a pair of 2 numbers, together with their complements. Further statistics can be obtained by taking ratios of these ratios, ratios of ratios, or more complicated functions. The contingency table and the most common derived ratios are summarized below; see sequel for details. Note that the rows correspond to the condition actually being positive or negative (or classified as such by the gold standard), as indicated by the color-coding, and the associated statistics are prevalence-independent, while the columns correspond to the test being positive or negative, and the associated statistics are prevalence-dependent. There are analogous likelihood ratios for prediction values, but these are less commonly used, and not depicted above. == Pairs of metrics == Often accuracy is evaluated with a pair of metrics composed in a standard pattern. === Sensitivity and specificity === The fundamental prevalence-independent statistics are sensitivity and specificity. Sensitivity or True Positive Rate (TPR), also known as recall, is the proportion of people that tested positive and are positive (True Positive, TP) of all the people that actually are positive (Condition Positive, CP = TP + FN). It can be seen as the probability that the test is positive given that the patient is sick. With higher sensitivity, fewer actual cases of disease go undetected (or, in the case of the factory quality control, fewer faulty products go to the market). Specificity (SPC) or True Negative Rate (TNR) is the proportion of people that tested negative and are negative (True Negative, TN) of all the people that actually are negative (Condition Negative, CN = TN + FP). As with sensitivity, it can be looked at as the probability that the test result is negative given that the patient is not sick. With higher specificity, fewer healthy people are labeled as sick (or, in the factory case, fewer good products are discarded). The relationship between sensitivity and specificity, as well as the performance of the classifier, can be visualized and studied using the Receiver Operating Characteristic (ROC) curve. In theory, sensitivity and specificity are independent in the sense that it is possible to achieve 100% in both (such as in the red/blue ball example given above). In more practical, less contrived instances, however, there is usually a trade-off, such that they are inversely proportional to one another to some extent. This is because we rarely measure the actual thing we would like to classify; rather, we generally measure an indicator of the thing we would like to classify, referred to as a surrogate marker. The reason why 100% is achievable in the ball example is because redness and blueness is determined by directly detecting redness and blueness. However, indicators are sometimes compromised, such as when non-indicators mimic indicators or when indicators are time-dependent, only becoming evident after a certain lag time. The following example of a pregnancy test will make use of such an indicator. Modern pregnancy tests do not use the pregnancy itself to determine pregnancy status; rather, human chorionic gonadotropin is used, or hCG, present in the urine of gravid females, as a surrogate marker to indicate that a woman is pregnant. Because hCG can also be produced by a tumor, the specificity of modern pregnancy tests cannot be 100% (because false positives are possible). Also, because hCG is present in the urine in such small concentrations after fertilization and early embryogenesis, the sensitivity of modern pregnancy tests cannot be 100% (because false negatives are possible). === Positive and negative predictive values === In addition to sensitivity and specificity, the performance of a binary classification test can be measured with positive predictive value (PPV), also known as precision, and negative predictive value (NPV). The positive prediction value answers the question "If the test result is positive, how well does that predict an actual presence of disease?". It is calculated as TP/(TP + FP); that is, it is the proportion of true positives out of all positive results. The negative prediction value is the same, but for negatives, naturally. ==== Impact of prevalence on predictive values ==== Prevalence has a significant impact on prediction values. As an example, suppose there is a test for a disease with 99% sensitivity and 99% specificity. If 2000 people are tested and the prevalence (in the sample) is 50%, 1000 of them are sick and 1000 of them are healthy. Thus about 990 true positives and 990 true negatives are likely, with 10 false positives and 10 false negatives. The positive and negative prediction values would be 99%, so there can be high confidence in the result. However, if the prevalence is only 5%, so of the 2000 people only 100 are really sick, then the prediction values change significantly. The likely result is 99 true positives, 1 false negative, 1881 true negatives and 19 false positives. Of the 19+99 people tested positive, only 99 really have the disease – that means, intuitively, that given that a patient's test result is positive, there is only 84% chance that they really have the disease. On the other hand, given that the patient's test result is negative, there is only 1 chance in 1882, or 0.05% probability, that the patient has the disease despite the test result. === Precision and recall === Precision and recall can be interpreted as (estimated) conditional probabilities: Precision is given by P ( C = P | C ^ = P ) {\displaystyle P(C=P|{\hat {C}}=P)} while recall is given by P ( C ^ = P | C = P ) {\displaystyle P({\hat {C}}=P|C=P)} , where C ^ {\
TIMIT
TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time. TIMIT was designed to further acoustic-phonetic knowledge and automatic speech recognition systems. It was commissioned by DARPA and corpus design was a joint effort between the Massachusetts Institute of Technology, SRI International, and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for publishing by the National Institute of Standards and Technology (NIST). There is also a telephone bandwidth version called NTIMIT (Network TIMIT). TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset. == Data == TIMIT contains ~5 hours of speech, of 10 sentences spoken by each of 630 speakers. The sentences were randomly sampled from a corpus of 2342 sentences. The speakers were native speakers of American English, classified under 8 major dialect regions: New England, Northern, North Midland, South Midland, Southern, New York City, Western, Army Brat (moved around). The speakers were 70% male and 30% female. Recordings were made in a noise-isolated recording booth at Texas Instrument, using a semi-automatic computer system (STEROIDS) to control the presentation of prompts to the speaker and the recording. Two-channel recordings were made using a Sennheiser HMD 414 headset-mounted microphone and a Brüel & Kjær 1/2" far-field pressure microphone (#4165). The speech was digitized at a sample rate of 20 kHz then and downsampled to 16 kHz. == History == The TIMIT telephone corpus was an early attempt to create a database with speech samples. It was published in the year 1988 on CD-ROM and consists of only 10 sentences per speaker. Two 'dialect' sentences were read by each speaker, as well as another 8 sentences selected from a larger set Each sentence averages 3 seconds long and is spoken by 630 different speakers. It was the first notable attempt in creating and distributing a speech corpus and the overall project has produced costs of 1.5 million US$. An update was released in October 1990. It included full 630-speaker corpus; checked and corrected transcriptions; word-alignment transcriptions; NIST SPHERE-headered waveform files and header manipulation software; phonemic dictionary; new test and training subsets balanced for dialectal and phonetic coverage; more extensive documentation. The full name of the project is DARPA-TIMIT Acoustic-Phonetic Continuous Speech Corpus and the acronym TIMIT stands for Texas Instruments/Massachusetts Institute of Technology. The main reason why a corpus of telephone speech was created was to train speech recognition software. In the Blizzard challenge, different software has the obligation to convert audio recordings into textual data and the TIMIT corpus was used as a standardized baseline.