Transcription software assists in the conversion of human speech into a text transcript. Audio or video files can be transcribed manually or automatically. Transcriptionists can replay a recording several times in a transcription editor and type what they hear. By using transcription hot keys, the manual transcription can be accelerated, the sound filtered, equalized or have the tempo adjusted when the clarity is not great. With speech recognition technology, transcriptionists can automatically convert recordings to text transcripts by opening recordings in a PC and uploading them to a cloud for automatic transcription, or transcribe recordings in real-time by using digital dictation. Depending on quality of recordings, machine generated transcripts may still need to be manually verified. The accuracy rate of the automatic transcription depends on several factors such as background noises, speakers' distance to the microphone, and accents. Transcription software, as with transcription services, is often used for business, legal, or medical purposes. Compared with audio content, a text transcript is searchable, takes up less computer memory, and can be used as an alternate method of communication, such as for subtitles and closed captions. Some clinical environments also use digital tools to support transcription workflows, including ambient documentation systems that employ Speech recognition to capture portions of clinical encounters and generate draft notes for later review. These tools are typically used alongside conventional transcription methods. The definition of transcription "software", as compared with transcription "service", is that the former is sufficiently automated that a user can run the entire system without engaging outside personnel. New software-as-a-service and cloud computing models use artificial intelligence, machine learning and natural language processing to convert speech to text and continuously learn new phrases and accents. AI transcription can, however, lead to hallucinations and other errors. == Development == Research at Google released a free android app Google Live Transcribe, it runs on Google Cloud. Google Chrome developed and has an available built in English Live Caption. Google Docs, Google Translate, Google Assistant, GBoard Google Text to Speech engine support transcription tool too. OpenAI launched Whisper, an open-source speech recognition deep learning model in September 2022. In 2024, an AI-powered transcription platform, Transkriptor, was launched, enabling the automatic conversion of audio and video recordings into text using speech recognition technology, with support for transcription in 100 languages and processing of content uploaded via a web interface as well as mobile and browser extensions. It is part of the Tor.app suite of AI-based language processing tools.
Logistics automation
Logistics automation is the application of computer software or automated machinery to logistics operations in order to improve its efficiency. Typically this refers to operations within a warehouse or distribution center, with broader tasks undertaken by supply chain engineering systems and enterprise resource planning systems. Logistics automation systems can powerfully complement the facilities provided by these higher level computer systems. The focus on an individual node within a wider logistics network allows systems to be highly tailored to the requirements of that node. == Components == Logistics automation systems comprise a variety of hardware and software components: Fixed machinery Automated storage and retrieval systems, including: Cranes serve a rack of locations, allowing many levels of stock to be stacked vertically, and allowing for higher storage densities and better space utilization than alternatives. In systems produced by Amazon Robotics, automated guided vehicles move items to a human picker. Conveyors: Containers can enter automated conveyors in one area of the warehouse and, either through hard-coded rules or data input, be moved to a selected destination. Vertical carousels based on the paternoster lift system or using space optimization, similar to vending machines, but on a larger scale. Sortation systems: similar to conveyors but typically with higher capacity and able to divert containers more quickly. Typically used to distribute high volumes of small cartons to a large set of locations. Industrial robots: four- to six-axis industrial robots, e.g. palletizing robots, are used for palletizing, depalletizing, packaging, commissioning and order picking. Typically all of these will automatically identify and track containers using barcodes or, increasingly, RFID tags. Motion check weighers may be used to reject cases or individual products that are under or over their specified weight. They are often used in kitting conveyor lines to ensure all pieces belonging in the kit are present. Mobile technology Radio data terminals: these are handheld or truck-mounted terminals which connect by radio to logistics automation software and provide instructions to operators moving throughout the warehouse. Many also have barcode scanners to allow identification of containers more quickly and accurately than manual keyboard entry. Software Integration software: this provides overall control of the automation machinery and allows cranes to be connected to conveyors for seamless stock movements. Operational control software: provides low-level decision-making, such as where to store incoming containers, and where to retrieve them when requested. Business control software: provides higher-level functionality, such as identification of incoming deliveries/stock, scheduling order fulfillment, and assignment of stock to outgoing trailers. == Benefits == A typical warehouse or distribution center will receive stock of a variety of products from suppliers and store these until the receipt of orders from customers, whether individual buyers (e.g. mail order), retail branches (e.g. chain stores), or other companies (e.g. wholesalers). A logistics automation system may provide the following: Automated goods in processes: Incoming goods can be marked with barcodes and the automation system notified of the expected stock. On arrival, the goods can be scanned and thereby identified, and taken via conveyors, sortation systems, and automated cranes into an automatically assigned storage location. Automated goods retrieval for orders: On receipt of orders, the automation system is able to immediately locate goods and retrieve them to a pick-face location. Automated dispatch processing: Combining knowledge of all orders placed at the warehouse the automation system can assign picked goods into dispatch units and then into outbound loads. Sortation systems and conveyors can then move these onto the outgoing trailers. If needed, repackaging to ensure proper protection for further distribution or to change the package format for specific retailers/customers. A complete warehouse automation system can drastically reduce the workforce required to run a facility, with human input required only for a few tasks, such as picking units of product from a bulk packed case. Even here, assistance can be provided with equipment such as pick-to-light units. Smaller systems may only be required to handle part of the process. Examples include automated storage and retrieval systems, which simply use cranes to store and retrieve identified cases or pallets, typically into a high-bay storage system which would be unfeasible to access using fork-lift trucks or any other means. The use of Automatic Guided Vehicles maximizes the output compared to humans since they can do repetitive tasks for long hours and with least to no supervision. An AGV is built and programmed for precision and accuracy thereby reducing the chances of errors in a warehouse, especially when dealing with fragile goods. == Automation software == Software or cloud-based SaaS solutions are used for logistics automation which helps the supply chain industry in automating the workflow as well as management of the system. Knowledge @ Wharton staff writers noted in 2011 that some manufacturers and retailers were weathering the Great Recession "by signing up for pay-as-you-go logistics services available through the Internet 'cloud'". They identified the benefits and reduced costs which came from sharing information about shipments with suppliers, hauliers and end users. There is little generalized software available in this market. This is because there is no rule to generalize the system as well as work flow even though the practice is more or less the same. Most of the commercial companies do use one or the other of the custom solutions. But there are various software solutions that are being used within the departments of logistics. There are a few departments in Logistics, namely: Conventional Department, Container Department, Warehouse, Marine Engineering, Heavy Haulage, etc. Software used in these departments Conventional department : CVT software / CTMS software. Container Trucking: CTMS software Warehouse : WMS/WCS Improving Effectiveness of Logistics Management Logistical Network Information Transportation Sound Inventory Management Warehousing, Materials Handling & Packaging
NETtalk (artificial neural network)
NETtalk is an artificial neural network that learns to pronounce written English text by supervised learning. It takes English text as input, and produces a matching phonetic transcriptions as output. It is the result of research carried out in the mid-1980s by Terrence Sejnowski and Charles Rosenberg. The intent behind NETtalk was to construct simplified models that might shed light on the complexity of learning human level cognitive tasks, and their implementation as a connectionist model that could also learn to perform a comparable task. The authors trained it by backpropagation. The network was trained on a large amount of English words and their corresponding pronunciations, and is able to generate pronunciations for unseen words with a high level of accuracy. The output of the network was a stream of phonemes, which fed into DECtalk to produce audible speech. It achieved popular success, appearing on the Today show. From the point of view of modeling human cognition, NETtalk does not specifically model the image processing stages and letter recognition of the visual cortex. Rather, it assumes that the letters have been pre-classified and recognized. It is NETtalk's task to learn proper associations between the correct pronunciation with a given sequence of letters based on the context in which the letters appear. A similar architecture was subsequently used for the opposite task, that of converting continuous speech signal to a phoneme sequence. == Training == The training dataset was a 20,008-word subset of the Brown Corpus, with manually annotated phoneme and stress for each letter. The development process was described in a 1993 interview. It took three months -- 250 person-hours -- to create the training dataset, but only a few days to train the network. After it was run successfully on this, the authors tried it on a phonological transcription of an interview with a young Latino boy from a barrio in Los Angeles. This resulted in a network that reproduced his Spanish accent. The original NETtalk was implemented on a Ridge 32, which took 0.275 seconds per learning step (one forward and one backward pass). Training NETtalk became a benchmark to test for the efficiency of backpropagation programs. For example, an implementation on Connection Machine-1 (with 16384 processors) ran at 52x speedup. An implementation on a 10-cell Warp ran at 340x speedup. The following table compiles the benchmark scores as of 1988. Speed is measured in "millions of connections per second" (MCPS). For example, the original NETtalk on Ridge 32 took 0.275 seconds per forward-backward pass, giving 18629 / 10 6 0.275 = 0.068 {\displaystyle {\frac {18629/10^{6}}{0.275}}=0.068} MCPS. Relative times are normalized to the MicroVax. == Architecture == The network had three layers and 18,629 adjustable weights, large by the standards of 1986. There were worries that it would overfit the dataset, but it was trained successfully. The input of the network has 203 units, divided into 7 groups of 29 units each. Each group is a one-hot encoding of one character. There are 29 possible characters: 26 letters, comma, period, and word boundary (whitespace). To produce the pronunciation of a single character, the network takes the character itself, as well as 3 characters before and 3 characters after it. The hidden layer has 80 units. The output has 26 units. 21 units encode for articulatory features (point of articulation, voicing, vowel height, etc.) of phonemes, and 5 units encode for stress and syllable boundaries. Sejnowski studied the learned representation in the network, and found that phonemes that sound similar are clustered together in representation space. The output of the network degrades, but remains understandable, when some hidden neurons are removed.
Tensor sketch
In statistics, machine learning and algorithms, a tensor sketch is a type of dimensionality reduction that is particularly efficient when applied to vectors that have tensor structure. Such a sketch can be used to speed up explicit kernel methods, bilinear pooling in neural networks and is a cornerstone in many numerical linear algebra algorithms. == Mathematical definition == Mathematically, a dimensionality reduction or sketching matrix is a matrix M ∈ R k × d {\displaystyle M\in \mathbb {R} ^{k\times d}} , where k < d {\displaystyle k Multi Expression Programming (MEP) is an evolutionary algorithm for generating mathematical functions describing a given set of data. MEP is a Genetic Programming variant encoding multiple solutions in the same chromosome. MEP representation is not specific (multiple representations have been tested). In the simplest variant, MEP chromosomes are linear strings of instructions. This representation was inspired by Three-address code. MEP strength consists in the ability to encode multiple solutions, of a problem, in the same chromosome. In this way, one can explore larger zones of the search space. For most of the problems this advantage comes with no running-time penalty compared with genetic programming variants encoding a single solution in a chromosome. == Representation == MEP chromosomes are arrays of instructions represented in Three-address code format. Each instruction contains a variable, a constant, or a function. If the instruction is a function, then the arguments (given as instruction's addresses) are also present. === Example of MEP program === Here is a simple MEP chromosome (labels on the left side are not a part of the chromosome): 1: a 2: b 3: + 1, 2 4: c 5: d 6: + 4, 5 7: 3, 5 == Fitness computation == When the chromosome is evaluated it is unclear which instruction will provide the output of the program. In many cases, a set of programs is obtained, some of them being completely unrelated (they do not have common instructions). For the above chromosome, here is the list of possible programs obtained during decoding: E1 = a, E2 = b, E4 = c, E5 = d, E3 = a + b. E6 = c + d. E7 = (a + b) d. Each instruction is evaluated as a possible output of the program. The fitness (or error) is computed in a standard manner. For instance, in the case of symbolic regression, the fitness is the sum of differences (in absolute value) between the expected output (called target) and the actual output. == Fitness assignment process == Which expression will represent the chromosome? Which one will give the fitness of the chromosome? In MEP, the best of them (which has the lowest error) will represent the chromosome. This is different from other GP techniques: In Linear genetic programming the last instruction will give the output. In Cartesian Genetic Programming the gene providing the output is evolved like all other genes. Note that, for many problems, this evaluation has the same complexity as in the case of encoding a single solution in each chromosome. Thus, there is no penalty in running time compared to other techniques. == Software == === MEPX === MEPX is a cross-platform (Windows, macOS, and Linux Ubuntu) free software for the automatic generation of computer programs. It can be used for data analysis, particularly for solving symbolic regression, statistical classification and time-series problems. === libmep === Libmep is a free and open source library implementing Multi Expression Programming technique. It is written in C++. === hmep === hmep is a new open source library implementing Multi Expression Programming technique in Haskell programming language. Pulse-coupled networks or pulse-coupled neural networks (PCNNs) are neural models proposed by modeling a cat's visual cortex, and developed for high-performance biomimetic image processing. In 1989, Eckhorn introduced a neural model to emulate the mechanism of cat's visual cortex. The Eckhorn model provided a simple and effective tool for studying small mammal’s visual cortex, and was soon recognized as having significant application potential in image processing. In 1994, Johnson adapted the Eckhorn model to an image processing algorithm, calling this algorithm a pulse-coupled neural network. The basic property of the Eckhorn's linking-field model (LFM) is the coupling term. LFM is a modulation of the primary input by a biased offset factor driven by the linking input. These drive a threshold variable that decays from an initial high value. When the threshold drops below zero it is reset to a high value and the process starts over. This is different than the standard integrate-and-fire neural model, which accumulates the input until it passes an upper limit and effectively "shorts out" to cause the pulse. LFM uses this difference to sustain pulse bursts, something the standard model does not do on a single neuron level. It is valuable to understand, however, that a detailed analysis of the standard model must include a shunting term, due to the floating voltages level in the dendritic compartment(s), and in turn this causes an elegant multiple modulation effect that enables a true higher-order network (HON). A PCNN is a two-dimensional neural network. Each neuron in the network corresponds to one pixel in an input image, receiving its corresponding pixel's color information (e.g. intensity) as an external stimulus. Each neuron also connects with its neighboring neurons, receiving local stimuli from them. The external and local stimuli are combined in an internal activation system, which accumulates the stimuli until it exceeds a dynamic threshold, resulting in a pulse output. Through iterative computation, PCNN neurons produce temporal series of pulse outputs. The temporal series of pulse outputs contain information of input images and can be used for various image processing applications, such as image segmentation and feature generation. Compared with conventional image processing means, PCNNs have several significant merits, including robustness against noise, independence of geometric variations in input patterns, capability of bridging minor intensity variations in input patterns, etc. A simplified PCNN called a spiking cortical model was developed in 2009. == Applications == PCNNs are useful for image processing, as discussed in a book by Thomas Lindblad and Jason M. Kinser. PCNNs have been used in a variety of image processing applications, including: image segmentation, pattern recognition, feature generation, face extraction, motion detection, region growing, image denoising and image enhancement Multidimensional pulse image processing of chemical structure data using PCNN has been discussed by Kinser, et al. They have also been applied to an all pairs shortest path problem. The randomized weighted majority algorithm is an algorithm in machine learning theory for aggregating expert predictions to a series of decision problems. It is a simple and effective method based on weighted voting which improves on the mistake bound of the deterministic weighted majority algorithm. In fact, in the limit, its prediction rate can be arbitrarily close to that of the best-predicting expert. == Example == Imagine that every morning before the stock market opens, we get a prediction from each of our "experts" about whether the stock market will go up or down. Our goal is to somehow combine this set of predictions into a single prediction that we then use to make a buy or sell decision for the day. The principal challenge is that we do not know which experts will give better or worse predictions. The RWMA gives us a way to do this combination such that our prediction record will be nearly as good as that of the single expert which, in hindsight, gave the most accurate predictions. == Motivation == In machine learning, the weighted majority algorithm (WMA) is a deterministic meta-learning algorithm for aggregating expert predictions. In pseudocode, the WMA is as follows: initialize all experts to weight 1 for each round: add each expert's weight to the option they predicted predict the option with the largest weighted sum multiply the weights of all experts who predicted wrongly by 1 2 {\displaystyle {\frac {1}{2}}} Suppose there are n {\displaystyle n} experts and the best expert makes m {\displaystyle m} mistakes. Then, the weighted majority algorithm (WMA) makes at most 2.4 ( log 2 n + m ) {\displaystyle 2.4(\log _{2}n+m)} mistakes. This bound is highly problematic in the case of highly error-prone experts. Suppose, for example, the best expert makes a mistake 20% of the time; that is, in N = 100 {\displaystyle N=100} rounds using n = 10 {\displaystyle n=10} experts, the best expert makes m = 20 {\displaystyle m=20} mistakes. Then, the weighted majority algorithm only guarantees an upper bound of 2.4 ( log 2 10 + 20 ) ≈ 56 {\displaystyle 2.4(\log _{2}10+20)\approx 56} mistakes. As this is a known limitation of the weighted majority algorithm, various strategies have been explored in order to improve the dependence on m {\displaystyle m} . In particular, we can do better by introducing randomization. Drawing inspiration from the Multiplicative Weights Update Method algorithm, we will probabilistically make predictions based on how the experts have performed in the past. Similarly to the WMA, every time an expert makes a wrong prediction, we will decrement their weight. Mirroring the MWUM, we will then use the weights to make a probability distribution over the actions and draw our action from this distribution (instead of deterministically picking the majority vote as the WMA does). == Randomized weighted majority algorithm (RWMA) == The randomized weighted majority algorithm is an attempt to improve the dependence of the mistake bound of the WMA on m {\displaystyle m} . Instead of predicting based on majority vote, the weights, are used as probabilities for choosing the experts in each round and are updated over time (hence the name randomized weighted majority). Precisely, if w i {\displaystyle w_{i}} is the weight of expert i {\displaystyle i} , let W = ∑ i w i {\displaystyle W=\sum _{i}w_{i}} . We will follow expert i {\displaystyle i} with probability w i W {\displaystyle {\frac {w_{i}}{W}}} . This results in the following algorithm: initialize all experts to weight 1. for each round: add all experts' weights together to obtain the total weight W {\displaystyle W} choose expert i {\displaystyle i} randomly with probability w i W {\displaystyle {\frac {w_{i}}{W}}} predict as the chosen expert predicts multiply the weights of all experts who predicted wrongly by β {\displaystyle \beta } The goal is to bound the worst-case expected number of mistakes, assuming that the adversary has to select one of the answers as correct before we make our coin toss. This is a reasonable assumption in, for instance, the stock market example provided above: the variance of a stock price should not depend on the opinions of experts that influence private buy or sell decisions, so we can treat the price change as if it was decided before the experts gave their recommendations for the day. The randomized algorithm is better in the worst case than the deterministic algorithm (weighted majority algorithm): in the latter, the worst case was when the weights were split 50/50. But in the randomized version, since the weights are used as probabilities, there would still be a 50/50 chance of getting it right. In addition, generalizing to multiplying the weights of the incorrect experts by β < 1 {\displaystyle \beta <1} instead of strictly 1 2 {\displaystyle {\frac {1}{2}}} allows us to trade off between dependence on m {\displaystyle m} and log 2 n {\displaystyle \log _{2}n} . This trade-off will be quantified in the analysis section. == Analysis == Let W t {\displaystyle W_{t}} denote the total weight of all experts at round t {\displaystyle t} . Also let F t {\displaystyle F_{t}} denote the fraction of weight placed on experts which predict the wrong answer at round t {\displaystyle t} . Finally, let N {\displaystyle N} be the total number of rounds in the process. By definition, F t {\displaystyle F_{t}} is the probability that the algorithm makes a mistake on round t {\displaystyle t} . It follows from the linearity of expectation that if M {\displaystyle M} denotes the total number of mistakes made during the entire process, E [ M ] = ∑ t = 1 N F t {\displaystyle E[M]=\sum _{t=1}^{N}F_{t}} . After round t {\displaystyle t} , the total weight is decreased by ( 1 − β ) F t W t {\displaystyle \ (1-\beta )F_{t}W_{t}} , since all weights corresponding to a wrong answer are multiplied by β < 1 {\displaystyle \ \beta <1} . It then follows that W t + 1 = W t ( 1 − ( 1 − β ) F t ) {\displaystyle W_{t+1}=W_{t}(1-(1-\beta )F_{t})} . By telescoping, since W 1 = n {\displaystyle W_{1}=n} , it follows that the total weight after the process concludes is On the other hand, suppose that m {\displaystyle \ m} is the number of mistakes made by the best-performing expert. At the end, this expert has weight β m {\displaystyle \ \beta ^{m}} . It follows, then, that the total weight is at least this much; in other words, W ≥ β m {\displaystyle \ W\geq \beta ^{m}} . This inequality and the above result imply Taking the natural logarithm of both sides yields Now, the Taylor series of the natural logarithm is In particular, it follows that ln ( 1 − ( 1 − β ) F t ) < − ( 1 − β ) F t {\displaystyle \ \ln(1-(1-\beta )F_{t})<-(1-\beta )F_{t}} . Thus, Recalling that E [ M ] = ∑ t = 1 N F t {\displaystyle E[M]=\sum _{t=1}^{N}F_{t}} and rearranging, it follows that Now, as β → 1 {\displaystyle \beta \to 1} from below, the first constant tends to 1 {\displaystyle 1} ; however, the second constant tends to + ∞ {\displaystyle +\infty } . To quantify this tradeoff, define ε = 1 − β {\displaystyle \varepsilon =1-\beta } to be the penalty associated with getting a prediction wrong. Then, again applying the Taylor series of the natural logarithm, It then follows that the mistake bound, for small ε {\displaystyle \varepsilon } , can be written in the form ( 1 + ϵ 2 + O ( ε 2 ) ) m + ϵ − 1 ln ( n ) {\displaystyle \ \left(1+{\frac {\epsilon }{2}}+O(\varepsilon ^{2})\right)m+\epsilon ^{-1}\ln(n)} . In English, the less that we penalize experts for their mistakes, the more that additional experts will lead to initial mistakes but the closer we get to capturing the predictive accuracy of the best expert as time goes on. In particular, given a sufficiently low value of ε {\displaystyle \varepsilon } and enough rounds, the randomized weighted majority algorithm can get arbitrarily close to the correct prediction rate of the best expert. In particular, as long as m {\displaystyle m} is sufficiently large compared to ln ( n ) {\displaystyle \ln(n)} (so that their ratio is sufficiently small), we can assign we can obtain an upper bound on the number of mistakes equal to This implies that the "regret bound" on the algorithm (that is, how much worse it performs than the best expert) is sublinear, at O ( m ln ( n ) ) {\displaystyle O({\sqrt {m\ln(n)}})} . == Revisiting the motivation == Recall that the motivation for the randomized weighted majority algorithm was given by an example where the best expert makes a mistake 20% of the time. Precisely, in N = 100 {\displaystyle N=100} rounds, with n = 10 {\displaystyle n=10} experts, where the best expert makes m = 20 {\displaystyle m=20} mistakes, the deterministic weighted majority algorithm only guarantees an upper bound of 2.4 ( log 2 10 + 20 ) ≈ 56 {\displaystyle 2.4(\log _{2}10+20)\approx 56} . By the analysis above, it follows that minimizing the number of worst-case expected mistakes is equivalent to minimizing the funMulti expression programming
Pulse-coupled networks
Randomized weighted majority algorithm