Speech recognition

Speech recognition (automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT)) is a sub-field of computational linguistics concerned with methods and technologies that translate spoken language into text or other interpretable forms. Speech recognition applications include voice user interfaces, where the user speaks to a device, which "listens" and processes the audio. Common voice applications include interpreting commands for calling, call routing, home automation, and aircraft control. These applications are called direct voice input. Productivity applications include searching audio recordings, creating transcripts, and dictation. Speech recognition can be used to analyse speaker characteristics, such as identifying native language using pronunciation assessment. Voice recognition (speaker identification) refers to identifying the speaker, rather than speech contents. Recognizing the speaker can simplify the task of translating speech in systems trained on a specific person's voice. It can also be used to authenticate the speaker as part of a security process. == History == Applications for speech recognition developed over many decades, with progress accelerated due to advances in deep learning and the use of big data. These advances are reflected in an increase in academic papers, and greater system adoption. Key areas of growth include vocabulary size, more accurate recognition for unfamiliar speakers (speaker independence), and faster processing speed. === Pre-1970 === 1952 – Bell Labs researchers, Stephen Balashek, R. Biddulph, and K. H. Davis, built Audrey for single-speaker digit recognition. Their system located the formants in the power spectrum of each utterance. 1960 – Gunnar Fant developed and published the source–filter model of speech production. 1962 – IBM's 16-word "Shoebox" machine's speech recognition debuted at the 1962 World's Fair. 
1966 – Linear predictive coding, a speech coding method, was proposed by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone. 1969 – Funding at Bell Labs came to a halt for several years after the company's head engineer, John R. Pierce, wrote an open letter criticizing speech recognition research. This defunding lasted until Pierce retired and James L. Flanagan took over. Raj Reddy was the first person to work on continuous speech recognition, as a graduate student at Stanford University in the late 1960s. Previous systems required users to pause after each word. Reddy's system issued spoken commands for playing chess. Around this time, Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary. DTW processed speech by dividing it into short frames (e.g. 10 ms segments) and treating each frame as a unit. Speaker independence, however, remained unsolved. === 1970–1990 === 1971 – DARPA funded a five-year speech recognition research project, Speech Understanding Research, seeking a minimum vocabulary size of 1,000 words. The project considered speech understanding a key to achieving progress in speech recognition, which was later disproved. BBN, IBM, Carnegie Mellon (CMU), and Stanford Research Institute participated. 1972 – The IEEE Acoustics, Speech, and Signal Processing group held a conference in Newton, Massachusetts. 1976 – The first ICASSP was held in Philadelphia, which became a major venue for publishing on speech recognition. During the late 1960s, Leonard Baum developed the mathematics of Markov chains at the Institute for Defense Analysis. A decade later, at CMU, Raj Reddy's students James Baker and Janet M. Baker began using the hidden Markov model (HMM) for speech recognition. James Baker had learned about HMMs while at the Institute for Defense Analysis. 
HMMs enabled researchers to combine sources of knowledge, such as acoustics, language, and syntax, in a unified probabilistic model. By the mid-1980s, Fred Jelinek's team at IBM had created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary. Jelinek's approach placed less emphasis on emulating human brain processes and more on statistical modelling. (Jelinek's group independently discovered the application of HMMs to speech.) This was controversial among linguists since HMMs are too simplistic to account for many features of human languages. However, the HMM proved to be a highly useful way of modelling speech and replaced dynamic time warping as the dominant speech recognition algorithm in the 1980s. 1982 – Dragon Systems, founded by James and Janet M. Baker, was one of IBM's few competitors. === Practical speech recognition === The 1980s also saw the introduction of the n-gram language model. 1987 – The back-off model enabled language models to use multiple-length n-grams, and CSELT used HMMs to recognize languages (in software and hardware, e.g. RIPAC). At the end of the DARPA program in 1976, the best computer available to researchers was the PDP-10 with 4 MB of RAM. It could take up to 100 minutes to decode 30 seconds of speech. Practical products included: 1984 – the Apricot Portable, released with support for up to 4,096 words, of which only 64 could be held in RAM at a time. 1987 – a recognizer from Kurzweil Applied Intelligence. 1990 – Dragon Dictate, a consumer product. AT&T deployed the Voice Recognition Call Processing service in 1992 to route telephone calls without a human operator. The technology was developed by Lawrence Rabiner and others at Bell Labs. By the early 1990s, the vocabulary of the typical commercial speech recognition system had exceeded the average human vocabulary. Reddy's former student, Xuedong Huang, developed the Sphinx-II system at CMU. 
Sphinx-II was the first to do speaker-independent, large vocabulary, continuous speech recognition, and it won DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone. Huang later founded the speech recognition group at Microsoft in 1993. Reddy's student Kai-Fu Lee joined Apple, where, in 1992, he helped develop the Casper speech interface prototype. Lernout & Hauspie, a Belgium-based speech recognition company, acquired other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. L&H was used in Windows XP. L&H was an industry leader until an accounting scandal destroyed it in 2001. L&H speech technology was bought by ScanSoft, which became Nuance in 2005. Apple licensed Nuance software for its digital assistant Siri. ==== 2000s ==== In the 2000s, DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002, followed by Global Autonomous Language Exploitation (GALE) in 2005. Four teams participated in EARS: IBM; a team led by BBN with LIMSI and the University of Pittsburgh; Cambridge University; and a team composed of ICSI, SRI, and the University of Washington. EARS funded the collection of the Switchboard telephone speech corpus, which contained 260 hours of recorded conversations from over 500 speakers. The GALE program focused on Arabic and Mandarin broadcast news. Google's first effort at speech recognition came in 2007 after recruiting Nuance researchers. Its first product, GOOG-411, was a telephone-based directory service. Since at least 2006, the U.S. National Security Agency has employed keyword spotting, allowing analysts to index large volumes of recorded conversations and identify speech containing "interesting" keywords. Other government research programs focused on intelligence applications, such as DARPA's EARS program and IARPA's Babel program. 
In the early 2000s, speech recognition was dominated by hidden Markov models combined with feed-forward artificial neural networks (ANN). Later, speech recognition was taken over by long short-term memory (LSTM), a recurrent neural network (RNN) published by Sepp Hochreiter & Jürgen Schmidhuber in 1997. LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks that require memories of events that happened thousands of discrete time steps earlier, which is important for speech. Around 2007, LSTMs trained with Connectionist Temporal Classification (CTC) began to outperform traditional speech recognition methods in certain applications. In 2015, Google reported a 49 percent error‑rate reduction in its speech recognition via CTC‑trained LSTM. Transformers, a type of neural network based solely on attention, were adopted in computer vision and language modelling, and then in speech recognition. Deep feed-forward (non-recurrent) networks for acoustic modelling were introduced in 2009 by Geoffrey Hinton and his students at the University of Toronto, and by Li Deng and colleagues at Microsoft Research. In contrast to the prior incremental improvements, deep learning decreased error rates by 30%. Both shallow and deep forms (e.g., recurrent nets) of ANNs had been explored since the 1980s. Howev

Business Controls Corporation

Business Controls Corporation is a privately held computer company that developed an application-program-generator and also a series of accounting software packages. These packages were used widely enough that various business magazines carried back-of-the-book ads from companies seeking accountants with experience in one or more of them. Computer magazines covered the SB-5 application-program-generator as new versions were released from time to time, each with new or improved features. == Early days == The company's initial offerings were packages for the DEC PDP-8, although Business Controls Corporation also wrote custom programs for customers. Large customers with mainframes that also used smaller systems for departmental use and distributed processing used BCC's services as well. == SB-5 == The addition of an application-program-generator named SB-5, which could generate COBOL code from specifications, was a major step forward. Although this began with support for the DEC PDP-11, the company subsequently supported COBOL on DEC's DECsystem-10 & DECSYSTEM-20. VAX support came later. The specifications also permitted COBOL inserts and overrides: SB-5 could build an application that was all COBOL, yet only code the portions that varied from BCC's "vanilla" accounting packages. === Similar offerings === A similar idea was implemented for the IBM mainframe world in the form of a series of application-program-generators from Dylakor Corporation. They were named DYL-250, DYL-260, DYL-270 & DYL-280. Dylakor was acquired by Computer Associates. The specific syntax was different, but it had wider use, and - a mark of success and recognition in the industry - syntax-compatible implementations were released by a competitor. Still another alternative was Peat Marwick Mitchell's PMM2170 application-program-generator package. Like the others, it supported COBOL inserts and overrides. 
=== Extended integration === Business Controls Corporation subsequently extended SB-5's feature set to provide support for System 1022, a product for the DECsystem-10 & DECSYSTEM-20; 1022's vendor also had a VAX/VMS (later OpenVMS) product, System 1032.

Blocknots

Blocknots were random sequences of numbers contained in a book and organized by numbered rows and columns; they were used as additives in the reciphering of Soviet Union codes during World War II. The Blocknot consisted of a booklet of fifty sheets of 5-figure random additive, 100 additive groups to a sheet. No sheet was used more than once, thus the blocknots were in effect a form of one-time pad. The Soviet Union's highest-grade ciphers used in the East were 5-figure codebooks enciphered with the Blocknot book, and were generally considered unbreakable. == Technical Description == Blocknots were distributed centrally from an office in Moscow. Every Blocknot contained 5-figure groups in a number of sheets, for the enciphering of 5-figure messages. The encipherment was effected by applying additives taken from the pad, of which 50-100 5-figure groups appeared. Each pad had a 5-figure number and each sheet had a 2-figure number running consecutively. There were 5 different types of Blocknots, in two different categories: the Individual, in which each table of random numbers was used only once, and the General, in which each page of the Blocknot was valid for one day. The security of the additive sequence rested on the choice of different starting points for each message. In 5-figure messages, the Blocknot number was one of the first 10 groups in the message. Its position changed at long intervals, but was always easy to re-identify. The Russians differentiated between three types of blocks: The 3-block, DREIERBLOCK. I-block for Individual Block: 50 pages, additive read off in one direction only. The messages could be used and read only between 2 wireless telegraphy stations on one net. The 6-block, SECHSERBLOCK. Z-block for Circular Block: 30 pages, additive read off in either direction. The messages could be used and read between all W/T stations in a net. The 2-block, ZWEIERBLOCK. OS-block. Used only in traffic from lower to higher formations. 
Two other types were used in lower echelons: the Notblock, used in an emergency, and a Blocknot used for passing on traffic. The distribution of Blocknots was carried out centrally from Moscow to Army Groups, then to Armies. The Army was responsible for their distribution throughout the lower levels of the army down to company level. Independent units took their cipher material with them. Occasionally the same blocknot was distributed to two units on different parts of the front, which enabled depth to be established. Records of all Blocknots used were kept in Berlin, and when a repeat was noticed a BLOCKNOT ANGEBOT message was sent out to all German Signals units to indicate that it might be possible to break the code using it. There was no certainty in this. A cryptanalyst with the General der Nachrichtenaufklärung stated while being interrogated by TICOM: "It seems that depths of up to 8 were established at the beginning of the Russian Campaign but that no 5-figure code was broken after May 1943." German cryptanalysts who were prisoners of war stated under interrogation that each of the figures 0 to 9 were placed en clair, usually within the first ten groups of the text or sometimes at the end. One indicator was the Blocknot number, and the other consisted of two random figures, one figure representing the type, and the remaining two giving the page of the Blocknot being used. In long messages, 000000 was placed in the message when the end of a page had been reached. == Chi number == The Chi-number was the serial numbering of all 5-figure messages passing through the hands of the Cipher Officer, starting on 1 January and ending on 31 December of the current year. It always appeared as the last group in an intercepted message, e.g. 00001 on 1 January, or when the unit was newly set up. The progression of Chi-numbers was carefully observed and recorded in the form of a graph. 
A Russian corps had about 10 5-figure messages per day, an Army about 20-30, and a Front about 60–100. After only a relatively short time, the individual curves separated sharply and the type of formation could be recognized by the height of the Chi-number alone. == Monitoring == Blocknots were tracked in a card index that was maintained by the Signal Intelligence Evaluation Centre (NAAS). The functions of the NAAS included evaluation and traffic analysis, cryptanalysis, collation and dissemination of intelligence. The card index was one amongst several card indexes. A careful recording and study of blocks provided positive clues in the identification and tracking of formations using 5-figure ciphers. The index was subdivided into two files: the search card index, which contained all blocknots and chi-numbers whether or not they were known, and the unit card index, which contained only known Block and Chi-numbers. Inspector Berger, who was the chief cryptanalyst of NAAS 1, stated that the two files formed "the most important and surest instruments for identifying Russian radio nets" known to him. The Blocknots were also used in the Stationary Intercept Company (Feste), a military unit designed to work at a lower level than the NAAS, at the Army level; it was semi-motorized and closer to the front. The Feste used the Blocknot value along with several other parameters to build a network diagram. The network diagram was studied extensively as part of a 6-stage process that involved several departments within the Feste. The outcome was a metric which determined the most interesting circuit for traffic monitoring, and the least interesting, where monitoring of traffic should cease. == Analysis == Johannes Marquart was a mathematician and cryptanalyst who initially worked for Inspectorate 7/VI and later led Referat Ia of Group IV of the General der Nachrichtenaufklärung. Marquart was assigned the study of the Soviet Union Blocknot traffic. 
Marquart and his unit conducted extensive research in an attempt to discover the method by which they were produced. All the counts which they made, however, failed to reveal any non-random characteristics in the design of the tables, and while they thought the Blocknots must have been generated by machine, they were never able to draw any concrete deductions as a result of their research. == Example == The Soviet 3rd Guard Tank Army transmits a 5-figure message with the Blocknot of 37581 (one of the first 10 groups in the message). On the same day the Block 37582 was used by the same formation. The next day 37583 appeared. Thereafter, for a period, the Army was not heard by German Wireless telegraphy intercept operators, as it was maintaining wireless silence. After a few days, an unidentified net with the Blocknot 37588 is picked up. This message net is claimed, because of the proximity of the blocks (88/83) to be the 3rd Guard Tank Army. The missing Blocknots 84-87 were presumably used in telegraphic, telephonic or courier communications. The Chi number provides confirmation of the first assumption, based on proximity of blocknots in most cases.
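The additive encipherment outlined in the Technical Description, and the reason a reused sheet ("depth") was so valuable to the German analysts, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not a reconstruction of Soviet procedure: it assumes non-carrying, digit-by-digit addition modulo 10, which is a common convention for additive systems but is not stated above. The function names are hypothetical, and every code group and additive group shown is invented for the example.

```python
def add_groups(group: str, additive: str, sign: int = 1) -> str:
    """Combine two 5-figure groups digit by digit, modulo 10 (no carrying).

    sign=1 adds the additive (encipher); sign=-1 subtracts it (decipher).
    Non-carrying arithmetic is an assumption made for this sketch.
    """
    if len(group) != 5 or len(additive) != 5:
        raise ValueError("both groups must have exactly five figures")
    return "".join(
        str((int(g) + sign * int(a)) % 10) for g, a in zip(group, additive)
    )


def encipher(code_groups, additives):
    """Apply one additive group from the sheet to each codebook group."""
    return [add_groups(g, a) for g, a in zip(code_groups, additives)]


def decipher(cipher_groups, additives):
    """Strip the additive again to recover the codebook groups."""
    return [add_groups(c, a, sign=-1) for c, a in zip(cipher_groups, additives)]


# Invented values: two codebook groups and the first two groups of a sheet.
code_groups = ["12345", "90817"]
sheet = ["58203", "44196"]
cipher = encipher(code_groups, sheet)

# Depth: a second message enciphered with the SAME sheet.
other_groups = ["70001", "22222"]
other_cipher = encipher(other_groups, sheet)

# Subtracting the two cipher texts cancels the additive entirely, leaving
# only the difference of the underlying codebook groups.
cipher_difference = add_groups(cipher[0], other_cipher[0], sign=-1)
code_difference = add_groups(code_groups[0], other_groups[0], sign=-1)
```

With a sheet used only once, the cipher groups reveal nothing about the codebook groups. When the same sheet is used twice, `cipher_difference` equals `code_difference`, so the random additive drops out of the problem, which is consistent with the interest the German units took in repeated Blocknots.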

Social business model

The social business model is the use of social media tools and social networking behavioral standards by businesses for communication with customers, suppliers, and others. Combining social networking etiquette (being helpful, transparent and authentic) with business engagement on LinkedIn (for one-to-one interaction), Twitter (for immediacy) and Facebook (for content sharing) more fully involves employees in the organization and increases customer intimacy and trust. == Overview == Traditional business models, particularly in large organizations, have commonly shared one characteristic: careful limitation of direct contact between those within the organization and those outside of it. Only certain specific individuals (most frequently in roles such as sales, customer service and field consulting) were designated as "customer-facing" personnel. Organizations further limited outside access to internal employees through filtering mechanisms such as publishing only a main switchboard number (whether routed through a live receptionist or an interactive voice response system) and generic "sales@" or "info@" email addresses. The Cluetrain Manifesto (written by Rick Levine, Christopher Locke, Doc Searls, and David Weinberger and published in 1999) was among the first books to predict the demise of this old order and the emergence of more open business models, though most of the business world was slow to adopt the book's recommended cultural changes. Thirteen years later, authors Dion Hinchcliffe and Peter Kim added structural underpinnings to the cultural shifts outlined in The Cluetrain Manifesto in their book, Social Business by Design. The book details many of the ways social media tools and practices are being adopted within organizations, to support both internal employee collaboration and external customer engagement (which the authors describe as the "bigger problem"). 
== Elements == In implementing the social business model, organizations apply social networking protocols and tools in a range of areas, potentially including: Marketing Customer Support Recruiting Crowdsourcing Internal employee collaboration Sales Product Development Supply Chain Operations Investor Relations == Characteristics of organizations adopting the social business model == Organizations that fully adopt the social business model will exhibit four key characteristics: Connected – employees will be able to seamlessly engage one-on-one in real-time with other employees and individuals outside the organization (customers, prospects, partners, media, etc.) using a variety of communications methods including text chat, voice, file sharing, email, and video chat. Social – employees will follow social networking etiquette (being authentic, helpful and transparent) in external interactions. The focus will be on answering questions and providing information rather than overt sales or promotion. Presence – these conversations may originate on the company's website or elsewhere online (e.g., publication websites, industry portals, or social networking sites such as LinkedIn or Facebook). Intelligent – organizations will use in-depth analytics to monitor connections, social interactions and presence; measure corresponding business results; and continually adjust and improve practices for increased effectiveness. == Technical and functional requirements == While much of the change inherent in adopting the social business model is cultural, it also requires process changes enabled by social business technology. Functional requirements for a social business technology platform include: Analytics (including the cost of engagement as well as various measures of return on investment such as leads, sales, referrals, recommendations, and retained customers). 
Integration with other social media and business tools such as CRM systems, partner relationship management (PRM) software, product development, website analytics, and employee-recruiting applications. Rules-based workflow (e.g. routing a comment to the appropriate individual for a response, based on content). Geolocation (so customers or prospects can be automatically routed to local sales or customer service representatives). Content sharing. Collaboration tools. Transparency (i.e., people should know who they are engaging with) Unified communications (the ability to engage via voice, text, video, email, and share a wide variety of file types) Storage (the ability to store interactions for legal, training, compliance or compensation purposes, and purge stored data when no longer needed based on company policy or regulatory requirements). Immediacy (real-time monitoring and response).
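The rules-based workflow requirement listed above (routing a comment to the appropriate individual for a response, based on content) could be sketched as a small keyword router. This is a minimal illustration under stated assumptions, not a description of any particular social business platform: the `Rule` type, the `route_comment` function, the keyword lists, and the team names are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical routing rule: any matching keyword selects the team."""
    keywords: tuple
    team: str


# Illustrative rules only; a real deployment would define its own.
RULES = [
    Rule(("refund", "invoice", "billing"), "customer-support"),
    Rule(("price", "quote", "demo"), "sales"),
    Rule(("job", "hiring", "career"), "recruiting"),
]
DEFAULT_TEAM = "community-manager"


def route_comment(text: str, rules=RULES, default: str = DEFAULT_TEAM) -> str:
    """Return the team for the first rule whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for rule in rules:
        if words & set(rule.keywords):
            return rule.team
    # No rule matched: fall back to a general-purpose responder.
    return default


support = route_comment("Can I get a refund for my last invoice?")
sales = route_comment("Is there a demo available?")
fallback = route_comment("Great post!")
```

Rules are checked in order and the first match wins, so more specific rules would need to be listed before broader ones. A production system would more likely combine such rules with the analytics and geolocation requirements described above.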

Cryptochannel

In telecommunications, a cryptochannel is a complete system of crypto-communications between two or more holders or parties. It includes: (a) the cryptographic aids prescribed; (b) the holders thereof; (c) the indicators or other means of identification; (d) the area or areas in which effective; (e) the special purpose, if any, for which provided; and (f) pertinent notes as to distribution, usage, etc. A cryptochannel is analogous to a radio circuit.

Automated restaurant

An automated restaurant or robotic restaurant is a restaurant that uses robots to do tasks such as delivering food and drink to the tables or cooking the food. Restaurant automation means the use of a restaurant management system to automate some or occasionally all of the major operations of a restaurant establishment. More recently, restaurants are opening that have completely or partially automated their services. These may include: taking orders, preparing food, serving, and billing. A few fully automated restaurants operate without any human intervention whatsoever. Robots are designed to help and sometimes replace human labour (such as waiters and chefs). The automation of restaurants may also allow for the option for greater customization of an order. == History == === Vending machines === In the late 19th and early 20th century a number of restaurants served food solely through vending machines. These restaurants were called automats or, in Japan, shokkenki. Customers ordered their food directly through the machines. === Sushi conveyors === Yoshiaki Shiraishi is a Japanese innovator who is known for the creation of conveyor belt sushi. He had the idea following difficulty staffing his small sushi restaurant and managing the restaurant on his own. He was inspired seeing beer bottles on a conveyor belt in an Asahi brewery. Yoshiaki's restaurants are an early example of restaurant automation; they used a conveyor belt to distribute dishes around the restaurant, eliminating the need for waiters. This example of automation dates back to the Japanese economic miracle; the first of Yoshiaki's conveyor belt sushi restaurants was opened under the name Mawaru Genroku Sushi in 1958, in Osaka. === Partial automation === As of 2011, across Europe, McDonald's had already begun implementing 7,000 touch screen kiosks that could handle cashiering duties. From 2015 to 2020, Zume had an automated pizza parlor. 
Later companies would try to produce smaller, less ambitious devices, with one robotics company producing a machine that could automate the slowest and most repetitive parts of assembling a pizza, such as spreading pizza sauce or placing slices of pepperoni, while leaving other customizations to employees. In 2020, a restaurant in the Netherlands began trialling the use of a robot to serve guests. In September 2021, Karakuri's 'Semblr' food service robot served personalised lunches for the 4,000 employees of grocery technology solutions provider Ocado Group's head offices in Hatfield, UK. 2,700 different combinations of dishes were on offer. Customers could specify in grams what hot and cold items, proteins, sauces and fresh toppings they wanted. In 2021, Columbia University School of Engineering and Applied Science engineers developed a method of cooking 3D printed chicken with software-controlled robotic lasers. The “Digital Food” team exposed raw 3D printed chicken structures to both blue and infrared light. They then assessed the cooking depth, colour development, moisture retention and flavour differences of the laser-cooked 3D printed samples in comparison to stove-cooked meat. In June 2022, a California nonprofit chain of residential communities, Front Porch, experimented with robots in dining rooms at two locations to supplement wait staff by carrying plated food and drink to tables, and removing dishes. 65% of residents found the robots helpful, with 51% saying they let the staff spend more quality time with diners. 51% of staff were "excited" and 58% said the robots enabled more quality time with diners. The chain has 19 senior living communities (and 35 affordable housing communities), so it has potential to expand robots to more dining rooms. It is shifting to memory care, which may affect plans. 
== Rationales == === Advantages === Efficiency: Automated restaurants can significantly enhance operational efficiency by minimizing human error and reducing service time. With automated ordering, payment, and food preparation systems, customers can enjoy faster service and reduced waiting times. Cost savings: By reducing the need for human staff, automated restaurants can potentially lower labor costs. This can be particularly beneficial in areas with high labor expenses, as it allows for better resource allocation and cost management. Consistency: Automation ensures consistency in food quality and presentation. With precise portion control and standardized cooking methods, customers can expect the same quality and taste in their meals every time they visit. Enhanced customer experience: Self-service kiosks and automated systems provide customers with control and convenience. They can customize their orders, browse through menu options, and pay seamlessly, creating a more interactive and satisfying dining experience. === Disadvantages === Lack of personal touch: Automated restaurants may lack the personal interaction and warmth that traditional restaurants provide. Some customers prefer the human touch, personalized recommendations, and the social aspect of dining out. Technical issues: Reliance on technology means that technical glitches and malfunctions can occur, resulting in service disruptions or delays. Maintenance and technical support become critical in ensuring smooth operations. Limited menu complexity: The automation process may be better suited for standardized menu items rather than complex or customized dishes. The ability to cater to unique dietary preferences or accommodate special requests may be limited. Employment implications: Automated restaurants may result in job losses for traditional restaurant staff, potentially impacting the local workforce. It is important to consider the social and economic implications of adopting such technology. 
== Locations == Automated restaurants have been opening in many countries. Examples include:
Nala Restaurant in Naperville, Illinois
Fritz's Railroad Restaurant in Kansas City, Kansas
Výtopna, a Railway Restaurant using model trains: franchise of various restaurants and coffeehouses in the Czech Republic
Bagger's Restaurant in Nuremberg, Germany
FuA-Men Restaurant, a ramen restaurant located in Nagoya, Japan
Fōster Nutrition in Buenos Aires, Argentina
Dalu Robot Restaurant in Jinan, China
Haohai Robot Restaurant in Harbin, China
Robot Kitchen Restaurant in Hong Kong
Robo-Chef restaurant in Tehran, Iran, started in 2017, is the first robotic and "waiterless" restaurant of the Middle East.
MIT graduates opened Spyce Kitchens in downtown Boston, Massachusetts, in 2018
Foodom, under Country Garden Holdings, opened January 12, 2020, in Guangzhou, China
Robot Chacha, the first robot restaurant of India, is planning to open in the capital city of New Delhi.
Kura Revolving Sushi Bar, with a number of locations in the United States, uses tablets at tables for ordering, a conveyor belt to deliver food, and robots to deliver drinks and condiments.
Chipotle Mexican Grill is beginning to deploy the Hyphen Makeline, which assembles up to 350 bowls and salads automatically per hour, and Chippy, an automatic tortilla chip fryer made by Miso Robotics.
Serious Dumplings in Boca Raton, Florida

ServerNet

ServerNet is a switched fabric communications link primarily used in proprietary computers made by Tandem Computers, Compaq, and HP. Its features include good scalability, clean fault containment, error detection and failover. The ServerNet architecture specification defines a connection between nodes, which are either processors or high-performance I/O nodes such as storage devices. == History == Tandem Computers developed the original ServerNet architecture and protocols for use in its own proprietary computer systems starting in 1992, and released the first ServerNet systems in 1995. Early attempts to license the technology and interface chips to other companies failed, due in part to a disconnect between the culture of selling complete hardware / software / middleware computer systems and that needed for selling and supporting chips and licensing technology. A follow-on development effort ported the Virtual Interface Architecture to ServerNet with PCI interface boards connecting personal computers. InfiniBand directly inherited many ServerNet features. As of 2017, systems still ship based on the ServerNet architecture.