Frederick Jelinek

Frederick Jelinek

Frederick Jelinek (18 November 1932 – 14 September 2010) was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing. He is well known for his oft-quoted statement, "Every time I fire a linguist, the performance of the speech recognizer goes up". Jelinek was born in Czechoslovakia before World War II and emigrated with his family to the United States in the early years of the communist regime. He studied engineering at the Massachusetts Institute of Technology and taught for 10 years at Cornell University before accepting a job at IBM Research. In 1961, he married Czech screenwriter Milena Jelinek. At IBM, his team advanced approaches to computer speech recognition and machine translation. After IBM, he went to head the Center for Language and Speech Processing at Johns Hopkins University for 17 years, where he was still working on the day he died. == Personal life == Jelinek was born on November 18, 1932, as Bedřich Jelínek in Kladno to Vilém and Trude Jelínek. His father was Jewish; his mother was born in Switzerland to Czech Catholic parents and had converted to Judaism. Jelínek senior, a dentist, had planned early to escape Nazi occupation and flee to England; he arranged for a passport, visa, and the shipping of his dentistry materials. The couple planned to send their son to an English private school. However, Vilém decided to stay at the last minute and was eventually sent to the Theresienstadt concentration camp, where he died in 1945. The family was forced to move to Prague in 1941, but Frederick, his sister and mother—thanks to the latter's background—escaped the concentration camps. After the war, Jelinek entered in the gymnasium, despite having missed several years of schooling because education of Jewish children had been forbidden since 1942. His mother, anxious that her son should get a good education, made great efforts for their emigration, especially when it became clear he would not be allowed to even attempt the graduation examination. His mother hoped her son would become a physician, but Jelinek dreamed of being a lawyer. He studied engineering in evening classes at the City College of New York and received stipends from the National Committee for a Free Europe that allowed him to study at the Massachusetts Institute of Technology. About his choice of specialty, he said: "Fortunately, to electrical engineering there belonged a discipline whose aim was not the construction of physical systems: the theory of information". He obtained his Ph.D. in 1962, with Robert Fano as his adviser. In 1957, Jelinek paid an unexpected visit to Prague. He had been in Vienna and applied for a visa, hoping to see his former acquaintances again. He met with his old friend Miloš Forman, who introduced him to film student Milena Tobolová—whose screenplay had been the basis for the movie Easy Life (Snadný život). His flight back to the U.S. had a stopover in Munich, during which he called her to propose. Tobolová was considered a dissident and the authorities were not happy with her film. Jelinek asked for help from Jerome Wiesner and Cyrus Eaton, the latter who lobbied Nikita Khrushchev. Following the inauguration of John F. Kennedy, a group of Czech dissidents were allowed to emigrate in January 1961. Thanks to the lobbying, the future Milena Jelinek was one of them. After completing his graduate studies, Jelinek, who had developed an interest in linguistics, had plans to work with Charles F. Hockett at Cornell University. However these fell through and during the next ten years he continued to study information theory. Having previously worked at IBM during a sabbatical, he began full-time work there in 1972—at first on leave for Cornell, but permanently from 1974. He remained there for over twenty years. Although at first he had been offered a regular research job, upon his arrival he learned that Josef Raviv had recently been promoted to head of the newly opened IBM Haifa Research Laboratory, and became head of the Continuous Speech Recognition group at the Thomas J. Watson Research Center. Despite his team's successes in this area, Jelinek's work remained little known in his home country because Czech scientists were not allowed to participate in key conferences. After the 1989 fall of communism, Jelinek helped establish scientific relationships, regularly visiting to lecture and helping to persuade IBM to establish a computing centre at Charles University. In 1993, he retired from IBM and went to Johns Hopkins University's Center for Language and Speech Processing, where he was director and Julian Sinclair Smith Professor of Electrical and Computer Engineering. He was still working there at the time of his death; Jelinek died of a heart attack at the close of an otherwise normal workday in mid-September 2010. He was survived by his wife, daughter and son, sister, stepsister, and three grandchildren, including Sophie Gold Jelinek. == Research and legacy == Information theory was a fashionable scientific approach in the mid '50s. However, pioneer Claude Shannon wrote in 1956 that this trendiness was dangerous. He said, "Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems ... It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems." During the next decade, a combination of factors shut down the application of information theory to natural language processing (NLP) problems—in particular machine translation. One factor was the 1957 publication of Noam Chomsky's Syntactic Structures, which stated, "probabilistic models give no insight into the basic problems of syntactic structure". This accorded well with the philosophy of the artificial intelligence research of the time, which promoted rule-based approaches. The other factor was the 1966 ALPAC report, which recommended that the government should stop funding research into machine translation. ALPAC chairman John Pierce later said that the field was filled with "mad inventors or untrustworthy engineers". He said that the underlying linguistic problems must be solved before attempts at NLP could be reasonably made. These elements essentially halted research in the field. Jelinek had begun to develop an interest in linguistics after the immigration of his wife, who initially enrolled in the MIT linguistics program with the help of Roman Jakobson. Jelinek often accompanied her to Chomsky's lectures, and even discussed the possibility of changing orientation with his adviser. Fano was "really upset", and after the failure of his project with Hockett at Cornell, he did not return to this field of research until starting work at IBM. The scope of research at IBM was considerably different from that of most other teams. According to Mark Liberman, "While [Jelinek] was leading IBM's effort to solve the general dictation problem during the decade or so following 1972, most other U.S. companies and academic researchers were working on very limited problems ... or were staying out of the field entirely". Jelinek regarded speech recognition as an information theory problem—a noisy channel, in this case the acoustic signal—which some observers considered a daring approach. The concept of perplexity was introduced in their first model, New Raleigh Grammar, which was published in 1976 as the paper "Continuous Speech Recognition by Statistical Methods" in the journal Proceedings of the IEEE. According to Young, the basic noisy channel approach "reduced the speech recognition problem to one of producing two statistical models". Whereas New Raleigh Grammar was a hidden Markov model, their next model, called Tangora, was broader and involved n-grams, specifically trigrams. Even though "it was obvious to everyone that this model was hopelessly impoverished", it was not improved upon until Jelinek presented another paper in 1999. The same trigram approach was applied to phones in single words. Although the identification of parts of speech turned out not to be very useful for speech recognition, tagging methods developed during these projects are now used in various NLP applications. The incremental research techniques developed at IBM eventually became dominant in the field after DARPA, in the mid-80s, returned to NLP research and imposed that methodology to participating teams, shared common goals, data, and precise evaluation metrics. The Continuous Speech Recognition Group's research, which required large amounts of data to train the algorithms, eventually led to the creation of the Linguistic Data Consortium. In the 1980s, although the broader problem of speech recognition remained unsolved, they sought to apply the methods developed to other problems; machine translat

Fuse Mediation Router

Fuse Mediation Router is an open source tool for integrating services using Enterprise Integration Patterns based on Apache Camel for use in enterprise IT organizations. It is certified, productized and fully supported by the people who wrote the code. Fuse Mediation Router uses a standard method of notation to go from diagram to implementation without coding. Fuse Mediation Router is a rule-based routing and process mediation engine that combines the ease of basic POJO development with the clarity of the standard Enterprise Integration Patterns. It can be deployed inside any container or be used stand-alone, and works directly with any kind of transport or messaging model to rapidly integrate existing services and applications. Fuse Mediation Router is now a part of Red Hat JBoss Fuse. == Tooling == FuseSource offers graphical, Eclipse-based tooling for Apache Camel for download.

Polyfill (programming)

In software development, a polyfill is code that implements a new standard feature of a deployment environment within an old version of that environment that does not natively support the feature. Most often, it refers to JavaScript code that implements an HTML5 or CSS web standard, either an established standard (supported by some browsers) on older browsers, or a proposed standard (not supported by any browsers) on existing browsers. Polyfills are also used in PHP and Python. Polyfills allow web developers to use an API regardless of whether or not it is supported by a browser, and usually with minimal overhead. Typically they first check if a browser supports an API, and use it if available, otherwise using their own implementation. Polyfills themselves use other, more supported features, and thus different polyfills may be needed for different browsers. The term is also used as a verb: polyfilling is providing a polyfill for a feature. == Definition == The term is a neologism, coined by Remy Sharp, who required a word that meant "replicate an API using JavaScript (or Flash or whatever) if the browser doesn’t have it natively" while co-writing the book Introducing HTML5 in 2009. Formally, "a shim is a library that brings a new API to an older environment, using only the means of that environment." Polyfills exactly fit this definition; the term shim was also used for early polyfills. However, to Sharp shim connoted non-transparent APIs and workarounds, such as spacer GIFs for layout, sometimes known as shim.gif, and similar terms such as progressive enhancement and graceful degradation were not appropriate, so he invented a new term. The term is based on the multipurpose filling paste brand Polyfilla, a paste used to cover up cracks and holes in walls, and the meaning "fill in holes (in functionality) in many (poly-) ways." The word has since gained popularity, particularly due to its use by Paul Irish and in Modernizr documentation. The distinction that Sharp makes is: What makes a polyfill different from the techniques we have already, like a shim, is this: if you removed the polyfill script, your code would continue to work, without any changes required in spite of the polyfill being removed. This distinction is not drawn by other authors. At times various other distinctions are drawn between shims, polyfills, and fallbacks, but there are no generally accepted distinctions: most consider polyfills a form of shim. The term polyfiller is also occasionally found. == Examples == === core-js === core-js is one of the most popular JavaScript standard library polyfills. Includes polyfills for ECMAScript up to the latest version of the standard: promises, symbols, collections, iterators, typed arrays, many other features, ECMAScript proposals, some cross-platform WHATWG / W3C features and proposals like URL. You can load only required features or use it without global namespace pollution. It can be integrated with Babel, which allows it to automatically inject required core-js modules into your code. === html5shiv === In IE versions prior to 9, unknown HTML elements like

and