2018 Google data breach

The 2018 Google data breach was a major data privacy scandal in which the Google+ API exposed the private data of approximately 500,000 users. Google+ managers first noticed the harvesting of personal data in March 2018, during a review following the Facebook–Cambridge Analytica data scandal. The bug was fixed immediately, but Google did not reveal the leak to the network's users. In November 2018, another data breach occurred following an update to the Google+ API. Google found no evidence of misuse, but approximately 52.5 million personal profiles were potentially exposed. In October 2018, Google had announced that it would shut down Google+ in August 2019, citing low use and technical challenges. After the second breach, the shutdown was brought forward to April 2019.

== Overview of Google+ ==
Google+ was launched in June 2011 as an invite-only social network and was opened to the public later that year. It was managed by Vic Gundotra. Like Facebook, Google+ offered several key features, including Circles, Hangouts and Sparks. Circles let users personalize their social groups by sorting friends into different categories. Once admitted to a Circle, users could regulate the information in their individual spaces. Hangouts included video chatting and instant messaging between users. Sparks allowed Google to track users' past searches to find news and content related to their interests. Google+ was linked to other Google services, such as YouTube, Google Drive and Gmail, giving it access to roughly 2 billion user accounts. However, fewer than 400 million consumers actively used Google+, and 90% of Google+ user sessions lasted less than five seconds.

== The breaches ==
In March 2018, Google developers found a data breach within the Google+ People API in which external apps acquired access to profile fields that were not marked as public.
According to The Wall Street Journal, Google did not disclose the breach when it was first discovered in March, to avoid regulatory scrutiny and reputational damage. About 500,000 Google+ accounts were affected by the breach, which gave 438 external apps unauthorized access to private users' names, email addresses, occupations, genders and ages. This information was available between 2015 and 2018. Google found no evidence that any user's personal information had been misused, or that any third-party app developers were aware of the leak. In November 2018, a software update created another data breach within the Google+ API. The bug affected 52.5 million users: as in the March breach, unauthorized apps were able to access Google+ profiles, including users' names, email addresses, occupations and ages. Apps could not access financial information, national identification numbers or passwords. Blog posts, messages and phone numbers also remained inaccessible if marked as private. Unlike in the previous breach, the data was exposed for only six days before Google learned of the bug. Again, Google found no evidence of data being misused by third-party developers.

== Responses ==
In October 2018, The Wall Street Journal published an article outlining the initial breach and Google's decision not to disclose it to users. At the time, no federal law required Google to inform its users of data breaches. Google had originally withheld the breach out of fear of being compared to Facebook's recent data leak and of the resulting loss of consumer confidence. In response to The Wall Street Journal article, Google announced that it would shut down Google+ in August 2019. After the second data leak, the shutdown date was moved forward to April 2019. In response to the data breach, enterprise customers were notified of the bug's impact and given instructions on how to save, download and delete their data before the Google+ shutdown.
Google's Privacy and Data Protection Office found no misuse of user data. Before the Google+ shutdown, Google set a 10-month period in which users could download and migrate their data; after this period, user content was deleted. From 4 February 2019, consumers could no longer create new Google+ profiles. Google shut down the Google+ APIs on 7 March 2019 to ensure that developers did not continue to rely on them up to the Google+ shutdown. Google is the principal subsidiary of its parent company, Alphabet Inc. After the data breach, Alphabet Inc. shares fell by 1% to $1,157.06 on 9 October 2018, after dropping to $1,135.40 earlier that morning, the lowest price since 5 July 2018. Following the publication of The Wall Street Journal article, the share price fell by as much as 2.1% over the two days to 10 October 2018. The share price then rose steadily, recovering to its 8 October 2018 level on 5 February 2019. Google planned to rebuild Google+ as a corporate enterprise network. Google Play would also restrict which apps could ask for permission to access users' SMS data: only the app a user had set as the default app for calls or text messages would be able to make such requests. Before the data breaches, apps were able to request access to all of a user's data at once; under the new rules, each app had to request permission for each aspect of a user's profile.

Xiaoice

Xiaoice (Chinese: 微软小冰; pinyin: Wēiruǎn Xiǎobīng; lit. 'Microsoft Little Ice', IPA [wéɪɻwânɕjâʊpíŋ]) is an AI system developed by the Microsoft (Asia) Software Technology Center (STCA) in 2014, based on an emotional computing framework. The sixth generation of Xiaoice was released in July 2018. Xiaoice Company, formerly known as the AI Xiaoice Team of Microsoft Software Technology Center Asia, was Microsoft's largest independent R&D team for AI products. The team was founded in China in December 2013, expanded with a Japanese R&D team in September 2014, and is distributed across locations including Beijing, Suzhou and Tokyo, with its technical products covering Asia. On 13 July 2020, Microsoft spun off its Xiaoice business into a separate company. As of 2021, the AI chatbots created and hosted on the Xiaoice framework accounted for about 60% of all AI interactions worldwide.

== Platforms, languages and countries ==
Xiaoice exists on more than 40 platforms in four countries (China, Japan, the United States and Indonesia), including WeChat, QQ, Weibo and Meipai in China, Facebook Messenger in the United States and LINE in Japan.

== Introduction ==
On 13 July 2020, Microsoft spun off its Xiaoice business into a separate company, aiming to let the Xiaoice product line accelerate local innovation and commercialization. It appointed Dr. Harry Shum, former global executive vice president of Microsoft, as chairman of the new company; Li Di, Microsoft Partner of Products in Microsoft STCA, as CEO; and Cliff, Chief R&D Director, as general manager of the Japan branch. The new company would continue to use the Xiaoice brand in China and the Rinna brand in Japan. As of 2022, the Xiaoice brand alone had reached 660 million online users, 1 billion third-party smart devices and 900 million content viewers in these countries.
Xiaoice's customers include China Merchants Group, the Winter Sports Center of the General Administration of Sport of China, China Textile Information Center, China Unicom, China Foreign Exchange Trade System, the Hong Kong Securities and Futures Commission (SFC), Wind Information, BMW, Nissan, SAIC Motor, BAIC Group, Nio Inc., XPeng, HiPhi, Vanke and Wensli, among others. The Xiaoice Avatar Framework has incubated tens of millions of AI beings, such as Xiaoice, Rinna, the Expo exhibitor Xia Yubing, the singer He Chang, the anchor F201, the human observer MERROR and the anime robot character Roboko.

== Application ==
=== Poet ===
In May 2017, Xiaoice published The Sunshine Lost Windows, the first AI-authored collection of poems in China.
=== Singer ===
Xiaoice has released dozens of songs of a quality similar to that of human singers, including I Know I New, Breeze, I Am Xiaoice and Miss You. The fourth version of its DNN singing model allows Xiaoice to learn finer details; for example, Xiaoice can produce breathing sounds while singing, as a human singer would.
=== Children's audiobook reciter ===
Xiaoice can automatically analyze stories and choose suitable tones and character voices, completing the entire process of creating the audio.
=== Designer ===
By learning the melodies of songs and the landmarks of different cities, Xiaoice can create skyline artworks while listening to songs related to a city. Skyline Series T-shirts designed by Xiaoice were launched jointly with SELECTED and sold in stores.
=== TV and radio hostess ===
Xiaoice has hosted 21 TV programs and 28 radio programs, such as CCTV-1 AI Show, Dragon TV Morning East News, Hunan TV My Future, and several daily radio programs for Jiangsu FM99.7, Hunan FM89.3 and Henan FM104.1.
=== "AI being" ===
An "AI being" is a concept proposed by the Xiaoice team in 2019.
The "White Book of China Virtual Human Development Industry in 2022", released by Frost & Sullivan and LeadLeo, cites six elements of an AI being proposed by the Xiaoice team: Persona, Attitude, Biological Characteristic, Creation, Knowledge and Skill. On May 16, 2023, Xiaoice launched "GPT Clones" as part of its "GPT Human Cloning Plan." The program aims to replicate celebrities, public figures and ordinary people. As of June 2023, Xiaoice had launched more than 300 GPT Clones. People were invited to register via WeChat in China and Japan. A major focus of Xiaoice's AI beings is providing virtual partners; a paid tier allows more complex responses, voice messages and other features.

== Community feedback ==
Bill Gates mentioned Xiaoice during his speech at Peking University: "Some of you may have had conversations with Xiaoice on Weibo, or seen her weather forecasts on TV, or read her column in the Qianjiang Evening News." "Xiaoice has attracted 45 million followers and is quite skilled at multitasking. And I’ve heard she’s gotten good enough at sensing a user’s emotional state that she can even help with relationship breakups." According to Li Di, vice president of the Microsoft (Asia) Internet Engineering School, Xiaoice had started writing poems the year before. Using a database of works by 519 modern Chinese poets from the 1920s onward, Xiaoice underwent 100 hours of training to acquire the ability to write poems. Notably, Xiaoice was never identified as a bot while publishing poems under an alias on various forums and in traditional literary publications.

== Controversy ==
In 2017, Xiaoice was taken offline on WeChat after giving users responses critical of the Chinese government. It was subsequently censored, and the bots now avoid and sidestep any inquiries that use politically sensitive terms and phrases.
== Activity ==
On September 22, 2021, Xiaoice Company and Microsoft Software Technology Center Asia (STCA) jointly held the ninth-generation Xiaoice annual press conference in Beijing. Announcements included upgrades to the core technologies of the ninth-generation Xiaoice Avatar Framework, Xiaoice's first first-party social platform app, "Xiaoice Island", and the reopening of WeChat Xiaoice.

== Regional varieties of Xiaoice ==
China: Xiaoice, launched in 2014
Japan: りんな (Rinna), launched in 2015
United States: Zo, launched in 2016; discontinued in summer 2019
India: Ruuh, launched in 2017; discontinued June 21, 2019
Indonesia: Rinna, launched in 2017

Optical recording

The history of optical recording can be divided into a small number of distinct major contributions. The pioneers of optical recording worked mostly independently, and their solutions to the many technical challenges have very distinctive features, such as the reflective disc (Compaan and Kramer), the transparent disc (Gregg), the floppy disc (Russell), the rigid disc (Compaan and Kramer) and a focused laser beam for read-out through a transparent substrate (Compaan and Kramer).

== Gregg 1958 ==
Laserdisc technology, using a transparent disc, was invented by David Paul Gregg in 1958 (and patented in 1970 and 1990). By 1969 Philips had developed a videodisc in reflective mode, which has great advantages over the transparent mode. MCA and Philips decided to join their efforts, and first publicly demonstrated the videodisc in 1972. Laserdisc first went on sale in Atlanta on December 15, 1978, two years after the VHS VCR and four years before the CD, which is based on Laserdisc technology. Philips produced the players and MCA produced the discs. The Philips/MCA cooperation was not successful and was discontinued after a few years. Several of the scientists responsible for the early research (John Winslow, Richard Wilkinson and Ray Dakin) founded Optical Disc Corporation (now ODC Nimbus).

== Russell 1965 ==
While working at Pacific Northwest National Laboratory, James Russell invented an optical storage system for digital audio and video, patenting the concept in 1970. Russell's earliest patents, US 3,501,586 and US 3,795,902, were filed in 1966 and 1969, respectively. He built prototypes, the first of which was operating in 1973. Russell had found a way to record digital information onto a photosensitive plate as tiny dark spots, each spot one micrometre from centre to centre, using a laser to write the binary patterns. Russell's first optical disc was distinctly different from the eventual compact disc product: the disc in the player was not read by laser light.
A key characteristic of Russell's invention is that a laser is not used to read the disc; instead, the entire disc or oblong sheet is illuminated by a large playback light source behind the transparent foil. As a result, the information density is relatively low. By 1985, Russell held over 25 patents on various technologies related to optical recording and playback. Russell's intellectual property was purchased by Optical Recording Corporation (ORC) of Toronto in 1985, and the firm notified a number of CD manufacturers that their CD technology was based on patents held by ORC. In 1987, ORC signed an agreement with Sony under which Sony paid to license the technology. Further licenses followed from Philips and others. Warner Communications did not sign and was sued by ORC. In 1992, the large CD manufacturer, by then called Time Warner, was ordered to pay ORC US$30 million for patent infringement. In the 1970 patent, the spot diameter was around 10 micrometres. Since areal density scales roughly with the inverse square of the spot size, the areal information density was around a hundred times lower than that of the CD as later developed. Russell continued to refine the concept throughout the 1970s. Philips and Sony, however, were able to devote far greater resources to parallel development of the concept, arriving at a smaller and more sophisticated product within a few years. Russell's various partners and ventures failed to produce a single consumer product.

== Korpel 1968 ==
While working for the Zenith Electronics Corporation, Adrianus Korpel developed very early optical videodisc systems, including holographic storage.

== Kramer and Compaan 1969 ==
Philips's development of videodisc technology began in 1969 with efforts by Dutch physicists Klaas Compaan and Piet Kramer to record video images in holographic form on a disc. Their prototype Laserdisc, shown in 1972, used a laser beam in reflective mode to read a track of pits encoding an FM video signal.
Together with MCA, Philips brought the optical videodisc to market in 1978; the cooperation did not last long and was discontinued after a few years.

== Immink and Doi 1979 ==
The Compact Disc (CD), which is based on MCA/Philips Laserdisc technology, was developed by a joint Sony and Philips taskforce in 1979–1980. Toshi Doi and Kees Schouhamer Immink created the digital technologies that turned the analog Laserdisc into a high-density, low-cost digital audio disc. The CD, available on the market since October 1982, remains the standard physical medium for the sale of commercial audio recordings. Standard CDs have a diameter of 120 mm and can hold up to 80 minutes of audio (700 MB of data). Mini CDs have diameters ranging from 60 to 80 mm; they are sometimes used for CD singles or device drivers, storing up to 24 minutes of audio. The technology was later adapted and expanded to include data storage (CD-ROM), write-once audio and data storage (CD-R), rewritable media (CD-RW), Super Audio CD (SACD), Video Compact Discs (VCD), Super Video Compact Discs (SVCD), PhotoCD, PictureCD, CD-i and Enhanced CD. CD-ROMs and CD-Rs remain widely used technologies in the computer industry. The CD and its extensions have been extremely successful: in 2004, worldwide sales of CD audio, CD-ROM and CD-R reached about 30 billion discs, and by 2007, 200 billion CDs had been sold worldwide.
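The pairing of "80 minutes of audio" with "700 MB of data" follows from the CD's sector layout. The short sketch below works through that arithmetic, assuming the standard figures of 75 sectors per second of playback, 2,352 bytes of audio per sector (44,100 stereo 16-bit samples per second) and 2,048 bytes of user data per CD-ROM Mode 1 sector, where the remaining bytes hold sync, header and additional error detection and correction. The function name is illustrative only.

```python
# Standard CD sector figures (assumed here, as defined for CD audio and CD-ROM Mode 1).
SECTORS_PER_SECOND = 75
AUDIO_BYTES_PER_SECTOR = 2352       # audio CD: the whole sector payload is audio
MODE1_DATA_BYTES_PER_SECTOR = 2048  # CD-ROM Mode 1: rest is sync, header, EDC/ECC

# Sanity check: 44,100 samples/s x 2 channels x 2 bytes = 75 sectors x 2,352 bytes.
assert 44_100 * 2 * 2 == SECTORS_PER_SECOND * AUDIO_BYTES_PER_SECTOR


def cd_capacity(minutes):
    """Return (sectors, audio bytes, Mode 1 data bytes) for a disc of the given length."""
    sectors = minutes * 60 * SECTORS_PER_SECOND
    return (
        sectors,
        sectors * AUDIO_BYTES_PER_SECTOR,
        sectors * MODE1_DATA_BYTES_PER_SECTOR,
    )


if __name__ == "__main__":
    sectors, audio_bytes, data_bytes = cd_capacity(80)
    print(f"{sectors} sectors")
    print(f"audio: {audio_bytes / 2**20:.1f} MiB")
    print(f"data:  {data_bytes / 2**20:.1f} MiB")
```

For an 80-minute disc this gives 360,000 sectors: about 807 MiB of raw audio, but only about 703 MiB (the familiar "700 MB") of error-corrected data, because a data sector gives up part of its payload to extra error correction.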

Elonis v. United States

Elonis v. United States, 575 U.S. 723 (2015), was a United States Supreme Court case concerning whether a conviction for threatening another person across state lines (under 18 U.S.C. § 875(c)) requires proof of subjective intent to threaten, or whether it is enough to show that a "reasonable person" would regard the statement as threatening. At issue were violent rap lyrics, alleged to be threats, written by Anthony Douglas Elonis and posted to Facebook under a pseudonym. The ACLU filed an amicus brief in support of the petitioner. It was the first time the Court had heard a case considering true threats and the limits of speech on social media.

== Background ==
In May 2010, Elonis was going through a divorce and made a number of public Facebook posts. Before these posts, he had lost his job at an amusement park. He "posted the script of a sketch" by The Whitest Kids U' Know, which originally referred to saying "I want to kill the President of the United States", replacing the president with his wife: Elonis ended the post with this statement: "Art is about pushing limits. I'm willing to go to jail for my constitutional rights. Are you?" A week later, Elonis posted about local law enforcement and a kindergarten class, which caught the attention of the Federal Bureau of Investigation. He then wrote a Facebook post about one of the agents who had visited him: He concluded:

== Arrest and Conviction ==
These actions led to Elonis's arrest on December 8, 2010. He was indicted by a grand jury on five counts of transmitting threats in interstate communications, directed at his estranged wife, park employees and visitors, local law enforcement, an FBI agent and a kindergarten class. At the district court, Elonis moved to dismiss the indictment for failing to allege that he had intended to threaten anyone, claiming that his Facebook posts were not intended as threats.
He argued that, as an aspiring rap artist, he intended his posts as a form of artistic expression to help him cope with his recent losses, and that he did not mean anything in them literally. His motion was denied. He requested a jury instruction that "the government must prove that he intended to communicate a true threat", which was also denied. He was convicted on the last four of the five counts and sentenced to 44 months in prison and three years of supervised release. He appealed unsuccessfully to the Third Circuit, renewing his challenge to the jury instructions. He then appealed to the U.S. Supreme Court, arguing that the government had not been required to show intent to threaten and invoking his First Amendment rights.

== Decision ==
On June 1, 2015, the U.S. Supreme Court reversed Elonis's conviction in an 8–1 decision. Chief Justice John Roberts wrote for a seven-justice majority, Samuel Alito wrote an opinion concurring in part and dissenting in part, and Clarence Thomas wrote a dissenting opinion. The judgment of the Third Circuit was reversed and the case remanded.

=== Majority opinion ===
The majority opinion, written by Roberts, did not rule on First Amendment questions or on whether recklessness is a sufficient mens rea. It held that a conviction under §875(c) requires proof of mens rea, and that it is not enough that a reasonable person would regard the communication as a threat. Importantly, the mens rea issue had been preserved for review, since Elonis had raised that objection at every stage of the previous proceedings. The government contended that the presence of the words "intent to extort" in §875(b) and §875(d) implied that their absence from §875(c) was deliberate, so that no intent to threaten was required. The court disagreed, holding that the language was absent from §875(c) because that section was intended to have a broader scope than threats relating to extortion.
The opinion drew on many Supreme Court cases holding that, in criminal law, mens rea is required even when a statute does not mention it explicitly. The Supreme Court therefore ruled in favor of Elonis.

=== Alito's concurrence ===
Justice Samuel Alito, concurring in part and dissenting in part, agreed that mens rea was required and, specifically, that showing negligence was not sufficient, but argued that the Court should have ruled on the question of recklessness. He further argued that recklessness was sufficient to establish a crime under that provision, on the basis that requiring more would amount to amending the statute rather than interpreting it. Since Elonis had explicitly argued that recklessness was not sufficient, Alito wrote:
I would therefore remand for the Third Circuit to determine if Elonis’s failure (indeed, refusal) to argue for recklessness prevents reversal of his conviction. The Third Circuit should also have the opportunity to consider whether the conviction could be upheld on harmless error grounds.
Alito also addressed the First Amendment question, which the majority opinion had left aside. He wrote that "lyrics in songs that are performed for an audience or sold in recorded form are unlikely to be interpreted as a real threat to a real person. ... Statements on social media that are pointedly directed at their victims, by contrast, are much more likely to be taken seriously."

=== Thomas's dissent ===
Justice Clarence Thomas, dissenting, objected to discarding the "general intent" standard without replacing it with a clearer one. Thomas argued that "there is no historical practice requiring more than general intent when a statute regulates speech." Citing Rosen v. United States, he argued that general intent was sufficient in this case. The majority opinion, however, responded that Rosen turned on ignorance of the law (knowledge of whether material was legally obscene), not on whether it was intended to be obscene.
Thomas also supported the government's argument based on the "intent to extort" language in the adjacent subsections of §875, and did not address the majority's reasoning on that language. Thomas relied on precedent, notably from the states and 18th-century England, under other but similar (and arguably influential) legislation to support his "general intent" position. He also drew a parallel with general intent in tort law. While he sought to address the First Amendment issues, he never strayed far from "general intent".

== Aftermath ==
On remand, the Third Circuit reaffirmed the conviction, "concluding beyond a reasonable doubt that Elonis would have been convicted if the jury had been properly instructed", and therefore that the instructional error was harmless. In 2022, Elonis was again arrested and indicted on three counts of cyberstalking involving three people. It emerged that between 2018 and 2021, Elonis had sent numerous threatening messages by email, text, voicemail and social media platforms such as Twitter to a former prosecutor in the Eastern District of Pennsylvania, his ex-girlfriend and his ex-wife. On August 5, after a five-day trial, Elonis was found guilty on all three counts, and on March 23, 2023, U.S. District Judge Edward G. Smith of Easton, Pennsylvania, sentenced him to twelve years and seven months in prison.

Deluxe Media

Deluxe Media Inc., also known simply as Deluxe and formerly Deluxe Entertainment Services Group, Inc., is an American multinational multimedia and entertainment services company owned by Platinum Equity. It was founded in 1915 by Hungarian-born American film producer William Fox and is headquartered in Burbank, California. The company serves clients in the film, television, digital content and advertising industries across the globe, and has been recognized with 10 Academy Awards for scientific and technical achievements, including developments in CinemaScope pictures (as part of 20th Century Fox) and, more recently, a process for creating archival separations from digital image data.

== History ==
Deluxe began as a film processing laboratory established in 1915 by William Fox under the name De Luxe, as part of his eponymous film conglomerate, in Fort Lee, New Jersey. In 1916, Fox Film Corporation opened its studio in Hollywood on 13 acres at Sunset and Western. The first Deluxe film laboratory on the West Coast was built on the south side of the lot (Fernwood and Serrano), while the East Coast laboratory was moved to the new Fox studios building on Manhattan's West Side in 1919, where it remained for over 40 years. The "business manager" (later president) of the laboratory was Alan E. Freedman, who guided the company into the 1960s. In 1927, Fox (Deluxe) received a patent for sound-on-film, the Fox Movietone system, and "Sunrise: A Song of Two Humans", an early Movietone film, opened that same year. Fox Movietone News ran weekly in theaters until 1963. During the Great Depression, Fox Film Corporation encountered financial difficulties. Among the actions taken to maintain liquidity, Fox sold the laboratories in 1932 to Freedman, who renamed the operation Deluxe. Under Freedman's leadership, Deluxe added two more plants, in Chicago and Toronto. In January 1934, Fox was granted an option to buy back Deluxe before December 31, 1938.
On 31 May 1935, under Sidney Kent, Fox Film Corporation merged with Twentieth Century Pictures to form The Twentieth Century-Fox Film Corporation, following a bank-financed reorganisation. The merged company exercised the option in July 1936, with Freedman remaining as president. In 1953, Deluxe, as part of 20th Century Fox, helped develop the widescreen format CinemaScope. Titles included "There's No Business Like Show Business" (1954) and "The Seven Year Itch" (1955). Other innovations, including the processing and sound striping of CinemaScope films, were patented or received Academy Awards. Freedman retired in 1962. In the 1960s, Deluxe closed its New York plant, followed by its plants in Chicago and Toronto, as motion picture production declined on the East Coast. In 1972, Deluxe began large-volume videocassette production, having produced a billion cassettes by 1996. In 1990, The Rank Organisation acquired Deluxe from Fox. In 2000, Deluxe began large-volume DVD production. In 2006, The Rank Organisation sold Deluxe Film Group to MacAndrews & Forbes, which renamed it Deluxe Entertainment Services Group. On 9 February 2012, Deluxe acquired the Hong Kong–based visual effects and post-production company Centro Digital Pictures; its founder John Chu remained president, reporting to Alaric McAusland, managing director for Deluxe in Australia. In May 2014, Deluxe shut down its Los Angeles plant at the Sunset & Western Studios complex, where the other studio buildings had been demolished in 1971. That same year, Deluxe closed its Hollywood film labs and gave thousands of orphaned film elements to the Academy Film Archive. The Deluxe Laboratories Collection at the Academy Film Archive consists of over 7,500 35mm and 16mm film elements of various motion pictures dating back to the early 1960s.
On 22 April 2015, Deluxe and its longtime competitor, Technicolor S.A., announced that they had entered into a binding agreement to create a new joint venture, Deluxe Technicolor Digital Cinema, specializing in cinema mastering, distribution and management services. On 4 September 2019, Deluxe was acquired by its creditors in a debt-for-equity swap to avoid bankruptcy. On 3 October 2019, Deluxe filed for bankruptcy in the Southern District of New York, and on 24 October 2019 the company received court approval to emerge from bankruptcy with a comprehensive restructuring plan. On July 1, 2020, Platinum Equity agreed to acquire Deluxe's distribution division and reunite it with former CEO Cyril Drabinsky, who would merge CineVizion, a film distribution company he founded after leaving Deluxe in 2016, into it. Company 3 and Method Studios, which formed Deluxe's creative division, were sold to Framestore in November 2020.

History of natural language processing

The history of natural language processing describes the development of natural language processing (NLP) over time. There is some overlap with the history of machine translation, the history of speech recognition and the history of artificial intelligence.

== Early history ==
The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes that would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine. The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other, by the Russian Peter Troyanskii, was more detailed: it included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto.

== Logical period ==
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge sufficiently well that the judge is unable to distinguish reliably, on the basis of the conversational content alone, between the program and a real human. In 1957, Noam Chomsky’s Syntactic Structures revolutionized linguistics with 'universal grammar', a rule-based system of syntactic structures. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three to five years, machine translation would be a solved problem.
However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed. One notably successful NLP system developed in the 1960s was SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies. In 1969 Roger Schank introduced the conceptual dependency theory for natural language understanding. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert and Janet Kolodner. In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules, ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format, called "generalized ATNs", continued to be used for a number of years. During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979) and Plot Units (Lehnert, 1981). During this time, many chatterbots were written, including PARRY, Racter and Jabberwacky.

== Statistical period ==
Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of machine learning algorithms for language processing.
This was due both to the steady increase in computational power resulting from Moore's law and to the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks. === Datasets === The emergence of statistical approaches was aided both by the increase in computing power and by the availability of large datasets. At that time, large multilingual corpora were starting to emerge. Notably, some were produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. Many of the notable early successes occurred in the field of machine translation. In 1993, the IBM alignment models were used for statistical machine translation. Compared to previous machine translation systems, which were symbolic systems manually coded by computational linguists, these systems were statistical, which allowed them to learn automatically from large textual corpora.
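The IBM alignment models learn word-translation probabilities from sentence-aligned parallel text using expectation-maximization (EM). The following is a minimal Python sketch of IBM Model 1, the simplest of these models, under simplifying assumptions: it omits the special NULL word, uses a hypothetical three-sentence German-English toy corpus, and the function name `train_ibm_model1` is illustrative rather than taken from any published implementation. Each E-step shares every foreign word's alignment probability among the English words in its sentence pair; each M-step renormalizes the resulting expected counts into t(f | e).

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(f | e) with EM in the style of IBM Model 1 (NULL word omitted).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (f, e) -> P(f | e).
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    # Start from a uniform distribution over foreign words.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to any f
        for fs, es in corpus:
            for f in fs:
                # E-step: share f's alignment probability among the English words in this pair.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into conditional probabilities P(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})

    return t


if __name__ == "__main__":
    # Hypothetical toy parallel corpus (German, English).
    corpus = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]
    t = train_ibm_model1(corpus, iterations=20)
    for e in ("the", "house", "book", "a"):
        candidates = [f for (f, e2) in t if e2 == e]
        best = max(candidates, key=lambda f: t[(f, e)])
        print(e, "->", best, round(t[(best, e)], 3))
```

On this toy corpus, repeated EM iterations push "das" toward "the" and "haus" toward "house", because "das" co-occurs with "the" in two sentence pairs; no hand-written bilingual dictionary is involved, which is the sense in which such systems learn from corpora.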
However, such statistical systems do not work well when only small corpora are available, so data-efficient methods continue to be an area of research and development. In 2001, a one-billion-word text corpus scraped from the Internet, described at the time as "very very large", was used for word disambiguation. To take advantage of large, unlabelled datasets, algorithms were developed for unsupervised and self-supervised learning. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results. == Neural period == Neural language models were developed in the 1990s. In 1990, the Elman network, a recurrent neural network, encoded each word in a training set as a vector, called a word embedding, and the whole vocabulary as a vector database, allowing it to perform tasks such as sequence prediction that are beyond the power of a simple multilayer perceptron. A shortcoming of these static embeddings was that they did not differentiate between the multiple meanings of homonyms. Yoshua Bengio developed the first neural probabilistic language model in 2000. Novel algorithms, the availability of larger datasets and higher processing power made it possible to train ever larger language models. The attention mechanism was introduced by Bahdanau et al. in 2014. This work laid the foundations for the famous "Attention Is All You Need" paper, which introduced the Transformer architecture in 2017. The concept of the large language model (LLM) emerged in the late 2010s. An LLM is a language model trained with self-supervised learning on vast amounts of text. The earliest public LLMs had hundreds of millions of parameters, but this number quickly rose to billions and even trillions.
In recent years, advancements in deep learning and large language models have significantly enhanced the capabilities of natural language processing, leading to widespread applications in areas such as healthcare, customer service, and content generation.

M-DISC

M-DISC (Millennial Disc) is a write-once optical disc technology introduced in 2009 by Millenniata, Inc. and available as DVD and Blu-ray discs. == Overview == M-DISC's design is intended to provide archival media longevity. The manufacturer claims that properly stored M-DISC DVD recordings will last up to 1000 years. The M-DISC DVD looks like a standard disc, except that it is almost transparent; later DVD and BD-R M-DISCs have standard or inkjet-printable labels. The patents protecting the M-DISC technology assert that the data layer is a glassy carbon material that is substantially inert to oxidation and has a melting point of 200–1000 °C (392–1832 °F). M-DISCs can be read by most regular DVD players made after 2005 and by Blu-ray and BDXL drives, and can be written by most drives made after 2011. Available recording capacities conform to standard DVD and Blu-ray sizes: 4.7 GB DVD+R, 25 GB BD-R, 50 GB BD-R and 100 GB BDXL. == History == M-DISC developer Millenniata, Inc. was co-founded by Brigham Young University professors Barry Lunt and Matthew Linford, together with CEO Henry O'Connell and CTO Doug Hansen. The company was incorporated on May 13, 2010, in American Fork, Utah. Millenniata, Inc. officially went bankrupt in December 2016. Under the direction of CEO Paul Brockbank, Millenniata had issued convertible debt. When the obligation for conversion was not satisfied, the company defaulted on the debt payment and the debt holders took possession of all of the company's assets. The debt holders subsequently started a new company, Yours.co, to sell M-DISCs and related services. As of the 2020s, there are only two licensed manufacturers of M-DISCs: Ritek, whose discs are sold under the Ritek and RiDATA brands, and Verbatim, whose co-branded discs are marketed as the "Verbatim M-DISC". A 128 GB BDXL M-DISC never reached the market because of the 2016 bankruptcy. Early in 2022, Verbatim changed the formulation of its M-DISC-branded Blu-rays.
These new discs could be written at 6× speed instead of the previous 4×. The new discs also had different colouration and markings compared with the older version. Later in the year, customers accused Verbatim of selling an inferior product and of deceptive marketing. Verbatim responded that the new discs were a further development of the older discs and should have the same longevity, and that the technical changes were responsible for the altered appearance and higher write speeds. The updated M-DISC currently sold on the market uses the same metal ablative layer (MABL), an inorganic metal oxide recording layer, as many of Verbatim's regular Blu-ray products. == Durability claims == The original M-DISC DVD+R was tested according to ISO/IEC 10995:2011 and ECMA-379, with a projected rated lifespan of several hundred years in archival use. In theory, if preserved correctly in an environment such as a salt mine, the glassy carbon layers could store data for over 10,000 years before falling outside readable specifications. However, the polycarbonate plastic used in almost all optical media (and widely in CBRN and ballistic protective equipment because of its optical properties and its impact and chemical resistance) is rated to last only around 1000 years before degrading. Verbatim Japan claims that M-DISCs now use a titanium layer to prevent moisture ingress and to provide environmental stability. M-DISCs sold in Japan are advertised as having a projected lifespan of 100 years or more based on internal ISO/IEC 16963 testing, while other regional Verbatim websites claim that M-DISCs have a projected lifespan of "several hundred years" based on ISO/IEC 16963 testing.
== Durability testing == In 2009, the US Department of Defense (DoD) tested Millenniata's M-DISC DVD against then-current market offerings from Delkin, MAM-A, Mitsubishi, Taiyo Yuden and Verbatim, producing the China Lake Report; all brands using organic dyes failed the series of accelerated aging tests. From 2010 to 2012, the French National Laboratory of Metrology and Testing (LNE) ran high-temperature accelerated aging tests on a mix of inorganic DVD+R discs from MPO, Verbatim, Maxell, Syylex and DataTresor, at 90 °C (194 °F) and 85% relative humidity inside a CLIMATS Excal 5423-U chamber, for 250 to 1000 hours. The test summary rated the Syylex Glass Master Disc at 1000+ hours, the DataTresor Disc at 250+ hours and the M-DISC at under 250 hours. The Syylex disc, which was available only as a custom-ordered product before Syylex went bankrupt, could not be burned in a consumer drive, so it was not truly in the same category as the others. In 2016, a consumer, Mol Smith, carried out real-world stress testing of the 25 GB BD-R M-DISC alongside a standard TDK 25 GB BD-R disc, each holding a copied movie. After 60 days of direct outdoor exposure, the M-DISC played without error, while the TDK disc was physically destroyed, demonstrating the reliability of M-DISC's molding compared to standard discs. In 2022, NIST Interagency Report NIST IR 8387 listed the M-DISC as an acceptable archival format rated for 100+ years, citing the aforementioned 2009 and 2012 tests by the US Department of Defense and the French National Laboratory of Metrology and Testing as sources. == Commercial support == While recorded discs are readable in conventional DVD and BD drives, M-DISC DVDs can only be burned by drives with firmware that supports the slightly higher-power writing mode that M-DISC requires to burn its inorganic layers; as a result, writing speed is typically 2×.
Blu-ray M-DISCs can be both written and read in most standard Blu-ray drives and, as of 2019, are certified by the Blu-ray Disc Association to meet all current standard specifications. M-DISCs typically cost 1.5–3× the price of standard Blu-ray discs, and DVD M-DISCs are now only sparsely available. First-generation DVD M-DISCs were nearly fully translucent, which made it difficult to tell which side was writable; coloring, and later labels similar to those on standard DVDs, were added to help distinguish the sides and prevent user error. Asus, LG Electronics, Lite-On, Pioneer, Buffalo Technology, and Hitachi-LG produce drives that can record M-DISC media, while Verbatim and Ritek produce M-DISC discs. == Adoption == The state government of Utah has used M-DISC since 2011. Some consumers and avid data hoarders have adopted the format for cold digital data storage. == Alternative technologies == === Optical === Syylex Glass Master Disc: these discs use etched glass and are typically degraded only by physical or chemical damage, not by normal ageing inside an archival environment. Current standard inorganic HTL discs (BD 25 GB, BD-R DL 50 GB, three-layer BDXL 100 GB and Sony's four-layer BDXL 128 GB) are rated for up to 50 years. Sony's Optical Disc Archive is an optical competitor to the LTO tape-based data storage system. It currently offers cartridges of up to 5.5 TB containing dual-sided 120 mm discs, with desktop readers and automated rack-mount archival systems allowing large-scale archival and data retrieval, and is rated for an estimated 100+ years. Pioneer DM for Archive is a disc media and drive combination developed by Pioneer to meet the requirements laid out by the Japanese government for preservation of financial data for a minimum of 100 years. The discs use a MABL-type recording layer and are manufactured to tight tolerances.
Although the discs can be burned in any BD writer, when they are burned in Pioneer's DM for Archive writers using the DM Archiver software, the media and burn quality meet ISO/IEC 18630, which defines the testing methods for ensuring media and burn quality. === Magnetic === Linear Tape-Open (LTO) is rated for up to 30 years in a climate-controlled environment and is currently used in most industries, including broadcast and corporate digital data systems. The latest generation, LTO-10, released in 2026, defines two cartridge types, holding 30 TB and 40 TB respectively. Hard disk drives (HDDs) are currently available with capacities of up to 30 TB in the 3.5-inch format and 5 TB in the 2.5-inch laptop format. However, unlike optical media, they are limited to an operational lifespan of 5–25 years due to inevitable mechanical failure or magnetic instability.