Letter frequency is the number of times letters of the alphabet appear on average in written language. Letter frequency analysis dates back to the Arab mathematician Al-Kindi (c. AD 801–873), who formally developed the method to break ciphers. Letter frequency analysis gained importance in Europe with the development of movable type in AD 1450, wherein one must estimate the amount of type required for each letterform. Linguists use letter frequency analysis as a rudimentary technique for language identification, where it is particularly effective as an indication of whether an unknown writing system is alphabetic, syllabic, or logographic. The use of letter frequencies and frequency analysis plays a fundamental role in cryptograms and several word puzzle games, including hangman, Scrabble, Wordle and the television game show Wheel of Fortune. One of the earliest descriptions in classical literature of applying the knowledge of English letter frequency to solving a cryptogram is found in Edgar Allan Poe's famous story "The Gold-Bug", where the method is successfully applied to decipher a message giving the location of a treasure hidden by Captain Kidd. Herbert S. Zim, in his classic introductory cryptography text Codes and Secret Writing, gives the English letter frequency sequence as "ETAON RISHD LFCMU GYPWB VKJXZQ", the most common letter pairs as "TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO", and the most common doubled letters as "LL EE SS OO TT FF RR NN PP CC". Different ways of counting can produce somewhat different orders. Letter frequencies also have a strong effect on the design of some keyboard layouts. The most frequent letters are placed on the home row of the Blickensderfer typewriter, the Dvorak keyboard layout, Colemak and other optimized layouts, while the commonly used QWERTY layout places common letters apart from each other to prevent typewriter jamming. 
== Background == The frequency of letters in text has been studied for use in cryptanalysis, and frequency analysis in particular, dating back to the Arab mathematician al-Kindi (c. AD 801–873 ), who formally developed the method (the ciphers breakable by this technique go back at least to the Caesar cipher used by Julius Caesar, so this method could have been explored in classical times). Letter frequency analysis gained additional importance in Europe with the development of movable type in AD 1450, wherein one must estimate the amount of type required for each letterform, as evidenced by the variations in letter compartment size in typographer's type cases. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. However, most languages have a characteristic distribution which is strongly apparent in longer texts. Even language changes as extreme as from Old English to modern English (regarded as mutually unintelligible) show strong trends in related letter frequencies: over a small sample of Biblical passages, from most frequent to least frequent, enaid sorhm tgþlwu æcfy ðbpxz of Old English compares to eotha sinrd luymw fgcbp kvjqxz of modern English, with the most extreme differences concerning letterforms not shared. Linotype machines for the English language assumed the letter order, from most to least common, to be etaoin shrdlu cmfwyp vbgkqj xz based on the experience and custom of manual compositors. The equivalent for the French language was elaoin sdrétu cmfhyp vbgwqj xz. Arranging the alphabet in Morse into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order, yields e it san hurdm wgvlfbk opxcz jyq. Letter frequency was used by other telegraph systems, such as the Murray Code. Similar ideas are used in modern data-compression techniques such as Huffman coding. 
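Orderings such as "etaoin shrdlu" come from counting the letters in a sample and sorting by count. The following is a minimal sketch of that procedure, not a reconstruction of any particular study's method; the function names are illustrative, and the choice to break ties alphabetically is an assumption made only so the output is deterministic.

```python
from collections import Counter


def letter_counts(text: str) -> Counter:
    """Count alphabetic characters, ignoring case, digits, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


def frequency_order(text: str) -> str:
    """Return the letters found in `text`, from most to least frequent.

    Ties are broken alphabetically (an assumption for determinism).
    """
    counts = letter_counts(text)
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))


def relative_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in `text`; shares sum to 1."""
    counts = letter_counts(text)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in counts.items()}
```

Because no exact distribution underlies a language, the result depends on the sample: a short or topic-specific text can produce an ordering quite unlike the ones quoted above, and the characteristic ordering only becomes apparent in longer texts.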
Letter frequencies, like word frequencies, tend to vary, both by writer and by subject. For instance, ⟨d⟩ occurs with greater frequency in fiction, as most fiction is written in past tense and thus most verbs will end in the inflectional suffix -ed / -d. One cannot write an essay about x-rays without using ⟨x⟩ frequently, and the essay will have an idiosyncratic letter frequency if the essay is about, say, Queen Zelda of Zanzibar requesting X-rays from Qatar to examine hypoxia in zebras. Different authors have habits which can be reflected in their use of letters. Hemingway's writing style, for example, is visibly different from Faulkner's. Letter, bigram, trigram, word frequencies, word length, and sentence length can be calculated for specific authors and used to prove or disprove authorship of texts, even for authors whose styles are not so divergent. Accurate average letter frequencies can only be gleaned by analyzing a large amount of representative text. With the availability of modern computing and collections of large text corpora, such calculations are easily made. Examples can be drawn from a variety of sources (press reporting, religious texts, scientific texts and general fiction) and there are differences especially for general fiction with the position of ⟨h⟩ and ⟨i⟩, with ⟨h⟩ becoming more common. Different dialects of a language will also affect a letter's frequency. For example, an author in the United States would produce something in which ⟨z⟩ is more common than an author in the United Kingdom writing on the same topic: words like "analyze", "apologize", and "recognize" contain the letter in American English, whereas the same words are spelled "analyse", "apologise", and "recognise" in British English. This would highly affect the frequency of the letter ⟨z⟩, as it is rarely used by British writers in the English language. The "top twelve" letters constitute about 80% of the total usage. 
The "top eight" letters constitute about 65% of the total usage. Letter frequency as a function of rank can be fitted well by several rank functions, with the two-parameter Cocho/Beta rank function being the best. Another rank function with no adjustable free parameter also fits the letter frequency distribution reasonably well (the same function has been used to fit the amino acid frequency in protein sequences.) A spy using the VIC cipher or some other cipher based on a straddling checkerboard typically uses a mnemonic such as "a sin to err" (dropping the second "r") or "at one sir" to remember the top eight characters. == Relative frequencies of letters in the English language == There are three ways to count letter frequency that result in very different charts for common letters. The first method, used in the chart below, is to count letter frequency in lemmas of a dictionary. The lemma is the word in its canonical form. The second method is to include all word variants when counting, such as "abstracts", "abstracted" and "abstracting" and not just the lemma of "abstract". This second method results in letters like ⟨s⟩ appearing much more frequently, such as when counting letters from lists of the most used English words on the Internet. ⟨s⟩ is especially common in inflected words (non-lemma forms) because it is added to form plurals and third person singular present tense verbs. A final method is to count letters based on their frequency of use in actual texts, resulting in certain letter combinations like ⟨th⟩ becoming more common due to the frequent use of common words like "the", "then", "both", "this", etc. Absolute usage frequency measures like this are used when creating keyboard layouts or letter frequencies in old fashioned printing presses. An analysis of entries in the Concise Oxford dictionary, ignoring frequency of word use, gives an order of "EARIOTNSLCUDPMHGBFYWKVXZJQ". 
The letter-frequency table above is taken from Pavel Mička's website, which cites Robert Lewand's Cryptological Mathematics. According to Lewand, arranged from most to least common in appearance, the letters are: etaoinshrdlcumwfgypbvkjxqz. Lewand's ordering differs slightly from others, such as Cornell University Math Explorer's Project, which produced a table after measuring 40,000 words. In English, the space character occurs almost twice as frequently as the top letter (⟨e⟩) and the non-alphabetic characters (digits, punctuation, etc.) collectively occupy the fourth position (having already included the space) between ⟨t⟩ and ⟨a⟩. == Relative frequencies of the first letters of a word in the English language == The frequency of the first letters of words or names is helpful in pre-assigning space in physical files and indexes. Given 26 filing cabinet drawers, rather than a 1:1 assignment of one drawer to one letter of the alphabet, it is often useful to use a more equal-frequency-letter code by assigning several low-frequency letters to the same drawer (often one drawer is labeled VWXYZ), and to split up the most-frequent initial letters (⟨s, a, c⟩) into several drawers (often 6 drawers Aa-An, Ao-Az, Ca-Cj, Ck-Cz, Sa-Si, Sj-Sz). The same system is used in some mult
World Database of Happiness
The World Database of Happiness is a web-based archive of research findings on subjective appreciation of life, based in the Erasmus Happiness Economics Research Organization of the Erasmus University Rotterdam in the Netherlands. The database contains both an overview of scientific publications on happiness and a digest of research findings. Happiness is defined as the degree to which an individual judges the quality of his or her life as a whole favorably. Two 'components' of happiness are distinguished: hedonic level of affect (the degree to which pleasant affect dominates) and contentment (perceived realization of wants). == Aims == The World Database of Happiness is a tool to quickly acquire an overview of the ever-growing stream of research findings on happiness. As of mid-2023, the database covered some 16,000 scientific publications on happiness, from which were extracted 23,000 distributional findings (on how happy people are) and another 24,000 correlational findings (on factors associated with more and less happiness). The first findings date from 1915. == Technique == The World Database of Happiness is a ‘findings archive’, which consists of electronic ‘finding pages’ on which separate research results are described in a standard format and terminology. These finding pages can be selected on various characteristics, such as the population studied, the measure of happiness used and observed covariates. All finding pages have a specific internet address to which links can be made in scientific review papers or policy recommendations. This allows a concise presentation of many findings in a table, while providing readers with access to detail. 
== Scientific use == The Database has been cited in 254 scientific papers, for example to assess under what conditions economic growth enhances average happiness, to show that rising mean happiness at first raises happiness inequality but that a further rise diminishes these differences, or to show that healthy eating is associated with more happiness, even after controlling for the effect on health. Another finding is that relatively simple happiness training techniques raise happiness by some 5%. == Popular use == The World Database of Happiness is often used by popular media to make lists of the happiest countries around the globe. An example is the Happy Planet Index, which aims to chart sustainable happiness all over the world by combining data on longevity, happiness and the size of the ecological footprint of citizens. == Strengths and weaknesses == The database has a clear conceptual focus: it includes only research findings on subjective enjoyment of one's life as a whole. Thereby it evades the Babel that has haunted the study of happiness for ages. The other side of that coin is that much interesting research is left out. The findings are reported with technical details about measurement and statistical analysis. This detail is welcomed by scholars, but makes the information difficult to digest for laypersons. Still another limitation is that the determinants of happiness appear to vary considerably across persons and situations, which makes it hard to draw general conclusions about the causes of happiness. What is clear is that poor health, separation, unemployment and lack of social contact are all strongly negatively associated with happiness. Another problem for the World Database of Happiness is that the number of studies on happiness is increasing at such a high rate that it is becoming increasingly difficult to offer a complete overview of all research findings. 
A further concern is that the Database of Happiness is exclusively focused on hedonic happiness (feeling good) and not on mature happiness that might exist in the face of suffering.
POODLE
POODLE (which stands for "Padding Oracle On Downgraded Legacy Encryption") is a security vulnerability which takes advantage of the fallback to SSL 3.0. If attackers successfully exploit this vulnerability, on average, they only need to make 256 SSL 3.0 requests to reveal one byte of encrypted messages. Bodo Möller, Thai Duong and Krzysztof Kotowicz from the Google Security Team discovered this vulnerability; they disclosed the vulnerability publicly on October 14, 2014 (despite the paper being dated "September 2014"). On December 8, 2014, a variation of the POODLE vulnerability that affected TLS was announced. The CVE-ID associated with the original POODLE attack is CVE-2014-3566. F5 Networks filed for CVE-2014-8730 as well, see POODLE attack against TLS section below. == Prevention == To mitigate the POODLE attack, one approach is to completely disable SSL 3.0 on the client side and the server side. However, some old clients and servers do not support TLS 1.0 and above. Thus, the authors of the paper on POODLE attacks also encourage browser and server implementation of TLS_FALLBACK_SCSV, which will make downgrade attacks impossible. Another mitigation is to implement "anti-POODLE record splitting". It splits the records into several parts and ensures none of them can be attacked. However the problem of the splitting is that, though valid according to the specification, it may also cause compatibility issues due to problems in server-side implementations. A full list of browser versions and levels of vulnerability to different attacks (including POODLE) can be found in the article Transport Layer Security. Opera 25 implemented this mitigation in addition to TLS_FALLBACK_SCSV. Google's Chrome browser and their servers had already supported TLS_FALLBACK_SCSV. Google stated in October 2014 it was planning to remove SSL 3.0 support from their products completely within a few months. Fallback to SSL 3.0 has been disabled in Chrome 39, released in November 2014. 
SSL 3.0 has been disabled by default in Chrome 40, released in January 2015. Mozilla disabled SSL 3.0 in Firefox 34 and ESR 31.3, which were released in December 2014, and added support for TLS_FALLBACK_SCSV in Firefox 35. Microsoft published a security advisory explaining how to disable SSL 3.0 in Internet Explorer and Windows OS. On October 29, 2014, Microsoft released a fix that disables SSL 3.0 in Internet Explorer on Windows Vista / Server 2003 and above, and announced a plan to disable SSL 3.0 by default in its products and services within a few months. Microsoft disabled fallback to SSL 3.0 in Internet Explorer 11 for Protected Mode sites on February 10, 2015, and for other sites on April 14, 2015. Apple's Safari (on OS X 10.8, iOS 8.1 and later) mitigated POODLE by removing support for all CBC cipher suites in SSL 3.0. However, this left only RC4, which is itself completely broken by the RC4 attacks in SSL 3.0. POODLE was completely mitigated in OS X 10.11 (El Capitan 2015) and iOS 9 (2015). To prevent the POODLE attack, some web services dropped support for SSL 3.0. Examples include CloudFlare and Wikimedia. Network Security Services version 3.17.1 (released on October 3, 2014) and 3.16.2.3 (released on October 27, 2014) introduced support for TLS_FALLBACK_SCSV, and NSS will disable SSL 3.0 by default in April 2015. OpenSSL versions 1.0.1j, 1.0.0o and 0.9.8zc, released on October 15, 2014, introduced support for TLS_FALLBACK_SCSV. LibreSSL version 2.1.1, released on October 16, 2014, disabled SSL 3.0 by default. == POODLE attack against TLS == A new variant of the original POODLE attack was announced on December 8, 2014. This attack exploits implementation flaws of CBC encryption mode in the TLS 1.0 to 1.2 protocols. Even though TLS specifications require servers to check the padding, some implementations fail to validate it properly, which makes some servers vulnerable to POODLE even if they disable SSL 3.0. 
SSL Pulse showed "about 10% of the servers are vulnerable to the POODLE attack against TLS" before this vulnerability was announced. The CVE-ID for F5 Networks' implementation bug is CVE-2014-8730. The entry in NIST's NVD states that this CVE-ID is to be used only for F5 Networks' implementation of TLS, and that other vendors whose products have the same failure to validate the padding, such as A10 Networks and Cisco Systems, need to issue their own CVE-IDs for their implementation errors, because this is not a flaw in the protocol but in the implementation. The POODLE attack against TLS was found to be easier to initiate than the initial POODLE attack against SSL. There is no need to downgrade clients to SSL 3.0, meaning fewer steps are needed to execute a successful attack.
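The first mitigation listed under Prevention, refusing SSL 3.0 outright, can be sketched on the client side with Python's standard `ssl` module. This is an illustrative configuration rather than a recommendation from the POODLE authors: the function name is hypothetical, and the choice of TLS 1.2 as the floor is an assumption (the text only requires that SSL 3.0 be excluded).

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that cannot negotiate SSL 3.0.

    The TLS 1.2 floor is an assumption made for this sketch; any minimum
    above SSLv3 removes the downgrade target that POODLE relies on.
    """
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Reject any handshake that would settle on a version below TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A context built this way can be passed to `context.wrap_socket(...)` or to higher-level clients that accept an `SSLContext`. It does not address the TLS variant described above, since that flaw lies in how some implementations validate CBC padding rather than in protocol version negotiation.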
Cambridge Analytica
Cambridge Analytica Ltd. (CA), previously known as SCL USA, was a British political consulting firm that came to prominence through the Facebook–Cambridge Analytica data scandal. It was founded in 2013, as a subsidiary of the private intelligence company and self-described "global election management agency" SCL Group by long-time SCL executives Nigel Oakes, Alexander Nix and Alexander Oakes, with Nix as CEO. Cambridge Analytica was hired by a variety of political actors, including the Trinidadian government in 2010 and the 2016 presidential campaigns of Ted Cruz and Donald Trump. The firm maintained offices in London, New York City, and Washington, D.C. The company closed operations in 2018 due to backlash from the scandal, although firms related to both Cambridge Analytica and its parent firm SCL still exist. == History == Cambridge Analytica was founded in 2013 as a subsidiary of the private intelligence company SCL Group, which describes itself as providing "data, analytics and strategy to governments and military organisations worldwide". The company was part of "an international web of companies" headed by the London-based SCL Group. Cambridge Analytica (SCL USA) was incorporated in January 2013 with its registered office being in Westferry Circus, London and consisting of just one staff member, director and CEO Alexander Nix (also appointed in January 2015). Nix was also the director of nine similar companies sharing the same registered offices in London, including Firecrest technologies, Emerdata and six SCL Group companies including "SCL elections limited". Nigel Oakes, known as the former boyfriend of Lady Helen Windsor, had founded the predecessor SCL Group in the 1990s, and in 2005 Oakes established SCL Group together with his brother Alexander Oakes and Alexander Nix; SCL Group was the parent company of Cambridge Analytica. 
Former Conservative minister and MP Sir Geoffrey Pattie was the founding chairman of SCL; Lord Ivar Mountbatten also joined Oakes as a director of the company. As a result of the Facebook–Cambridge Analytica data scandal, Nix was removed as CEO and replaced by Julian Wheatland before the company closed. Several of the company's executives were Old Etonians. The company's owners included several of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former British Conservative minister Jonathan Marland, Baron Marland and the family of American hedge fund manager Robert Mercer. The company combined misappropriation of digital assets, data mining, data brokerage, and data analysis with strategic communication during electoral processes. While its parent SCL had focused on influencing elections in developing countries since the 1990s, Cambridge Analytica focused more on the western world, including the United Kingdom and the United States; CEO Alexander Nix has said CA was involved in 44 U.S. political races in 2014. In 2015, CA performed data analysis services for Ted Cruz's presidential campaign. In 2016, CA worked for Donald Trump's presidential campaign as well as for Leave.EU (one of the organisations campaigning in the United Kingdom's referendum on European Union membership). CA's role in those campaigns has been controversial and is the subject of ongoing inquiries in both countries. Political scientists question CA's claims about the effectiveness of its methods of targeting voters. == Data scandal == In March 2018, media outlets broke news of Cambridge Analytica's business practices. The New York Times and The Observer reported that the company had acquired and used personal data about Facebook users from an external researcher who had told Facebook he was collecting it for academic purposes. 
Shortly afterwards, Channel 4 News aired undercover investigative videos showing Nix boasting about using prostitutes, bribery sting operations, and honey traps to discredit politicians on whom it had conducted opposition research, and saying that the company "ran all of (Donald Trump's) digital campaign". In response to the media reports, the Information Commissioner's Office (ICO) of the UK pursued a warrant to search the company's servers. Facebook banned Cambridge Analytica from advertising on its platform, saying that it had been deceived. On 23 March 2018, the British High Court granted the ICO a warrant to search Cambridge Analytica's London offices. As a result, Nix was suspended as CEO, and replaced by Julian Wheatland. The personal data of up to 87 million Facebook users were acquired via the 270,000 Facebook users who used a Facebook app created by Aleksandr Kogan called "This Is Your Digital Life". This was a personality profiling app and asked simple personality questions similar to other Facebook quizzes. Kogan was a scientist and psychologist, also being an employed lecturer for the University of Cambridge from 2012 to 2018. Alexander Nix claimed they had close to five thousand data points on each person who participated. They also gathered information through other data brokers ending with them acquiring millions of data points from American citizens. Kogan's app exploited a feature of Facebook's Graph API (version 1.0), which permitted any third-party app to access not only the app user's data, but also the full profile data of all of that user's Facebook friends, without those friends' knowledge or consent. This platform-wide design was available to all developers and was used by tens of thousands of apps; Facebook CEO Mark Zuckerberg later told the House Energy and Commerce Committee that the company was auditing "tens of thousands" of apps that had had access to large amounts of user data. 
Because the average Facebook user at the time had approximately 300 friends, the 270,000 users who installed Kogan's app yielded data on up to 87 million people. Facebook deprecated the friends-data API in April 2014 and shut it down entirely in April 2015, but data already collected by apps remained in developers' possession. Kogan passed this data to Cambridge Analytica, breaching Facebook's terms of service. On 1 May 2018, Cambridge Analytica and its parent company SCL filed for insolvency proceedings and closed operations. Alexander Tayler, a former director for Cambridge Analytica, was appointed director of Emerdata on 28 March 2018. Rebekah Mercer, Jennifer Mercer, Alexander Nix and Johnson Chun Shun Ko, who has links to American businessman Erik Prince, are in leadership positions at Emerdata. The Russo brothers are producing an upcoming film on Cambridge Analytica. In 2019 the Federal Trade Commission filed an administrative complaint against Cambridge Analytica for misuse of data. In 2020, the British Information Commissioner's Office closed a three-year inquiry into the company, concluded that Cambridge Analytica was "not involved" in the 2016 Brexit referendum and found no additional evidence for Russia's alleged interference during the campaign. US sensitive polling and election data, however, were passed to Russian Intelligence via a Cambridge Analytica contractor Sam Patten, Trump campaign manager Paul Manafort, and Russian agent Konstantin Kilimnik, who was indicted during the affair. Publicly, parent company SCL Group called itself a "global election management agency", Politico reported it was known for involvement "in military disinformation campaigns to social media branding and voter targeting". SCL gained work on a large number of campaigns for the US and UK governments' war on terror advancing their model of behavioral conflict during the 2000s. 
SCL's involvement in the political world has been primarily in the developing world, where it has been used by the military and politicians to study and manipulate public opinion and political will. Slate writer Sharon Weinberger compared one of SCL's hypothetical test scenarios to fomenting a coup. Among the investors in Cambridge Analytica were some of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former Conservative minister Jonathan Marland, Baron Marland, Roger Gabb, the family of American hedge fund manager Robert Mercer, and Steve Bannon. A minimum of 15 million dollars has been invested into the company by Mercer, according to The New York Times. Bannon's stake in the company was estimated at 1 to 5 million dollars, but he divested his holdings in April 2017 as required by his role as White House Chief Strategist. In March 2018, Jennifer Mercer and Rebekah Mercer became directors of Emerdata Limited. In March 2018, Christopher Wylie made public that Cambridge Analytica's first activities were founded on a data set that its parent company SCL had bought in 2014 from a company named Global Science Research, founded by Aleksandr Kogan, who worked as a psychologist at Cambridge, and his team, whose members were spread across the world. During Boris Johnson's tenure as foreign secretary, the Foreign Office sought advice from Cambridge Analytica and Boris Johnson had a meeting with Alexander N
Back-Up Interceptor Control
Backup Interceptor Control (BUIC) was the Electronic Systems Division 416M System to back up the SAGE 416L System in the United States and Canada. BUIC deployed Cold War command, control, and coordination systems to SAGE radar stations to create dispersed NORAD Control Centers. == Background == Prior to the SAGE Direction Centers becoming operational, the USAF deployed data link systems at NORAD Control Centers with ground computers for controlling crewed interceptors. After SAGE IBM AN/FSQ-7 Combat Direction Centrals became operational and the Super Combat Centers with improved (digital) computers were cancelled, a backup to SAGE was planned in the event the above-ground SAGE Air Defense Direction Center failed. == General Electric AN/GPA-37 Course Directing Group == BUIC began with deployment of General Electric AN/GPA-37 Course Directing Groups to several Long Range Radar stations. Units designated included the "U.S. Air Force 858th Air Defense Group (BUIC) [which became] a permanent operating facility" at Naval Air Station Fallon in Nevada. == BUIC II == BUIC II was used to command and control sites using the Burroughs AN/GSA-51 Radar Course Directing Group. North Truro AFS became the first ADC installation configured for BUIC II. == BUIC III == The AN/GYK-19 (initially designated AN/GSA-51A) was an upgraded version of the BUIC II system and required a larger building than the AN/GSA-51. Air Defense Command's first BUIC III system was installed at Fort Fisher Air Force Station (AFS), North Carolina. Although more advanced systems were contemplated, the final design of the BUIC III system was an upgraded version of the BUIC II with around twice the performance. == Closure and upgrade == In 1972, the USAF decided to shut down most of the BUIC sites; most of the sites were mothballed by 1974, except for the BUIC III site at Tyndall Air Force Base. In Canada the BUIC site at Senneterre was shut down, but St Margarets remained open. 
The remaining sites were closed between 1983 and 1984, when SAGE was replaced by the Joint Surveillance System. The AN/FYQ-47 Common Digitizer for the Joint Surveillance System and the Radar Video Data Processor (RVDP) formed a combined system for the Air Force and the Federal Aviation Administration (FAA), which replaced the SAGE Burroughs AN/FST-2 Coordinate Data Transmitting Sets.
Alipay
Alipay (simplified Chinese: 支付宝; traditional Chinese: 支付寶; pinyin: zhīfùbǎo) is a third-party mobile and online payment platform, established in Hangzhou, China, in February 2004 by Alibaba Group and its founder Jack Ma. In 2015, Alipay moved its headquarters to Pudong, Shanghai, although its parent company Ant Financial remains Hangzhou-based. Alipay overtook PayPal as the world's largest mobile (digital) payment platform in 2013. As of June 2020, Alipay serves over 1.3 billion users and 80 million merchants. According to the statistics of the fourth quarter of 2018, Alipay has a 55.32% share of the third-party payment market in mainland China, and it continues to grow. Along with WeChat, Alipay has been described to be China's super-app with a wide range of functionalities including ridesharing, travel booking and medical appointments. == History == The service was first launched in 2003, by Taobao. The People's Bank of China, China's central bank, issued licensing regulations in June 2010 for third-party payment providers. It also issued separate guidelines for foreign-funded payment institutions. Because of this, Alipay, which accounted for half of China's non-bank online payment market, was restructured as a domestic company controlled by Alibaba CEO Jack Ma in order to facilitate the regulatory approval for the license. The 2010 transfer of Alipay's ownership was controversial, with media reports in 2011 that Yahoo! and Softbank (Alibaba Group's controlling shareholders) were not informed of the sale for nominal value. Chinese business publication Century Weekly criticised Ma, who stated that Alibaba Group's board of directors was aware of the transaction. The incident was criticised in foreign and Chinese media as harming foreign trust in making Chinese investments. The ownership dispute was resolved by Alibaba Group, Yahoo!, and Softbank in July 2011. In 2013, Alipay launched a financial product platform called Yu'e Bao. 
Alipay partnered with Tianhong Asset Management to launch the it. Yu'e Bao offers an online money market account in which Alipay customers can deposit money and receive a higher interest rate than that available from banks. It soon became China's largest online money market fund and prompted competitors like Baidu and Tencent to introduce alternatives. Alibaba (the parent company of Alipay) reported having 152 million Yu'e Bao users in mid-2016, with 810 billion RMB (US$117 billion) in funds under management. In 2015, Alipay's parent company was re-branded as Ant Financial Services Group. In 2017, Alipay unveiled their facial recognition payment service. In 2020, Alipay upgraded from a payment financial instrument to an open platform for digital life. In 2021, the mandate by the Ministry of Industry and Information Technology (MIIT) to open up the "walled garden" ecosystems of the major tech companies has led to the introduction of interoperability of payment QR codes of Alipay and competing WeChat Pay and UnionPay's Cloud QuickPass platforms. In response to the increase in Alipay's payment volume due to use on Alibaba's e-commerce sites and others, Chinese regulators introduced new rules in 2020. The new rules focused on Alipay because the payment volume exploded due to its use on Alibaba's e-commerce sites and other platforms. By the second quarter in 2020, Alipay held 55.6% of China's third party mobile payment market. The People's Bank of China made rules that required payment firms to place money with regulators and anti-monopoly reviews would be triggered if the amount exceeded 50% market share. The rules included that the People's Bank of China mandate an online-payment clearing route through the NetsUnion Clearing Corporation, a centralized, state-overseen clearing body, and that unused consumer funds be held by a third-party payment provider in a non-interest-bearing account. These measures increased transparency and reduced systemic risk. 
When Alipay operates outside of China, it must comply with local financial regulations, which may treat specific functions such as money-market funds or investment-linked products as regulated financial services. In Singapore, for example, such services may require prior authorization from securities or financial-services regulators before they can be offered to residents. == Services == Alipay states that it operates with more than 65 financial institutions, including Visa and MasterCard, to provide payment services for Taobao and Tmall as well as more than 460,000 online and local Chinese businesses. Alipay is used on smartphones through its Alipay Wallet app. QR codes are used for local in-store payments. The Alipay app also provides features such as credit card bill payment, bank account management, P2P transfers, prepaid mobile phone top-ups, bus and train ticket purchases, food orders, vehicles for hire, insurance selection and digital storage of identification documents. Alipay also allows online check-out on most Chinese websites such as Taobao and Tmall. The Alipay app allows users to add services provided by different companies to create a more personalised experience. Since late 2008, Alipay has promoted public service payments and has covered more than 300 cities nationwide, supporting more than 1,200 partner organizations. In addition to utility bills such as water and electricity, Alipay extends its services to areas such as transportation fines, property fees, and cable television fees. Common online payment services also include water, electricity and gas bills, tuition payments and traffic fines. On 15 January 2009, Alipay launched a credit card repayment service, supporting 39 domestic bank-issued credit cards. It is currently the most popular third-party repayment platform. Its main advantages are free credit card bill checking, repayments with no administrative fee, automatic repayment, repayment reminders and other value-added services. 
In the first quarter of 2014, 76% of credit card repayments were made through Alipay Wallet. From December 2013, several convenience store chains, including Meiyijia, Hongqi Chain, Qishiduo C-STORE and 7-Eleven, successively began to support Alipay payment; in the same month, Beijing taxi drivers began to accept Alipay for fares. Subsequently, Wanda Cinema, Joy City, Wangfujing and other large retail companies, as well as movie theaters, KTV venues, and catering companies, began to accept Alipay. Since 26 March 2019, a service fee has been charged for credit card repayments made through Alipay: customers pay 0.1% on the portion of a repayment that exceeds 2,000 yuan. Also in 2019, Walgreens began accepting Alipay in 3,000 US stores. Walgreens' products are available to Chinese customers through Alibaba's Tmall online marketplace. The payment application can also be used as a means of payment on Alibaba.com and Taobao. A Nielsen report suggests that over 90% of Chinese tourists would be willing to use mobile payment overseas if given the option. Many Chinese tourists do not have international credit cards, which makes Alipay a convenient payment option. Digital payments have become the norm in China as the government pushes a cashless system even in rural and village areas. In November 2019, Alipay introduced Tourpass, a service that allows non-Chinese users to use its mobile payment feature by pre-loading the app with foreign currency converted to Chinese yuan. In 2020, Alipay used a QR code system to help contain the COVID-19 outbreak. The health code system assigns users one of three colors according to their location, basic health information and travel history. "Beauty filters" were added to Alipay's face-scan payment system in an upgrade released in July 2019. The market has responded well to the filters, which make users look better when they use the program to make payments. 
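The credit card repayment fee described above (0.1% on the portion of a repayment exceeding 2,000 yuan) can be illustrated with a short calculation. This is only a sketch: the function name `repayment_fee`, the treatment of the 2,000 yuan allowance as applying to a single repayment, and the rounding to two decimal places are assumptions made for illustration, not details confirmed by the text.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures taken from the text: 0.1% on the portion above 2,000 yuan.
FREE_ALLOWANCE = Decimal("2000")
FEE_RATE = Decimal("0.001")


def repayment_fee(amount: Decimal) -> Decimal:
    """Return the fee in yuan for one repayment (hypothetical helper).

    Assumes the allowance applies to this single repayment and that the
    fee is rounded to the nearest fen; both are illustrative assumptions.
    """
    chargeable = max(amount - FREE_ALLOWANCE, Decimal("0"))
    return (chargeable * FEE_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for amount in (Decimal("1500"), Decimal("2000"), Decimal("5000")):
        print(amount, "yuan repayment, fee:", repayment_fee(amount), "yuan")
```

Under these assumptions, a repayment of 5,000 yuan has a chargeable portion of 3,000 yuan and therefore a fee of 3 yuan, while a repayment of 2,000 yuan or less incurs no fee.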
Alipay Tap is a payment function launched by Alipay in July 2024. Alipay+ NFC enables wallets to offer tap-to-pay acceptance across Mastercard's global contactless network within their existing wallet infrastructure. == Foreign expansion == Outside of China, more than 300 worldwide merchants use Alipay to sell directly to consumers in China. It currently supports transactions in 18 foreign currencies. Since the launch of Alipay in mainland China, Ant Financial has expanded the service to other countries. In addition to expanding into individual countries, Alipay is also being integrated with online payment platform providers. In April 2022, Ant Group acquired a majority stake in 2C2P, a Singapore-based provider used by merchants worldwide, and planned to eventually integrate Alipay with 2C2P. === Asia === ==== Bangladesh ==== In 2018, Alipay bought a 20% share in Bangladeshi mobile financial service provider bKash Limited. ==== Hong Kong ==== In 2017, Ant Financial expanded to Hong Kong. In a joint venture with CK Hutchison, as Alipay Payment Ser
Data preservation
Data preservation is the act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units in which knowledge and information are created, and metadata are the summarizing subsets of the elements of data, or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data. == History == Most historical data collected over time has been lost or destroyed. War and natural disasters, combined with a lack of the materials and practices needed to preserve and protect data, have caused this. Usually, only the most important data sets were saved, such as government records and statistics, legal contracts and economic transactions. Scientific research and doctoral thesis data have mostly been destroyed through improper storage and a lack of awareness and execution of data preservation. Over time, data preservation has evolved and gained importance and awareness. There are now many different ways to preserve data and many important organizations involved in doing so. The first digital data storage solutions appeared in the 1950s and were usually flat or hierarchically structured. While there were still issues with these solutions, they made storing data much cheaper and made data more easily accessible. In the 1970s, relational databases and spreadsheets appeared. Relational databases structure data into tables and are queried using structured query languages, which made them more efficient than the preceding storage solutions, while spreadsheets hold high volumes of numeric data, which can be applied to these relational databases to produce derivative data. 
More recently, non-relational (non-structured query language) databases, which hold high volumes of unstructured or semi-structured data, have appeared as complements to relational databases. == Importance == The scope of data preservation is vast. Everything from governmental and business records to art can essentially be represented as data, and is liable to be lost. Such loss leads to the loss of human history in perpetuity. Data can be lost on a small or individual scale, as with personal data loss or data loss within businesses and organizations, as well as on a national or global scale, where it can negatively and potentially permanently affect areas such as environmental protection, medical research, homeland security, public health and safety, economic development and culture. The mechanisms of data loss are as numerous as they are varied, ranging from disasters, wars, data breaches and negligence to simple forgetting and natural decay. The value of properly preserved and stored data collections can be seen in the U.S. Geological Survey, which stores data collections on natural hazards, natural resources, and landscapes. The data collected by the Survey is used by federal and state land management agencies for land use planning and management, work that continually requires access to historical reference data. == Related Concepts == In contrast, data holdings are collections of gathered data that are informally kept, and not necessarily prepared for long-term preservation. An example is a collection or back-up of personal files. Data holdings are generally the storage methods that were in use in the past when data was lost due to environmental and other historical disasters. Furthermore, data retention differs from data preservation in the sense that, by definition, to retain an object (data) is to hold or keep possession or use of the object. To preserve an object is to protect, maintain and keep it up for future use. 
Retention policies often also address when data should be deliberately deleted or withheld from public access, while preservation prioritizes permanence and more widely shared access. Thus, data preservation goes beyond merely having or possessing data or back-up copies of data. Data preservation ensures reliable access to data by including back-up and recovery mechanisms that are put in place before a disaster or technological change occurs. == Methods == === Digital === Digital preservation is similar to data preservation, but is mainly concerned with technological threats and solely with digital data. Essentially, digital preservation is a set of formal activities to enable ongoing or persistent use of and access to digital data beyond the occurrence of technological malfunction or change. Digital preservation anticipates the inevitable change in technology and protocols, and prepares for data that will need to be accessible across new types of technologies and platforms while the integrity of the data and metadata is conserved. Technology, while providing great progress in conserving data that may not have been possible in the past, is also changing at such a quick rate that digital data may no longer be accessible because its format is incompatible with new software. Without data preservation, much of our existing digital data is at risk. The majority of methods used for data preservation today are digital methods, which are so far the most effective methods that exist. === Archives === Archives are collections of historical documents and records. Archives contribute to the preservation of data by collecting data that is well organized and providing the appropriate metadata to confirm it. An example of an important data archive is the LONI Image Data Archive, which collects data regarding clinical trials and clinical research studies. 
=== Catalogues, directories and portals === Catalogues, directories and portals are consolidated resources which are kept by individual institutions, and are associated with data archives and holdings. In other words, the data itself is not presented on the site; instead, these resources act as metadata aggregators, and may administer thorough inventories. === Repositories === Repositories are places where data archives and holdings can be accessed and stored. The goal of repositories is to make sure that all requirements and protocols of archives and holdings are being met, and that data is certified to ensure data integrity and user trust. Single-site Repositories: A repository that holds all data sets on a single site. An example of a major single-site repository is the Data Archiving and Networking Services, which provides ongoing access to digital research resources for the Netherlands. Multi-Site Repositories: A repository that hosts data sets on multiple institutional sites. An example of a well-known multi-site repository is OpenAIRE, which hosts research data and publications in collaboration with all of the EU countries and more. OpenAIRE promotes open scholarship and seeks to improve the discoverability and reusability of data. Trusted Digital Repository: A repository that seeks to provide reliable, trusted access over a long period of time. The repository can be single-site or multi-site but must conform to the Reference Model for an Open Archival Information System, as well as adhere to a set of rules or attributes that contribute to its trust, such as persistent financial responsibility, organizational buoyancy, administrative responsibility, security and safety. An example of a trusted digital repository is the Digital Repository of Ireland (DRI), a multi-site repository that hosts Ireland's humanities and social science data sets. 
=== Cyber Infrastructures === Cyber infrastructures consist of archive collections which are made available through a system of hardware, technologies, software, policies, services and tools. Cyber infrastructures are geared towards the sharing of data in support of peer-to-peer collaboration and a cultural community. An example of a major cyber infrastructure is the Canadian Geo-spatial Data Infrastructure, which provides access to spatial data in Canada.