AI Detector Similar To Turnitin

AI Detector Similar To Turnitin — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • NHS COVID-19

    NHS COVID-19

    NHS COVID-19 was a voluntary contact tracing app for monitoring the spread of the COVID-19 pandemic in England and Wales, in use from 24 September 2020 until 27 April 2023. It was available for Android and iOS smartphones, and could be used by anyone aged 16 or over. Two versions of the app were created. The first was commissioned by NHSX and developed by the Pivotal division of American software company VMware. A pilot deployment began in May 2020, but on 18 June development of the app was abandoned in favour of a second design using the Apple/Google Exposure Notification system. Scotland and Northern Ireland had separate contact tracing apps. A 2023 study estimated that in its first year of use, the app's contact tracing function prevented an estimated 1 million cases, and 9,600 deaths. == Description == The app allowed users to: See the alert level of their local authority area (in Wales) or information about restrictions (in England); to enable this, the user must enter the first half of their postcode "Check in" at places displaying an NHS QR code poster (no longer required by legislation after 26 January 2022, removed from the app the next month) Be notified when they have been in close contact with someone who has tested positive for the virus Be notified when local health protection teams determine that people with the virus had attended a business or other venue around the same time as the user Check their symptoms, and book a coronavirus test if necessary If asked to self-isolate, receive information and a daily "countdown". At first, "close contact" was defined as being within 2 metres for 15 minutes, or within 4 metres for a longer time. These time durations were reduced from 29 October 2020, to as little as three minutes when the other person is at their most infectious, i.e. soon after they begin showing symptoms. === Implementation === The Android app was coded in Kotlin, and the iOS app in Swift. The backend used Java and is deployed to Amazon Web Services using Terraform. The code of the app and back-end is open-source and available on GitHub. == Context == The app was part of the UK's test and trace programme which was chaired by Dido Harding; from 12 May 2020 Tom Riordan, chief executive of Leeds City Council, led the tracing effort. == First phase and cancellation == === Description === In March 2020, NHSX commissioned a contact tracing app to monitor the spread in the United Kingdom of the coronavirus disease 2019 (COVID-19) in the 2020 pandemic, developed by the Pivotal division of American software company VMware. The app used a centralised approach, in contrast to the Google / Apple contact tracing project. NHSX consulted ethicists and GCHQ's National Cyber Security Centre (NCSC) about the privacy aspects. The app recorded the make and model of the phone and asked the user for their postcode area. It generated a unique installation identification number and also a daily identification number. It then used Bluetooth Low Energy (BLE) to record the daily identification number of other users nearby. If a user was unwell, they could tell the app about symptoms which are characteristic of COVID-19, such as a fever and cough. These details were then passed to a central NHS server. This would assess the information and notify other users that have been in contact, giving them appropriate advice such as physical distancing. The NHS would also arrange for a swab test of the unwell user and the outcome would determine further notifications to contacts: if the test confirmed infection with COVID-19, the contacts would be asked to isolate. By June 2020, £11.8 million had been spent on the app; in 2020–21, £35 million was spent on the app. === Deployment === The first public trial of the app began on the Isle of Wight on 5 May 2020 and by 11 May it had been downloaded 55,000 times. When the first national contact tracing schemes were launched – Test, Trace, Protect in Wales on 13 May, then on 28 May NHS Test and Trace in England, and Test and Protect in Scotland – the app was not ready to be included. Replying to a question at the government's daily briefing on 8 June, Hancock was unable to give a date for rollout of the app in England, saying it would be brought in "when it's right to do so". On 17 June, Lord Bethell, junior minister for Innovation at the Department of Health and Social Care, said "we're seeking to get something going before the winter ... it isn't a priority for us at the moment". On 18 June, Health Secretary Matt Hancock announced development would switch to the Apple/Google system after admitting that Apple's restrictions on usage of Bluetooth prevented the app from working effectively. At the same press briefing Dido Harding, leader of the UK's test and trace programme, said "What we've done in really rigorously testing both our own Covid-19 app and the Google-Apple version is demonstrate that none of them are working sufficiently well enough to be actually reliable to determine whether any of us should self-isolate for two weeks [and] that's true across the world". === Concerns === The first, ultimately rejected, version of the app was subject to privacy concerns, the government backtracking on initial statements that the data collected from the app would not be shared outside the NHS. Matthew Gould, CEO of NHSX, the government department responsible for the app, said the data would be accessible to other organisations, but did not disclose which. Data collected would not necessarily be anonymised and would be held in a centralised repository. Over 150 of the UK's security and privacy experts warned the app's data could be used by 'a bad actor (state, private sector, or hacker)' to spy on citizens. Fears were discussed by the House of Commons' Human Rights Select Committee about plans for the app to record user location data. Parliament's Joint Committee on Human Rights said this version of the app should not be released without proper privacy protections. The second version of the app, released nationwide, addressed these concerns by employing a decentralised framework, the Apple/Google Exposure Notification system. Under this system, users remain pseudonymous: a person diagnosed with COVID-19 does not know which people are informed about an encounter, and contacted persons do not receive any information about the person diagnosed with COVID-19. The functionality of the app was also questioned in late April and early May 2020, as the software's use of Bluetooth required the app to be constantly running, meaning users could not use other apps or lock their device if the app was to function properly. The developers of the app were said to have found a way of working around this restriction. === Related contracts === Faculty – a company linked to Cambridge Analytica – provided research and modelling to NHSX in support of the response to the pandemic. Palantir, also linked to Cambridge Analytica, provided their data management platform. These contracts began in February and March respectively. == Second phase == As outlined on cancellation of the first app on 18 June 2020, the Department of Health and Social Care published on 30 July a brief description of the "next phase" app. Users would be able to scan a QR code at venues they visit, and later be notified if they had visited a place which was the source of a number of infections; the app would also assist with identifying symptoms and ordering a test. By using the Exposure Notification system from Apple and Google, personal data would be decentralised. Zuhlke Engineering Ltd, the UK branch of Swiss-based Zühlke Group, used 70 staff to complete the development of the app in 12 weeks. Zuhlke Engineering was awarded "Development Team of the Year" title at UK IT Industry awards in November 2021 for development of NHS COVID-19 application. === Timeline === Testing of the app by NHS volunteer responders, and selected residents of the Isle of Wight and the London Borough of Newham, began around 13 August. The app was made available to the public (aged 16 or over) in England and Wales on 24 September. An updated app released on 29 October, in part from collaboration with the Alan Turing Institute, improved the accuracy of measurements of the distance between the user's phone and other phones. At the same time, the duration threshold for determining exposure was reduced; this was expected to lead to an increase in the number of users told to self-isolate. An update to the app in April 2021, timed to coincide with easing of restrictions on hospitality businesses, was blocked by Apple and Google. It was intended that users who tested positive would be asked to share their history of visited venues, to assist in warning others, but this would have contravened assurances by Apple and Google that location data from devices would not be shared. === Statistics and effectiveness === The app was downloaded six million times on the first day it was generally availa

    Read more →
  • Service Assurance Agent

    Service Assurance Agent

    IP SLA (Internet Protocol Service Level Agreement) is an active computer network measurement technology that was initially developed by Cisco Systems. IP SLA was previously known as Service Assurance Agent (SAA) or Response Time Reporter (RTR). IP SLA is used to track network performance like latency, ping response, and jitter, it also helps to provide service quality. == Functions == Routers and switches enabled with IP SLA perform periodic network tests or measurements such as Hypertext Transfer Protocol (HTTP) GET File Transfer Protocol (FTP) downloads Domain Name System (DNS) lookups User Datagram Protocol (UDP) echo, for VoIP jitter and mean opinion score (MOS) Data-Link Switching (DLSw) (Systems Network Architecture (SNA) tunneling protocol) Dynamic Host Configuration Protocol (DHCP) lease requests Transmission Control Protocol (TCP) connect Internet Control Message Protocol (ICMP) echo (remote ping) The exact number and types of available measurements depends on the IOS version. IP SLA is very widely used in service provider networks to generate time-based performance data. It is also used together with Simple Network Management Protocol (SNMP) and NetFlow, which generate volume-based data. == Usage considerations == For IP SLA tests, devices with IP SLA support are required. IP SLA is supported on Cisco routers and switches since IOS version 12.1. Other vendors like Juniper Networks or Enterasys Networks support IP SLA on some of their devices. IP SLA tests and data collection can be configured either via a console (command-line interface) or via SNMP. When using SNMP, both read and write community strings are needed. The IP SLA voice quality feature was added starting with IOS version 12.3(4)T. All versions after this, including 12.4 mainline, contain the MOS and ICPIF voice quality calculation for the UDP jitter measurement.

    Read more →
  • Yao's test

    Yao's test

    In cryptography and the theory of computation, Yao's test is a test defined by Andrew Chi-Chih Yao in 1982, against pseudo-random sequences. A sequence of words passes Yao's test if an attacker with reasonable computational power cannot distinguish it from a sequence generated uniformly at random. == Formal statement == === Boolean circuits === Let P {\displaystyle P} be a polynomial, and S = { S k } k {\displaystyle S=\{S_{k}\}_{k}} be a collection of sets S k {\displaystyle S_{k}} of P ( k ) {\displaystyle P(k)} -bit long sequences, and for each k {\displaystyle k} , let μ k {\displaystyle \mu _{k}} be a probability distribution on S k {\displaystyle S_{k}} , and P C {\displaystyle P_{C}} be a polynomial. A predicting collection C = { C k } {\displaystyle C=\{C_{k}\}} is a collection of boolean circuits of size less than P C ( k ) {\displaystyle P_{C}(k)} . Let p k , S C {\displaystyle p_{k,S}^{C}} be the probability that on input s {\displaystyle s} , a string randomly selected in S k {\displaystyle S_{k}} with probability μ ( s ) {\displaystyle \mu (s)} , C k ( s ) = 1 {\displaystyle C_{k}(s)=1} , i.e. Moreover, let p k , U C {\displaystyle p_{k,U}^{C}} be the probability that C k ( s ) = 1 {\displaystyle C_{k}(s)=1} on input s {\displaystyle s} a P ( k ) {\displaystyle P(k)} -bit long sequence selected uniformly at random in { 0 , 1 } P ( k ) {\displaystyle \{0,1\}^{P(k)}} . We say that S {\displaystyle S} passes Yao's test if for all predicting collection C {\displaystyle C} , for all but finitely many k {\displaystyle k} , for all polynomial Q {\displaystyle Q} : === Probabilistic formulation === As in the case of the next-bit test, the predicting collection used in the above definition can be replaced by a probabilistic Turing machine, working in polynomial time. This also yields a strictly stronger definition of Yao's test (see Adleman's theorem). Indeed, one could decide undecidable properties of the pseudo-random sequence with the non-uniform circuits described above, whereas BPP machines can always be simulated by exponential-time deterministic Turing machines.

    Read more →
  • Kruskal count

    Kruskal count

    The Kruskal count (also known as Kruskal's principle, Dynkin–Kruskal count, Dynkin's counting trick, Dynkin's card trick, coupling card trick or shift coupling) is a probabilistic concept originally demonstrated by the Russian mathematician Evgenii Borisovich Dynkin in the 1950s or 1960s discussing coupling effects and rediscovered as a card trick by the American mathematician Martin David Kruskal in the early 1970s as a side-product while working on another problem. It was published by Kruskal's friend Martin Gardner and magician Karl Fulves in 1975. This is related to a similar trick published by magician Alexander F. Kraus in 1957 as Sum total and later called Kraus principle. Besides uses as a card trick, the underlying phenomenon has applications in cryptography, code breaking, software tamper protection, code self-synchronization, control-flow resynchronization, design of variable-length codes and variable-length instruction sets, web navigation, object alignment, and others. == Card trick == The trick is performed with cards, but is more a magical-looking effect than a conventional magic trick. The magician has no access to the cards, which are manipulated by members of the audience. Thus sleight of hand is not possible. Rather the effect is based on the mathematical fact that the output of a Markov chain, under certain conditions, is typically independent of the input. A simplified version using the hands of a clock performed by David Copperfield is as follows. A volunteer picks a number from one to twelve and does not reveal it to the magician. The volunteer is instructed to start from 12 on the clock and move clockwise by a number of spaces equal to the number of letters that the chosen number has when spelled out. This is then repeated, moving by the number of letters in the new number. The output after three or more moves does not depend on the initially chosen number and therefore the magician can predict it.

    Read more →
  • Lexical Markup Framework

    Lexical Markup Framework

    Language resource management – Lexical markup framework (LMF; ISO 24613), produced by ISO/TC 37, is the ISO standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication. == Objectives == The goals of LMF are to provide a common model for the creation and use of lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of large number of individual electronic resources to form extensive global electronic resources. Types of individual instantiations of LMF can include monolingual, bilingual or multilingual lexical resources. The same specifications are to be used for both small and large lexicons, for both simple and complex lexicons, for both written and spoken lexical representations. The descriptions range from morphology, syntax, computational semantics to computer-assisted translation. The covered languages are not restricted to European languages but cover all natural languages. The range of targeted NLP applications is not restricted. LMF is able to represent most lexicons, including WordNet, EDR and PAROLE lexicons. == History == In the past, lexicon standardization has been studied and developed by a series of projects like GENELEX, EDR, EAGLES, MULTEXT, PAROLE, SIMPLE and ISLE. Then, the ISO/TC 37 National delegations decided to address standards dedicated to NLP and lexicon representation. The work on LMF started in Summer 2003 by a new work item proposal issued by the US delegation. In Fall 2003, the French delegation issued a technical proposition for a data model dedicated to NLP lexicons. In early 2004, the ISO/TC 37 committee decided to form a common ISO project with Nicoletta Calzolari (CNR-ILC Italy) as convenor and Gil Francopoulo (Tagmatica France) and Monte George (ANSI, United States) as editors. The first step in developing LMF was to design an overall framework based on the general features of existing lexicons and to develop a consistent terminology to describe the components of those lexicons. The next step was the actual design of a comprehensive model that best represented all of the lexicons in detail. A large panel of 60 experts contributed a wide range of requirements for LMF that covered many types of NLP lexicons. The editors of LMF worked closely with the panel of experts to identify the best solutions and reach a consensus on the design of LMF. Special attention was paid to the morphology in order to provide powerful mechanisms for handling problems in several languages that were known as difficult to handle. 13 versions have been written, dispatched (to the National nominated experts), commented and discussed during various ISO technical meetings. After five years of work, including numerous face-to-face meetings and e-mail exchanges, the editors arrived at a coherent UML model. In conclusion, LMF should be considered a synthesis of the state of the art in NLP lexicon field. == Current stage == The ISO number is 24613. The LMF specification has been published officially as an International Standard on 17 November 2008. == As one of the members of the ISO/TC 37 family of standards == The ISO/TC 37 standards are currently elaborated as high level specifications and deal with word segmentation (ISO 24614), annotations (ISO 24611 a.k.a. MAF, ISO 24612 a.k.a. LAF, ISO 24615 a.k.a. SynAF, and ISO 24617-1 a.k.a. SemAF/Time), feature structures (ISO 24610), multimedia containers (ISO 24616 a.k.a. MLIF), and lexicons (ISO 24613). These standards are based on low level specifications dedicated to constants, namely data categories (revision of ISO 12620), language codes (ISO 639), scripts codes (ISO 15924), country codes (ISO 3166) and Unicode (ISO 10646). The two level organization forms a coherent family of standards with the following common and simple rules: the high level specification provides structural elements that are adorned by the standardized constants; the low level specifications provide standardized constants as metadata. == Key standards == The linguistics constants like /feminine/ or /transitive/ are not defined within LMF but are recorded in the Data Category Registry (DCR) that is maintained as a global resource by ISO/TC 37 in compliance with ISO/IEC 11179-3:2003. And these constants are used to adorn the high level structural elements. The LMF specification complies with the modeling principles of Unified Modeling Language (UML) as defined by Object Management Group (OMG). The structure is specified by means of UML class diagrams. The examples are presented by means of UML instance (or object) diagrams. An XML DTD is given in an annex of the LMF document. == Model structure == LMF is composed of the following components: The core package that is the structural skeleton which describes the basic hierarchy of information in a lexical entry. Extensions of the core package which are expressed in a framework that describes the reuse of the core components in conjunction with the additional components required for a specific lexical resource. The extensions are specifically dedicated to morphology, MRD, NLP syntax, NLP semantics, NLP multilingual notations, NLP morphological patterns, multiword expression patterns, and constraint expression patterns. == Example == In the following example, the lexical entry is associated with a lemma clergyman and two inflected forms clergyman and clergymen. The language coding is set for the whole lexical resource. The language value is set for the whole lexicon as shown in the following UML instance diagram. The elements Lexical Resource, Global Information, Lexicon, Lexical Entry, Lemma, and Word Form define the structure of the lexicon. They are specified within the LMF document. On the contrary, languageCoding, language, partOfSpeech, commonNoun, writtenForm, grammaticalNumber, singular, plural are data categories that are taken from the Data Category Registry. These marks adorn the structure. The values ISO 639-3, clergyman, clergymen are plain character strings. The value eng is taken from the list of languages as defined by ISO 639-3. With some additional information like dtdVersion and feat, the same data can be expressed by the following XML fragment: This example is rather simple, while LMF can represent much more complex linguistic descriptions the XML tagging is correspondingly complex. == Selected publications about LMF == The first publication about the LMF specification as it has been ratified by ISO (this paper became (in 2015) the 9th most cited paper within the Language Resources and Evaluation conferences from LREC papers): Language Resources and Evaluation LREC-2006/Genoa: Gil Francopoulo, Monte George, Nicoletta Calzolari, Monica Monachini, Nuria Bel, Mandy Pet, Claudia Soria: Lexical Markup Framework (LMF) About semantic representation: Gesellschaft für linguistische Datenverarbeitung GLDV-2007/Tübingen: Gil Francopoulo, Nuria Bel, Monte George Nicoletta Calzolari, Monica Monachini, Mandy Pet, Claudia Soria: Lexical Markup Framework ISO standard for semantic information in NLP lexicons About African languages: Traitement Automatique des langues naturelles, Marseille, 2014: Mouhamadou Khoule, Mouhamad Ndiankho Thiam, El Hadj Mamadou Nguer: Toward the establishment of a LMF-based Wolof language lexicon (Vers la mise en place d'un lexique basé sur LMF pour la langue wolof) [in French] About Asian languages: Lexicography, Journal of ASIALEX, Springer 2014: Lexical Markup Framework: Gil Francopoulo, Chu-Ren Huang: An ISO Standard for Electronic Lexicons and its Implications for Asian Languages DOI 10.1007/s40607-014-0006-z About European languages: COLING 2010: Verena Henrich, Erhard Hinrichs: Standardizing Wordnets in the ISO Standard LMF: Wordnet-LMF for GermaNet EACL 2012: Judith Eckle-Kohler, Iryna Gurevych: Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability EACL 2012: Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M Meyer, Christian Wirth: UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. About Semitic languages: Journal of Natural Language Engineering, Cambridge University Press (to appear in Spring 2015): Aida Khemakhem, Bilel Gargouri, Abdelmajid Ben Hamadou, Gil Francopoulo: ISO Standard Modeling of a large Arabic Dictionary. Proceedings of the seventh Global Wordnet Conference 2014: Nadia B M Karmani, Hsan Soussou, Adel M Alimi: Building a standardized Wordnet in the ISO LMF for aeb language. Proceedings of the workshop: HLT & NLP within Arabic world, LREC 2008: Noureddine Loukil, Kais Haddar, Abdelmajid Ben Hamadou: Towards a syntactic lexicon of Arabic Verbs. Traitement Automatique des Langues Naturelles, Toulouse (in French) 2007: Khemakhem A, Gargouri B, Abdelwahed A, Francopoulo G: Modélisation des paradigmes de fl

    Read more →
  • Cambridge Analytica

    Cambridge Analytica

    Cambridge Analytica Ltd. (CA), previously known as SCL USA, was a British political consulting firm that came to prominence through the Facebook–Cambridge Analytica data scandal. It was founded in 2013, as a subsidiary of the private intelligence company and self-described "global election management agency" SCL Group by long-time SCL executives Nigel Oakes, Alexander Nix and Alexander Oakes, with Nix as CEO. Cambridge Analytica was hired by a variety of political actors, including the Trinidadian government in 2010 and the 2016 presidential campaigns of Ted Cruz and Donald Trump. The firm maintained offices in London, New York City, and Washington, D.C. The company closed operations in 2018 due to backlash from the scandal, although firms related to both Cambridge Analytica and its parent firm SCL still exist. == History == Cambridge Analytica was founded in 2013 as a subsidiary of the private intelligence company SCL Group, which describes itself as providing "data, analytics and strategy to governments and military organisations worldwide". The company was part of "an international web of companies" headed by the London-based SCL Group. Cambridge Analytica (SCL USA) was incorporated in January 2013 with its registered office being in Westferry Circus, London and consisting of just one staff member, director and CEO Alexander Nix (also appointed in January 2015). Nix was also the director of nine similar companies sharing the same registered offices in London, including Firecrest technologies, Emerdata and six SCL Group companies including "SCL elections limited". Nigel Oakes, known as the former boyfriend of Lady Helen Windsor, had founded the predecessor SCL Group in the 1990s, and in 2005 Oakes established SCL Group together with his brother Alexander Oakes and Alexander Nix; SCL Group was the parent company of Cambridge Analytica. Former Conservative minister and MP Sir Geoffrey Pattie was the founding chairman of SCL; Lord Ivar Mountbatten also joined Oakes as a director of the company. As a result of the Facebook–Cambridge Analytica data scandal, Nix was removed as CEO and replaced by Julian Wheatland before the company closed. Several of the company's executives were Old Etonians. The company's owners included several of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former British Conservative minister Jonathan Marland, Baron Marland and the family of American hedge fund manager Robert Mercer. The company combined misappropriation of digital assets, data mining, data brokerage, and data analysis with strategic communication during electoral processes. While its parent SCL had focused on influencing elections in developing countries since the 1990s, Cambridge Analytica focused more on the western world, including the United Kingdom and the United States; CEO Alexander Nix has said CA was involved in 44 U.S. political races in 2014. In 2015, CA performed data analysis services for Ted Cruz's presidential campaign. In 2016, CA worked for Donald Trump's presidential campaign as well as for Leave.EU (one of the organisations campaigning in the United Kingdom's referendum on European Union membership). CA's role in those campaigns has been controversial and is the subject of ongoing inquiries in both countries. Political scientists question CA's claims about the effectiveness of its methods of targeting voters. == Data scandal == In March 2018, media outlets broke news of Cambridge Analytica's business practices. The New York Times and The Observer reported that the company had acquired and used personal data about Facebook users from an external researcher who had told Facebook he was collecting it for academic purposes. Shortly afterwards, Channel 4 News aired undercover investigative videos showing Nix boasting about using prostitutes, bribery sting operations, and honey traps to discredit politicians on whom it had conducted opposition research, and saying that the company "ran all of (Donald Trump's) digital campaign". In response to the media reports, the Information Commissioner's Office (ICO) of the UK pursued a warrant to search the company's servers. Facebook banned Cambridge Analytica from advertising on its platform, saying that it had been deceived. On 23 March 2018, the British High Court granted the ICO a warrant to search Cambridge Analytica's London offices. As a result, Nix was suspended as CEO, and replaced by Julian Wheatland. The personal data of up to 87 million Facebook users were acquired via the 270,000 Facebook users who used a Facebook app created by Aleksandr Kogan called "This Is Your Digital Life". This was a personality profiling app and asked simple personality questions similar to other Facebook quizzes. Kogan was a scientist and psychologist, also being an employed lecturer for the University of Cambridge from 2012 to 2018. Alexander Nix claimed they had close to five thousand data points on each person who participated. They also gathered information through other data brokers ending with them acquiring millions of data points from American citizens. Kogan's app exploited a feature of Facebook's Graph API (version 1.0), which permitted any third-party app to access not only the app user's data, but also the full profile data of all of that user's Facebook friends, without those friends' knowledge or consent. This platform-wide design was available to all developers and was used by tens of thousands of apps; Facebook CEO Mark Zuckerberg later told the House Energy and Commerce Committee that the company was auditing "tens of thousands" of apps that had had access to large amounts of user data. Because the average Facebook user at the time had approximately 300 friends, the 270,000 users who installed Kogan's app yielded data on up to 87 million people. Facebook deprecated the friends-data API in April 2014 and shut it down entirely in April 2015, but data already collected by apps remained in developers' possession. Kogan passed this data to Cambridge Analytica, breaching Facebook's terms of service. On 1 May 2018, Cambridge Analytica and its parent company SCL filed for insolvency proceedings and closed operations. Alexander Tayler, a former director for Cambridge Analytica, was appointed director of Emerdata on 28 March 2018. Rebekah Mercer, Jennifer Mercer, Alexander Nix and Johnson Chun Shun Ko, who has links to American businessman Erik Prince, are in leadership positions at Emerdata. The Russo brothers are producing an upcoming film on Cambridge Analytica. In 2019 the Federal Trade Commission filed an administrative complaint against Cambridge Analytica for misuse of data. In 2020, the British Information Commissioner's Office closed a three-year inquiry into the company, concluded that Cambridge Analytica was "not involved" in the 2016 Brexit referendum and found no additional evidence for Russia's alleged interference during the campaign. US sensitive polling and election data, however, were passed to Russian Intelligence via a Cambridge Analytica contractor Sam Patten, Trump campaign manager Paul Manafort, and Russian agent Konstantin Kilimnik, who was indicted during the affair. Publicly, parent company SCL Group called itself a "global election management agency", Politico reported it was known for involvement "in military disinformation campaigns to social media branding and voter targeting". SCL gained work on a large number of campaigns for the US and UK governments' war on terror advancing their model of behavioral conflict during the 2000s. SCL's involvement in the political world has been primarily in the developing world where it has been used by the military and politicians to study and manipulate public opinion and political will. Slate writer Sharon Weinberger compared one of SCL's hypothetical test scenarios to fomenting a coup. Among the investors in Cambridge Analytica were some of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former Conservative minister Jonathan Marland, Baron Marland, Roger Gabb, the family of American hedge fund manager Robert Mercer, and Steve Bannon. A minimum of 15 million dollars has been invested into the company by Mercer, according to The New York Times. Bannon's stake in the company was estimated at 1 to 5 million dollars, but he divested his holdings in April 2017 as required by his role as White House Chief Strategist. In March 2018, Jennifer Mercer and Rebekah Mercer became directors of Emerdata limited. In March 2018 it became public by Christopher Wylie, that Cambridge Analytica's first activities were founded on a data set, which its parent company SCL bought 2014 from a company named Global Science Research founded by Aleksandr Kogan and his team present across the world who worked as a psychologist at Cambridge. During Boris Johnson's tenure as foreign secretary, the Foreign Office sought advice from Cambridge Analytica and Boris Johnson had a meeting with Alexander N

    Read more →
  • Classora

    Classora

    Classora is a knowledge base for the Internet oriented to data analysis. From a practical point of view, Classora is a digital repository that stores structured information and allows it to be displayed in multiple formats: analytically, graphically, geographically (through maps); as well as carry out OLAP analysis. The information contained in Classora comes from public sources and is uploaded into the system through bots and ETL processes. The Knowledge Base has a commercial API for semantic enhancement, and an open web through which any user can access to part of the information collected (it also allows users to complete data and share opinions). Internally, Classora is organized into Knowledge Units and Reports. A «Knowledge Unit» is any element of the World about which information may be stored and presented in the form of a data sheet (a person, a company, a country, etc.) A «Report» is a group of Knowledge Units: a ranking of companies, a sport classification table, a survey about people, etc. In fact, one of the technical capabilities of Classora is that it allows the comparison of reports and knowledge units gathered from different sources, thereby generating an added value for the media in which this information is published: digital media, interactive TV, etc. == Key definitions == === Knowledge unit === The units of knowledge (also known as entries) in Classora are data sheets that have a certain semantic equivalence with the articles on the Wikipedia: they store information about any element of the world, be it a film, a country, a company or an animal. However, they differ from Wikipedia in that Classora stores structured information, enriched with a metadata layer; and therefore it is able to automatically interpret the meaning of each unit of knowledge. === Data report === A report is a group of units of knowledge in which the repetition of elements is not allowed. This definition includes any list, poll, ranking, etc.; and, in general, any consultation that involves more than one unit of knowledge. Classora excels at the reports management due to its visualization capabilities, being able to display data in the form of tables, graphs and maps. Types of reports: Sports scores: Sports competitions results sanctioned by the competent institution. Rankings and lists: All types of interesting and curious lists, whether they have an implicit order or not. Polls: Units of knowledge that are ranked according to users’ votes. Queries to the Knowledge Base: Questions from users using CQL. Networks of connections: automatically calculated from the reports and the taxonomy of each Knowledge Unit. === Organizational taxonomy === An organizational taxonomy (also referred to as entry type) is a data sheet that brings together the common attributes of a set of units of knowledge. For instance, the organizational taxonomy F1 Driver displays attributes such as date of debut, team, etc.; and the organizational taxonomy Football Club presents attributes such as city, stadium, etc. In Classora, taxonomies are hierarchically organized, so that they inherit attributes from their parent taxonomies. For instance, F1 Driver is a subsidiary taxonomy of Sportsperson, which is a subsidiary taxonomy of Person, which in turn is a subsidiary taxonomy of Organism. The simplest type of entry in Classora is Classora Object. All the other taxonomies are its subsidiaries and inherit its attributes. In fact, the only attribute Classora Object possesses is name (all units of knowledge are required to have one name at least). == Architecture of Classora == === Data Extraction Module === The Data Extraction Module consists of a set of robots coordinated by software that also manages the potential incidents. Most of the information available in Classora is automatically uploaded through those robots, which connect to the main online public sources to gather all types of data. There are three categories of robots: Extraction robots: responsible for the massive uploading of reports from official public sources (FIFA, CIA, IMF, Eurostat...). They are used for either absolute or incremental data uploading. Data scanner robots: responsible for looking for and updating the data of a unit of knowledge. They use specific sources to perform this task: Wikipedia, IMDB, World Bank, etc. Content aggregators: they don’t connect to external sources. Instead, they generate new information using Classora’s internal database. === Participatory Module === In Classora’s Open Website, Internet users may participate providing their knowledge as they would on the Wikipedia. There are different ways to participate: adding or correcting data in the Knowledge Base, voting in surveys (participatory rankings) and creating new Knowledge Units and Data Reports. === Connectivity Module === The Knowledge Base is designed to be embedded in multi-platform, multi-channel systems, thus enabling its integration into mobile devices, tablets, interactive TV, etc. This integration may be carried out through specific plugins (for navigators or other devices) or an API REST that provides content in XML or JSON formats. The API is divided into three blocks of operations. The first one is the block of general utility tools (ranging from autosuggest components about geographical hierarchies to operations to obtain the list of today’s celebrity birthdays, using CQL). The second one is the block of operations for widget generation (graphs, maps, rankings) using information from the knowledge base. Finally, there is a block of operations designed for the publication of free-source content. == Project statistics == As of April 2012, 2,000,000 Knowledge Units, 15,000 Reports, around 10,000 Maps and several million potential Comparative Analyses had been added to Classora. According to the site of web metrics Alexa, Classora Open Website is ranked at 100,557 globally and at 2,880 in the Spanish traffic ranking. Users spend an average of 9 ½ minutes in Classora.

    Read more →
  • Chaffing and winnowing

    Chaffing and winnowing

    Chaffing and winnowing is a cryptographic technique to achieve confidentiality without using encryption when sending data over an insecure channel. The name is derived from agriculture: after grain has been harvested and threshed, it remains mixed together with inedible fibrous chaff. The chaff and grain are then separated by winnowing, and the chaff is discarded. The cryptographic technique was conceived by Ron Rivest and published in an on-line article on 18 March 1998. Although it bears similarities to both traditional encryption and steganography, it cannot be classified under either category. This technique allows the sender to deny responsibility for encrypting their message. When using chaffing and winnowing, the sender transmits the message unencrypted, in clear text. Although the sender and the receiver share a secret key, they use it only for authentication. However, a third party can make their communication confidential by simultaneously sending specially crafted messages through the same channel. == How it works == The sender (Alice) wants to send a message to the receiver (Bob). In the simplest setup, Alice enumerates the symbols in her message and sends out each in a separate packet. If the symbols are complex enough, such as natural-language text, an attacker may be able to distinguish the real symbols from poorly faked chaff symbols, posing a similar problem as steganography in needing to generate highly realistic fakes; to avoid this, the symbols can be reduced to just single 0/1 bits, and realistic fakes can then be simply randomly generated 50:50 and are indistinguishable from real symbols. In general, the method requires each symbol to arrive in-order and to be authenticated by the receiver. When implemented over networks that may change the order of packets, the sender places the symbol's serial number in the packet, the symbol itself (both unencrypted), and a message authentication code (MAC). Many MACs use a secret key Alice shares with Bob, but it is sufficient that the receiver has a method to authenticate the packets. Rivest notes an interesting property of chaffing-and-winnowing is that third parties (such as an ISP) can opportunistically add it to communications without needing permission or coordination with the sender/recipient. A third-party (Charles) who transmits Alice's packets to Bob, interleaves the packets with corresponding bogus packets (called "chaff") with corresponding serial numbers, arbitrary symbols, and a random number in place of the MAC. Charles does not need to know the key to do that (real MACs are large enough that it is extremely unlikely to generate a valid one by chance, unlike in the example). Bob uses the MAC to find the authentic messages and drops the "chaff" messages. This process is called "winnowing". An eavesdropper located between Alice and Charles can easily read Alice's message. But an eavesdropper between Charles and Bob would have to tell which packets are bogus and which are real (i.e. to winnow, or "separate the wheat from the chaff"). That is infeasible if the MAC used is secure and Charles does not leak any information on packet authenticity (e.g. via timing). If a fourth party joins the example (named Darth) who wants to send counterfeit messages to impersonate Alice, it would require Alice to disclose her secret key. If Darth cannot force Alice to disclose an authentication key (the knowledge of which would enable him to forge messages from Alice), then her messages will remain confidential. Charles, on the other hand, is no target of Darth's at all, since Charles does not even possess any secret keys that could be disclosed. == Variations == The simple variant of the chaffing and winnowing technique described above adds many bits of overhead per bit of original message. To make the transmission more efficient, Alice can process her message with an all-or-nothing transform and then send it out in much larger chunks. The chaff packets will have to be modified accordingly. Because the original message can be reconstructed only by knowing all of its chunks, Charles needs to send only enough chaff packets to make finding the correct combination of packets computationally infeasible. Chaffing and winnowing lends itself especially well to use in packet-switched network environments such as the Internet, where each message (whose payload is typically small) is sent in a separate network packet. In another variant of the technique, Charles carefully interleaves packets coming from multiple senders. That eliminates the need for Charles to generate and inject bogus packets in the communication. However, the text of Alice's message cannot be well protected from other parties who are communicating via Charles at the same time. This variant also helps protect against information leakage and traffic analysis. == Implications for law enforcement == Ron Rivest suggests that laws related to cryptography, including export controls, would not apply to chaffing and winnowing because it does not employ any encryption at all. The power to authenticate is in many cases the power to control, and handing all authentication power to the government is beyond all reason The author of the paper proposes that the security implications of handing everyone's authentication keys to the government for law-enforcement purposes would be far too risky, since possession of the key would enable someone to masquerade and communicate as another entity, such as an airline controller. Furthermore, Ron Rivest contemplates the possibility of rogue law enforcement officials framing up innocent parties by introducing the chaff into their communications, concluding that drafting a law restricting chaffing and winnowing would be far too difficult. == Trivia == The term winnowing was suggested by Ronald Rivest's father. Before the publication of Rivest's paper in 1998 other people brought to his attention a 1965 novel, Rex Stout's The Doorbell Rang, which describes the same concept and was thus included in the paper's references.

    Read more →
  • FMLLR

    FMLLR

    In signal processing, Feature space Maximum Likelihood Linear Regression (fMLLR) is a global feature transform that are typically applied in a speaker adaptive way, where fMLLR transforms acoustic features to speaker adapted features by a multiplication operation with a transformation matrix. In some literature, fMLLR is also known as the Constrained Maximum Likelihood Linear Regression (cMLLR). == Overview == fMLLR transformations are trained in a maximum likelihood sense on adaptation data. These transformations may be estimated in many ways, but only maximum likelihood (ML) estimation is considered in fMLLR. The fMLLR transformation is trained on a particular set of adaptation data, such that it maximizes the likelihood of that adaptation data given a current model-set. This technique is a widely used approach for speaker adaptation in HMM-based speech recognition. Later research also shows that fMLLR is an excellent acoustic feature for DNN/HMM hybrid speech recognition models. The advantage of fMLLR includes the following: the adaptation process can be performed within a pre-processing phase, and is independent of the ASR training and decoding process. this type of adapted feature can be applied to deep neural networks (DNN) to replace traditionally used mel-spectrogram in end-to-end speech recognition models. fMLLR's speaker adaptation process leads to a significant performance boost for ASR models, hence outperforming other transform or features like MFCCs (Mel-Frequency Cepstral Coefficients) and FBANKs (Filter bank) coefficients. fMLLR features can be efficiently realized with speech toolkits like Kaldi. Major problem and disadvantage of fMLLR: when the amount of adaptation data is limited, the transformation matrices tends to easily overfit the given data. == Computing fMLLR transform == Feature transform of fMLLR can be easily computed with the open source speech tool Kaldi, the Kaldi script uses the standard estimation scheme described in Appendix B of the original paper, in particular the section Appendix B.1 "Direct method over rows". In the Kaldi formulation, fMLLR is an affine feature transform of the form x {\displaystyle x} → A {\displaystyle A} x {\displaystyle x} + b {\displaystyle +b} , which can be written in the form x {\displaystyle x} →W x ^ {\displaystyle {\hat {x}}} , where x ^ {\displaystyle {\hat {x}}} = [ x 1 ] {\displaystyle {\begin{bmatrix}x\\1\end{bmatrix}}} is the acoustic feature x {\displaystyle x} with a 1 appended. Note that this differs from some of the literature where the 1 comes first as x ^ {\displaystyle {\hat {x}}} = [ 1 x ] {\displaystyle {\begin{bmatrix}1\\x\end{bmatrix}}} . The sufficient statistics stored are: K = ∑ t , j , m γ j , m ( t ) Σ j m − 1 μ j m x ( t ) + {\displaystyle K=\sum _{t,j,m}\gamma _{j,m}(t)\textstyle \Sigma _{jm}^{-1}\mu _{jm}x(t)^{+}\displaystyle } where Σ j m − 1 {\displaystyle \textstyle \Sigma _{jm}^{-1}\displaystyle } is the inverse co-variance matrix. And for 0 ≤ i ≤ D {\displaystyle 0\leq i\leq D} where D {\displaystyle D} is the feature dimension: G ( i ) = ∑ t , j , m γ j , m ( t ) ( 1 σ j , m 2 ( i ) ) x ( t ) + x ( t ) + T {\displaystyle G^{(i)}=\sum _{t,j,m}\gamma _{j,m}(t)\left({\frac {1}{\sigma _{j,m}^{2}(i)}}\right)x(t)^{+}x(t)^{+T}\displaystyle } For a thorough review that explains fMLLR and the commonly used estimation techniques, see the original paper "Maximum likelihood linear transformations for HMM-based speech recognition ". Note that the Kaldi script that performs the feature transforms of fMLLR differs with by using a column of the inverse in place of the cofactor row. In other words, the factor of the determinant is ignored, as it does not affect the transform result and can causes potential danger of numerical underflow or overflow. == Comparing with other features or transforms == Experiment result shows that by using the fMLLR feature in speech recognition, constant improvement is gained over other acoustic features on various commonly used benchmark datasets (TIMIT, LibriSpeech, etc). In particular, fMLLR features outperform MFCCs and FBANKs coefficients, which is mainly due to the speaker adaptation process that fMLLR performs. In, phoneme error rate (PER, %) is reported for the test set of TIMIT with various neural architectures: As expected, fMLLR features outperform MFCCs and FBANKs coefficients despite the use of different model architecture. Where MLP (multi-layer perceptron) serves as a simple baseline, on the other hand RNN, LSTM, and GRU are all well known recurrent models. The Li-GRU architecture is based on a single gate and thus saves 33% of the computations over a standard GRU model, Li-GRU thus effectively address the gradient vanishing problem of recurrent models. As a result, the best performance is obtained with the Li-GRU model on fMLLR features. == Extract fMLLR features with Kaldi == fMLLR can be extracted as reported in the s5 recipe of Kaldi. Kaldi scripts can certainly extract fMLLR features on different dataset, below are the basic example steps to extract fMLLR features from the open source speech corpora Librispeech. Note that the instructions below are for the subsets train-clean-100,train-clean-360,dev-clean, and test-clean, but they can be easily extended to support the other sets dev-other, test-other, and train-other-500. These instruction are based on the codes provided in this GitHub repository, which contains Kaldi recipes on the LibriSpeech corpora to execute the fMLLR feature extraction process, replace the files under $KALDI_ROOT/egs/librispeech/s5/ with the files in the repository. Install Kaldi. Install Kaldiio. If running on a single machine, change the following lines in $KALDI_ROOT/egs/librispeech/s5/cmd.sh to replace queue.pl to run.pl: Change the data path in run.sh to your LibriSpeech data path, the directory LibriSpeech/ should be under that path. For example: Install flac with: sudo apt-get install flac Run the Kaldi recipe run.sh for LibriSpeech at least until Stage 13 (included), for simplicity you can use the modified run.sh. Copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_/ with the following command: Compute the fMLLR features by running the following script, the script can also be downloaded here: Compute alignments using: Apply CMVN and dump the fMLLR features to new .ark files, the script can also be downloaded here: Use the Python script to convert Kaldi generated .ark features to .npy for your own dataloader, an example Python script is provided:

    Read more →
  • Initialization vector

    Initialization vector

    In cryptography, an initialization vector (IV) or starting variable is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation. Some cryptographic primitives require the IV only to be non-repeating, and the required randomness is derived internally. In this case, the IV is commonly called a nonce (a number used only once), and the primitives (e.g. CBC) are considered stateful rather than randomized. This is because an IV need not be explicitly forwarded to a recipient but may be derived from a common state updated at both sender and receiver side. (In practice, a short nonce is still transmitted along with the message to consider message loss.) An example of stateful encryption schemes is the counter mode of operation, which has a sequence number for a nonce. The IV size depends on the cryptographic primitive used; for block ciphers it is generally the cipher's block-size. In encryption schemes, the unpredictable part of the IV has at best the same size as the key to compensate for time/memory/data tradeoff attacks. When the IV is chosen at random, the probability of collisions due to the birthday problem must be taken into account. Traditional stream ciphers such as RC4 do not support an explicit IV as input, and a custom solution for incorporating an IV into the cipher's key or internal state is needed. Some designs realized in practice are known to be insecure; the WEP protocol is a notable example, and is prone to related-IV attacks. == Motivation == A block cipher is one of the most basic primitives in cryptography, and frequently used for data encryption. However, by itself, it can only be used to encode a data block of a predefined size, called the block size. For example, a single invocation of the AES algorithm transforms a 128-bit plaintext block into a ciphertext block of 128 bits in size. The key, which is given as one input to the cipher, defines the mapping between plaintext and ciphertext. If data of arbitrary length is to be encrypted, a simple strategy is to split the data into blocks each matching the cipher's block size, and encrypt each block separately using the same key. This method is not secure as equal plaintext blocks get transformed into equal ciphertexts, and a third party observing the encrypted data may easily determine its content even when not knowing the encryption key. To hide patterns in encrypted data while avoiding the re-issuing of a new key after each block cipher invocation, a method is needed to randomize the input data. In 1980, the NIST published a national standard document designated Federal Information Processing Standard (FIPS) PUB 81, which specified four so-called block cipher modes of operation, each describing a different solution for encrypting a set of input blocks. The first mode implements the simple strategy described above, and was specified as the electronic codebook (ECB) mode. In contrast, each of the other modes describe a process where ciphertext from one block encryption step gets intermixed with the data from the next encryption step. To initiate this process, an additional input value is required to be mixed with the first block, and which is referred to as an initialization vector. For example, the cipher-block chaining (CBC) mode requires an unpredictable value, of size equal to the cipher's block size, as additional input. This unpredictable value is added to the first plaintext block before subsequent encryption. In turn, the ciphertext produced in the first encryption step is added to the second plaintext block, and so on. The ultimate goal for encryption schemes is to provide semantic security: by this property, it is practically impossible for an attacker to draw any knowledge from observed ciphertext. It can be shown that each of the three additional modes specified by the NIST are semantically secure under so-called chosen-plaintext attacks. == Properties == Properties of an IV depend on the cryptographic scheme used. A basic requirement is uniqueness, which means that no IV may be reused under the same key. For block ciphers, repeated IV values devolve the encryption scheme into electronic codebook mode: equal IV and equal plaintext result in equal ciphertext. In stream cipher encryption uniqueness is crucially important as plaintext may be trivially recovered otherwise. Example: Stream ciphers encrypt plaintext P to ciphertext C by deriving a key stream K from a given key and IV and computing C as C = P xor K. Assume that an attacker has observed two messages C1 and C2 both encrypted with the same key and IV. Then knowledge of either P1 or P2 reveals the other plaintext since C1 xor C2 = (P1 xor K) xor (P2 xor K) = P1 xor P2. Many schemes require the IV to be unpredictable by an adversary. This is effected by selecting the IV at random or pseudo-randomly. In such schemes, the chance of a duplicate IV is negligible, but the effect of the birthday problem must be considered. As for the uniqueness requirement, a predictable IV may allow recovery of (partial) plaintext. Example: Consider a scenario where a legitimate party called Alice encrypts messages using the cipher-block chaining mode. Consider further that there is an adversary called Eve that can observe these encryptions and is able to forward plaintext messages to Alice for encryption (in other words, Eve is capable of a chosen-plaintext attack). Now assume that Alice has sent a message consisting of an initialization vector IV1 and starting with a ciphertext block CAlice. Let further PAlice denote the first plaintext block of Alice's message, let E denote encryption, and let PEve be Eve's guess for the first plaintext block. Now, if Eve can determine the initialization vector IV2 of the next message she will be able to test her guess by forwarding a plaintext message to Alice starting with (IV2 xor IV1 xor PEve); if her guess was correct this plaintext block will get encrypted to CAlice by Alice. This is because of the following simple observation: CAlice = E(IV1 xor PAlice) = E(IV2 xor (IV2 xor IV1 xor PAlice)). Depending on whether the IV for a cryptographic scheme must be random or only unique the scheme is either called randomized or stateful. While randomized schemes always require the IV chosen by a sender to be forwarded to receivers, stateful schemes allow sender and receiver to share a common IV state, which is updated in a predefined way at both sides. == Block ciphers == Block cipher processing of data is usually described as a mode of operation. Modes are primarily defined for encryption as well as authentication, though newer designs exist that combine both security solutions in so-called authenticated encryption modes. While encryption and authenticated encryption modes usually take an IV matching the cipher's block size, authentication modes are commonly realized as deterministic algorithms, and the IV is set to zero or some other fixed value. == Stream ciphers == In stream ciphers, IVs are loaded into the keyed internal secret state of the cipher, after which a number of cipher rounds are executed prior to releasing the first bit of output. For performance reasons, designers of stream ciphers try to keep that number of rounds as small as possible, but because determining the minimal secure number of rounds for stream ciphers is not a trivial task, and considering other issues such as entropy loss, unique to each cipher construction, related-IVs and other IV-related attacks are a known security issue for stream ciphers, which makes IV loading in stream ciphers a serious concern and a subject of ongoing research. == WEP IV == The 802.11 encryption algorithm called WEP (short for Wired Equivalent Privacy) used a short, 24-bit IV, leading to reused IVs with the same key, which led to it being easily cracked. Packet injection allowed for WEP to be cracked in times as short as several seconds. This ultimately led to the deprecation of WEP. == SSL 2.0 IV == In cipher-block chaining mode (CBC mode), the IV need not be secret, but must be unpredictable (In particular, for any given plaintext, it must not be possible to predict the IV that will be associated to the plaintext in advance of the generation of the IV.) at encryption time. Additionally for the output feedback mode (OFB mode), the IV must be unique. In particular, the (previously) common practice of re-using the last ciphertext block of a message as the IV for the next message is insecure (for example, this method was used by SSL 2.0). If an attacker knows

    Read more →
  • Content repository

    Content repository

    A content repository or content store is a database of digital content with an associated set of data management, search and access methods allowing application-independent access to the content, rather like a digital library, but with the ability to store and modify content in addition to searching and retrieving. The content repository acts as the storage engine for a larger application such as a content management system or a document management system, which adds a user interface on top of the repository's application programming interface. == Advantages provided by repositories == Common rules for data access allow many applications to work with the same content without interrupting the data. They give out signals when changes happen, letting other applications using the repository know that something has been modified, which enables collaborative data management. Developers can deal with data using programs that are more compatible with the desktop programming environment. The data model is scriptable when users use a content repository. == Content repository features == A content repository may provide functionality such as: Add/edit/delete content Hierarchy and sort order management Query / search Versioning Access control Import / export Locking Life-cycle management Retention and holding / records management == Examples == Apache Jackrabbit ModeShape == Applications == Content management Document management Digital asset management Records management Revision control Social collaboration Web content management == Standards and specification == Content repository API for Java WebDAV Content Management Interoperability Services

    Read more →
  • Trace zero cryptography

    Trace zero cryptography

    First proposed by Gerhard Frey in 1998, trace zero cryptography refers to the use of trace zero varieties (TZV) for cryptographic purpose. Trace zero varieties are subgroups of the divisor class group on a low genus hyperelliptic curve defined over a finite field. These groups can be used to establish asymmetric cryptography using the discrete logarithm problem as cryptographic primitive. Trace zero varieties feature a better scalar multiplication performance than elliptic curves. This allows fast arithmetic in these groups, which can speed up the calculations with a factor 3 compared with elliptic curves and hence speed up the cryptosystem. Another advantage is that for groups of cryptographically relevant size, the order of the group can simply be calculated using the characteristic polynomial of the Frobenius endomorphism. This is not the case, for example, in elliptic curve cryptography when the group of points of an elliptic curve over a prime field is used for cryptographic purpose. However, to represent an element of the trace zero variety more bits are needed compared with elements of elliptic or hyperelliptic curves. Another disadvantage is the fact that it is possible to reduce the security of the TZV of 1/6th of the bit length using cover attack. == Mathematical background == A hyperelliptic curve C of genus g over a prime field F q {\displaystyle \mathbb {F} _{q}} where q = pn (p prime) of odd characteristic is defined as C : y 2 + h ( x ) y = f ( x ) , {\displaystyle C:~y^{2}+h(x)y=f(x),} where f monic, deg(f) = 2g + 1 and deg(h) ≤ g. The curve has at least one F q {\displaystyle \mathbb {F} _{q}} -rational Weierstraßpoint. The Jacobian variety J C ( F q n ) {\displaystyle J_{C}(\mathbb {F} _{q^{n}})} of C is for all finite extension F q n {\displaystyle \mathbb {F} _{q^{n}}} isomorphic to the ideal class group Cl ⁡ ( C / F q n ) {\displaystyle \operatorname {Cl} (C/\mathbb {F} _{q^{n}})} . With the Mumford's representation it is possible to represent the elements of J C ( F q n ) {\displaystyle J_{C}(\mathbb {F} _{q^{n}})} with a pair of polynomials [u, v], where u, v ∈ F q n [ x ] {\displaystyle \mathbb {F} _{q^{n}}[x]} . The Frobenius endomorphism σ is used on an element [u, v] of J C ( F q n ) {\displaystyle J_{C}(\mathbb {F} _{q^{n}})} to raise the power of each coefficient of that element to q: σ([u, v]) = [uq(x), vq(x)]. The characteristic polynomial of this endomorphism has the following form: χ ( T ) = T 2 g + a 1 T 2 g − 1 + ⋯ + a g T g + ⋯ + a 1 q g − 1 T + q g , {\displaystyle \chi (T)=T^{2g}+a_{1}T^{2g-1}+\cdots +a_{g}T^{g}+\cdots +a_{1}q^{g-1}T+q^{g},} where ai in Z {\displaystyle \mathbb {Z} } With the Hasse–Weil theorem it is possible to receive the group order of any extension field F q n {\displaystyle \mathbb {F} _{q^{n}}} by using the complex roots τi of χ(T): | J C ( F q n ) | = ∏ i = 1 2 g ( 1 − τ i n ) {\displaystyle |J_{C}(\mathbb {F} _{q^{n}})|=\prod _{i=1}^{2g}(1-\tau _{i}^{n})} Let D be an element of the J C ( F q n ) {\displaystyle J_{C}(\mathbb {F} _{q^{n}})} of C, then it is possible to define an endomorphism of J C ( F q n ) {\displaystyle J_{C}(\mathbb {F} _{q^{n}})} , the so-called trace of D: Tr ⁡ ( D ) = ∑ i = 0 n − 1 σ i ( D ) = D + σ ( D ) + ⋯ + σ n − 1 ( D ) {\displaystyle \operatorname {Tr} (D)=\sum _{i=0}^{n-1}\sigma ^{i}(D)=D+\sigma (D)+\cdots +\sigma ^{n-1}(D)} Based on this endomorphism one can reduce the Jacobian variety to a subgroup G with the property, that every element is of trace zero: G = { D ∈ J C ( F q n ) | Tr ( D ) = 0 } , ( 0 neutral element in J C ( F q n ) {\displaystyle G=\{D\in J_{C}(\mathbb {F} _{q^{n}})~|~{\text{Tr}}(D)={\textbf {0}}\},~~~({\textbf {0}}{\text{ neutral element in }}J_{C}(\mathbb {F} _{q^{n}})} G is the kernel of the trace endomorphism and thus G is a group, the so-called trace zero (sub)variety (TZV) of J C ( F q n ) {\displaystyle J_{C}(\mathbb {F} _{q^{n}})} . The intersection of G and J C ( F q ) {\displaystyle J_{C}(\mathbb {F} _{q})} is produced by the n-torsion elements of J C ( F q ) {\displaystyle J_{C}(\mathbb {F} _{q})} . If the greatest common divisor gcd ( n , | J C ( F q ) | ) = 1 {\displaystyle \gcd(n,|J_{C}(\mathbb {F} _{q})|)=1} the intersection is empty and one can compute the group order of G: | G | = | J C ( F q n ) | | J C ( F q ) | = ∏ i = 1 2 g ( 1 − τ i n ) ∏ i = 1 2 g ( 1 − τ i ) {\displaystyle |G|={\dfrac {|J_{C}(\mathbb {F} _{q^{n}})|}{|J_{C}(\mathbb {F} _{q})|}}={\dfrac {\prod _{i=1}^{2g}(1-\tau _{i}^{n})}{\prod _{i=1}^{2g}(1-\tau _{i})}}} The actual group used in cryptographic applications is a subgroup G0 of G of a large prime order l. This group may be G itself. There exist three different cases of cryptographical relevance for TZV: g = 1, n = 3 g = 1, n = 5 g = 2, n = 3 == Arithmetic == The arithmetic used in the TZV group G0 based on the arithmetic for the whole group J C ( F q n ) {\displaystyle J_{C}(\mathbb {F} _{q^{n}})} , But it is possible to use the Frobenius endomorphism σ to speed up the scalar multiplication. This can be archived if G0 is generated by D of order l then σ(D) = sD, for some integers s. For the given cases of TZV s can be computed as follows, where ai come from the characteristic polynomial of the Frobenius endomorphism : For g = 1, n = 3: s = q − 1 1 − a 1 mod ℓ {\displaystyle s={\dfrac {q-1}{1-a_{1}}}{\bmod {\ell }}} For g = 1, n = 5: s = q 2 − q − a 1 2 q + a 1 q + 1 q − 2 a 1 q + a 1 3 − a 1 2 + a 1 − 1 mod ℓ {\displaystyle s={\dfrac {q^{2}-q-a_{1}^{2}q+a_{1}q+1}{q-2a_{1}q+a_{1}^{3}-a_{1}^{2}+a_{1}-1}}{\bmod {\ell }}} For g = 2, n = 3: s = − q 2 − a 2 + a 1 a 1 q − a 2 + 1 mod ℓ {\displaystyle s=-{\dfrac {q^{2}-a_{2}+a_{1}}{a_{1}q-a_{2}+1}}{\bmod {\ell }}} Knowing this, it is possible to replace any scalar multiplication mD (|m| ≤ l/2) with: m 0 D + m 1 σ ( D ) + ⋯ + m n − 1 σ n − 1 ( D ) , where m i = O ( ℓ 1 / ( n − 1 ) ) = O ( q g ) {\displaystyle m_{0}D+m_{1}\sigma (D)+\cdots +m_{n-1}\sigma ^{n-1}(D),~~~~{\text{where }}m_{i}=O(\ell ^{1/(n-1)})=O(q^{g})} With this trick the multiple scalar product can be reduced to about 1/(n − 1)th of doublings necessary for calculating mD, if the implied constants are small enough. == Security == The security of cryptographic systems based on trace zero subvarieties according to the results of the papers comparable to the security of hyper-elliptic curves of low genus g' over F p ′ {\displaystyle \mathbb {F} _{p'}} , where p' ~ (n − 1)(g/g' ) for |G| ~128 bits. For the cases where n = 3, g = 2 and n = 5, g = 1 it is possible to reduce the security for at most 6 bits, where |G| ~ 2256, because one can not be sure that G is contained in a Jacobian of a curve of genus 6. The security of curves of genus 4 for similar fields are far less secure. == Cover attack on a trace zero crypto-system == The attack published in shows, that the DLP in trace zero groups of genus 2 over finite fields of characteristic diverse than 2 or 3 and a field extension of degree 3 can be transformed into a DLP in a class group of degree 0 with genus of at most 6 over the base field. In this new class group the DLP can be attacked with the index calculus methods. This leads to a reduction of the bit length 1/6th.

    Read more →
  • Gorn address

    Gorn address

    A Gorn address (Gorn, 1967) is a method of identifying and addressing any node within a tree data structure. This notation is often used for identifying nodes in a parse tree defined by phrase structure rules. The Gorn address is a sequence of zero or more integers conventionally separated by dots, e.g., 0 or 1.0.1. The root which Gorn calls can be regarded as the empty sequence. And the j {\displaystyle j} -th child of the i {\displaystyle i} -th child has an address i . j {\displaystyle i.j} , counting from 0. It is named after American computer scientist Saul Gorn.

    Read more →
  • Out-of-band control

    Out-of-band control

    Out-of-band control is a method used by network protocols for sending control information (commands, logins, or session signals) separately from the main data, improving reliability and preventing interference. File Transfer Protocol (FTP) employs an out-of-band approach, using one connection for control commands, like logging in or requesting files, and a separate connection for transferring the files themselves.

    Read more →
  • Messaging Layer Security

    Messaging Layer Security

    Messaging Layer Security (MLS) is a security layer for end-to-end encrypted messages. It is maintained by the MLS working group of the Internet Engineering Task Force (IETF), and is designed to provide an efficient and practical security mechanism for groups as large as 50,000 and for those who access chat systems from multiple devices. == Security properties == Security properties of MLS include message confidentiality, message integrity and authentication, membership authentication, asynchronicity, forward secrecy, post-compromise security, and scalability. == History == The idea was born in 2016 and first discussed in an unofficial meeting during IETF 96 in Berlin with attendees from Wire, Mozilla and Cisco. Initial ideas were based on pairwise encryption for secure 1:1 and group communication. In 2017, an academic paper introducing Asynchronous Ratcheting Trees was published by the University of Oxford and Facebook setting the focus on more efficient encryption schemes. The first BoF took place in February 2018 at IETF 101 in London. The founding members are Mozilla, Facebook, Wire, Google, Twitter, University of Oxford, and INRIA. On March 29, 2023, the IETF approved publication of Messaging Layer Security (MLS) as a new standard. It was officially published on July 19, 2023. At that time, Google announced it intended to add MLS to the end to end encryption used by Google Messages over Rich Communication Services (RCS). In March 2025, the GSMA announced the Universal Profile 3.0 standard of RCS would support MLS and Apple announced it would support this RCS standard on Apple Messages. Both Google Messages and Apple Messages began the rollout of MLS E2EE over RCS in May 2026. Matrix is one of the protocols declaring migration to MLS. In 2026, Discord rolled out end-to-end encryption on voice and video calls, using MLS for scalable group key exchanges. Research on adding post-quantum cryptography (PQC) to MLS is ongoing. The IETF has prepared an Internet-Draft using PQC algorithms in MLS. == Implementations ==

    Read more →