AI Chat On Google

AI Chat On Google — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Digital curation

    Digital curation

    Digital curation is the selection, preservation, maintenance, collection, and archiving of digital assets. It is a process that establishes, maintains, and adds value to repositories of digital data for present and future use. The implementation of digital curation is often carried out by archivists, librarians, scientists, historians, and scholars to ensure users have access to reliable, high-quality resources. Enterprises are also starting to adopt digital curation as a means to improve the quality of information and data within their operational and strategic processes. A successful digital curation initiative will help to mitigate digital obsolescence, keeping the information accessible to users indefinitely. Digital curation includes various aspects, including digital asset management, data curation, digital preservation, and electronic records management. == Word History == Much like the word archive has layered meanings and uses, the word curation is both a noun and a verb, used originally in the field of museology to represent a wide range of activities, most often associated with collection care, long-term preservation, and exhibition design. Curation can be a reference to physical repositories that store cultural heritage or natural resource collections (e.g., a curatorial repository) or a representation of varied policies and processes involved with the long-term care and management of heritage collections, digital archives, and research data (e.g, curatorial/collections management plans, curation life-cycle, and data curation). Yet curation is also associated with short-term objectives and processes of selection and interpretation for the purposes of presentation, such as for gallery exhibitions and websites, which contribute to knowledge creation. It has also been applied to interaction with social media including compiling digital images, web links, and movie files. The term curation entered the legal framework through federal historic preservation laws, starting with the National Historic Preservation Act of 1966, and was further defined and coded into federal regulations through 36 CFR Part 79: Curation of Federally-owned and Administered Archaeological Collections. Curation has since permeated into an array of disciplines but remains closely tied to heritage and information management. == Core Principles and Activities == The term "digital curation" was first used in the e-science and biological science fields as a means of differentiating the additional suite of activities ordinarily employed by library and museum curators to add value to their collections and enable its reuse from the smaller subtask of simply preserving the data, a significantly more concise archival task. Additionally, the historical understanding of the term "curator" demands more than simple care of the collection. A curator is expected to command academic mastery of the subject matter as a requisite part of appraisal and selection of assets and any subsequent adding of value to the collection through application of metadata. === Principles === There are five commonly accepted principles that govern the occupation of digital curation: Manage the complete birth-to-retirement life cycle of the digital asset. Evaluate and cull assets for inclusion in the collection. Apply preservation methods to strengthen the asset’s integrity and reusability for future users. Act proactively throughout the asset life cycle to add value to both the digital asset and the collection. Facilitate the appropriate degree of access to users. === Methodology === The Digital Curation Center offers the following step-by-step life cycle procedures for putting the above principles into practice: Sequential Actions: Conceptualize: Consider what digital material you will be creating and develop storage options. Take into account websites, publications, email, among other types of digital output. Create: Produce digital material and attach all relevant metadata, typically the more metadata the more accessible the information. Appraise and select: Consult the mission statement of the institution or private collection and determine what digital data is relevant. There may also be legal guidelines in place that will guide the decision process for a particular collection. Ingest: Send digital material to the predetermined storage solution. This may be an archive, repository or other facility. Preservation action: Employ measures to maintain the integrity of the digital material. Store: Secure data within the predetermined storage facility. Access, use, and reuse: Determine the level of accessibility for the range of digital material created. Some material may be accessible only by password and other material may be freely accessible to the public. Routinely check that material is still accessible for the intended audience and that the material has not been compromised through multiple uses. Transform: If desirable or necessary the material may be transferred into a different digital format. Occasional Actions: Dispose: Discard any digital material that is not deemed necessary to the institution. Reappraise: Reevaluate material to ensure that is it still relevant and is true to its original form. Migrate: Migrate data to another format in order to protect data for using better in the future. == Related terms == The term "digital curation" is sometimes used interchangeably with terms such as "digital preservation" and "digital archiving." While digital preservation does focus a significant degree of energy on optimizing reusability, preservation remains a subtask to the concept of digital archiving, which is in turn a subtask of digital curation. For example, archiving is a part of curation, but so are subsequent tasks such as themed collection-building, which is not considered an archival task. Similarly, preservation is a part of archiving, as are the tasks of selection and appraisal that are not necessarily part of preservation. Data curation is another term that is often used interchangeably with digital curation, however common usage of the two terms differs. While "data" is a more all-encompassing term that can be used generally to indicate anything recorded in binary form, the term "data curation" is most common in scientific parlance and usually refers to accumulating and managing information relative to the process of research. Data-driven research of education request the role of information professional gradually develop tradition of digital service to data curation particularly at the management of digital research data. So, while documents and other discrete digital assets are technically a subset of the broader concept of data, in the context of scientific vernacular digital curation represents a broader purview of responsibilities than data curation due to its interest in preserving and adding value to digital assets of any kind. == Challenges == === Rate of creation of new data and data sets === The ever lowering cost and increasing prevalence of entirely new categories of technology has led to a quickly growing flow of new data sets. These come from well established sources such as business and government, but the trend is also driven by new styles of sensors becoming embedded in more areas of modern life. This is particularly true of consumers, whose production of digital assets is no longer relegated strictly to work. Consumers now create wider ranges of digital assets, including videos, photos, location data, purchases, and fitness tracking data, just to name a few, and share them in wider ranges of social platforms. Additionally, the advance of technology has introduced new ways of working with data. Some examples of this are international partnerships that leverage astronomical data to create "virtual observatories," and similar partnerships have also leveraged data resulting from research at the Large Hadron Collider at CERN and the database of protein structures at the Protein Data Bank. === Storage format evolution and obsolescence === By comparison, archiving of analog assets is notably passive in nature, often limited to simply ensuring a suitable storage environment. Digital preservation requires a more proactive approach. Today’s artifacts of cultural significance are notably transient in nature and prone to obsolescence when social trends or dependent technologies change. This rapid progression of technology occasionally makes it necessary to migrate digital asset holdings from one file format to another in order to mitigate the dangers of hardware and software obsolescence which would render the asset unusable. === Underestimation of human labor costs === Modern tools for program planning often underestimate the amount of human labor costs required for adequate digital curation of large collections. As a result cost-benefit assessments often paint an inaccurate picture of both the amount of work involved and the true cost to the institution for bot

    Read more →
  • Wearable technology

    Wearable technology

    Wearable technology is a category of small electronic and mobile devices with wireless communications capability designed to be worn on the human body and are incorporated into gadgets, accessories, or clothes. Common types of wearable technology include smartwatches, fitness trackers, and smartglasses. Wearable electronic devices are often close to or on the surface of the skin, where they detect, analyze, and transmit information such as vital signs, and/or ambient data and which allow in some cases immediate biofeedback to the wearer. Wearable devices collect vast amounts of data from users making use of different behavioral and physiological sensors, which monitor their health status and activity levels. Wrist-worn devices include smartwatches with a touchscreen display, while wristbands are mainly used for fitness tracking but do not contain a touchscreen display. Wearable devices such as activity trackers are an example of the Internet of things, since "things" such as electronics, software, sensors, and connectivity are effectors that enable objects to exchange data (including data quality) through the internet with a manufacturer, operator, and/or other connected devices, without requiring human intervention. Wearable technology offers a wide range of possible uses, from communication and entertainment to improving health and fitness, however, there are worries about privacy and security because wearable devices have the ability to collect personal data. Wearable technology has a variety of use cases which is growing as the technology is developed and the market expands. It can be used to encourage individuals to be more active and improve their lifestyle choices. Healthy behavior is encouraged by tracking activity levels and providing useful feedback to enable goal setting. This can be shared with interested stakeholders such as healthcare providers. Wearables are popular in consumer electronics, most commonly in the form factors of smartwatches, smart rings, and implants. Apart from commercial uses, wearable technology is being incorporated into navigation systems, advanced textiles (e-textiles), and healthcare. As wearable technology is being proposed for use in critical applications, like other technology, it is vetted for its reliability and security properties. == History == In the 1500s, German inventor Peter Henlein (1485–1542) created small watches that were worn as necklaces. A century later, pocket watches grew in popularity as waistcoats became fashionable for men. Wristwatches were created in the late 1600s but were worn mostly by women as bracelets. Pedometers were developed around the same time as pocket watches. The concept of a pedometer was described by Leonardo da Vinci around 1500, and the Germanic National Museum in Nuremberg has a pedometer in its collection from 1590. In the late 1800s, the first wearable hearing aids were introduced. In 1904, aviator Alberto Santos-Dumont pioneered the modern use of the wristwatch. In 1949, American biophysicist Norman Holter invented the very first health monitoring device. His invention, the Holter monitor, was groundbreaking as one of the first wearable devices capable of tracking vital health data outside of a clinical setting. In the 1970s, calculator watches became available, reaching the peak of their popularity in the 1980s. From the early 2000s, wearable cameras were being used as part of a growing sousveillance movement. Expectations, operations, usage and concerns about wearable technology was floated on the first International Conference on Wearable Computing. In 2008, Ilya Fridman incorporated a hidden Bluetooth microphone into a pair of earrings. Big tech companies such as Apple, Samsung, and Fitbit have expanded on this idea by interfacing with smartphones and personal computer software to collect a wide variety of data. Wearable devices include dedicated health monitors, fitness bands, and smartwatches. In 2010, Fitbit released its first step counter. Wearable technology which tracks information such as walking and heart rate is part of the quantified self movement. In 2013, McLear, also known as NFC Ring, released a "smart ring". The smart ring could make bitcoin payments, unlock other devices, and transfer personally identifying information, and also had other features. In 2013, one of the first widely available smartwatches was the Samsung Galaxy Gear. Apple followed in 2015 with the Apple Watch. === Prototypes === From 1991 to 1997, Rosalind Picard and her students, Steve Mann and Jennifer Healey, at the MIT Media Lab designed, built, and demonstrated data collection and decision making from "Smart Clothes" that monitored continuous physiological data from the wearer. These "smart clothes", "smart underwear", "smart shoes", and smart jewellery collected data that related to affective state and contained or controlled physiological sensors and environmental sensors like cameras and other devices. At the same time, also at the MIT Media Lab, Thad Starner and Alex "Sandy" Pentland develop augmented reality. In 1997, their smartglass prototype is featured on 60 Minutes and enables rapid web search and instant messaging. Though the prototype's glasses are nearly as streamlined as modern smartglasses, the processor was a computer worn in a backpack – the most lightweight solution available at the time. In 2009, Sony Ericsson teamed up with the London College of Fashion for a contest to design digital clothing. The winner was a cocktail dress with Bluetooth technology making it light up when a call is received. Zach "Hoeken" Smith of MakerBot fame made keyboard pants during a "Fashion Hacking" workshop at a New York City creative collective. The Tyndall National Institute in Ireland developed a "remote non-intrusive patient monitoring" platform which was used to evaluate the quality of the data generated by the patient sensors and how the end users may adopt to the technology. More recently, London-based fashion company CuteCircuit created costumes for singer Katy Perry featuring LED lighting so that the outfits would change color both during stage shows and appearances on the red carpet such as the dress Katy Perry wore in 2010 at the MET Gala in NYC. In 2012, CuteCircuit created the world's first dress to feature Tweets, as worn by singer Nicole Scherzinger. In 2010, McLear, also known as NFC Ring, developed prototypes of its "smart ring" devices, before a Kickstarter fundraising in 2013. In 2014, graduate students from the Tisch School of Arts in New York designed a hoodie that sent pre-programmed text messages triggered by gesture movements. Around the same time, prototypes for digital eyewear with heads up display (HUD) began to appear. The US military employs headgear with displays for soldiers using a technology called holographic optics. In 2010, Google started developing prototypes of its optical head-mounted display Google Glass, which went into customer beta in March 2013. == Usage == In the consumer space, sales of smart wristbands (aka activity trackers such as the Jawbone UP and Fitbit Flex) started accelerating in 2013. One in five American adults have a wearable device, according to the 2014 PriceWaterhouseCoopers Wearable Future Report. As of 2009, decreasing cost of processing power and other components was facilitating widespread adoption and availability. In professional sports, wearable technology has applications in monitoring and real-time feedback for athletes. Examples of wearable technology in sport include accelerometers, pedometers, and GPS's which can be used to measure an athlete's energy expenditure and movement pattern. In cybersecurity and financial technology, secure wearable devices have captured part of the physical security key market. McLear, also known as NFC Ring, and VivoKey developed products with one-time pass secure access control. In health informatics, wearable devices have enabled better capturing of human health statistics for data driven analysis. This has facilitated data-driven machine learning algorithms to analyse the health condition of users. In business, wearable technology helps managers easily supervise employees by knowing their locations and what they are currently doing. Employees working in a warehouse also have increased safety when working around chemicals or lifting something. Smart helmets are employee safety wearables that have vibration sensors that can alert employees of possible danger in their environment. == Wearable technology and health == Wearable technology is often used to monitor a user's health. Given that such a device is in close contact with the user, it can easily collect data. It started as soon as 1980 where first wireless ECG was invented. In the last decades, there has been substantial growth in research of e.g. textile-based, tattoo, patch, and contact lenses as well as circulation of a notion of "quantified self", transhumanism-related ideas, and growth of life ex

    Read more →
  • Voiceverse NFT plagiarism scandal

    Voiceverse NFT plagiarism scandal

    In January 2022, 15—the pseudonymous Massachusetts Institute of Technology (MIT) artificial intelligence researcher and creator of the non-commercial generative artificial intelligence voice synthesis research project 15.ai—discovered that the blockchain-based technology company Voiceverse had plagiarized from their platform. Voiceverse marketed itself as a service that offered AI voice cloning technology that could be purchased and traded as non-fungible tokens (NFTs). Amid heightened controversy over NFTs in the gaming industry, voice actor Troy Baker (who has been described as one of the most famous voice actors in video games) announced his partnership with Voiceverse on January 14, 2022, triggering immediate backlash over concerns about the environmental impact of NFTs, potential for fraud, predatory monetization in video games, and the potential of AI displacing jobs for human voice actors. Later that same day, 15 revealed through server logs that Voiceverse had generated voice lines using 15's free text-to-speech platform, pitch-shifted the audio to make them unrecognizable, and falsely marketed the samples as their own technology before selling them as NFTs. Within an hour of being confronted with evidence, Voiceverse confessed and stated that their marketing team had used 15.ai without proper attribution while rushing to create a technology demo to coincide with Baker's partnership announcement, further exacerbating the already negative reception to the original announcement. In response, 15 replied "Go fuck yourself"; the interaction went viral and garnered a large amount of support for the developer. News publications universally characterized this incident as Voiceverse having "stolen" from 15.ai. The next day, Baker appeared on a podcast and stated that his motivation had been to help independent creators who were unable to afford professional voice actors. Following continued backlash and the plagiarism revelation, Baker ended his partnership with Voiceverse on January 31, 2022. Subsequently, the incident was documented in multiple AI ethics databases, criticisms of predatory monetization in video games, and retrospectives as one of the earliest instances of plagiarism and theft stemming from artificial intelligence during the AI boom. == Background == === Troy Baker === Troy Baker is a prominent voice actor in the video game industry best known for his performances as Joel Miller in The Last of Us franchise. Baker has been described as "ubiquitous" by Polygon, "one of the most high-profile and prolific voice actors in video games" by Eurogamer, and "arguably the most famous voice actor in the gaming industry" by GameGuru. His other prominent roles include voicing Agent John "Jonesy" Jones in Fortnite, Booker DeWitt in BioShock Infinite, and both Batman and Joker in multiple Batman video games. As of October 2025, Baker holds the record for the most acting nominations at the BAFTA Games Awards, with five between 2013 and 2021. === Voiceverse === Voiceverse is a blockchain-based startup founded by the Bored Ape Yacht Club that marketed itself as offering AI voice cloning technology in the form of NFTs. Prior to the announcement of their partnership with Baker, Voiceverse had partnered with LOVO, Inc., an AI voice platform that, according to LOVO, could generate human-like voices. Voiceverse stated that any user who purchases a voice NFT would have unlimited and perpetual access to the voice model, which could be used to create content such as audiobooks, YouTube videos, podcasts, e-learning materials, in-game voice chat, and Zoom calls. Voiceverse promised that buyers would "OWN [sic] all of the IP" of content they created using these voices. Voiceverse's roadmap included plans to release 8,888 initial voice NFTs, a feature to add emotions to existing voices, and the ability for users to mint their own voices as NFTs. Prior to Baker's partnership, Voiceverse had also partnered with voice actors Charlet Chung, who voices D.Va in Overwatch, and Andy Milonakis of The Andy Milonakis Show. === 15.ai === 15.ai is a free web application launched in 2020 that uses artificial intelligence to generate text-to-speech voices of fictional characters from popular media. Created by a pseudonymous artificial intelligence researcher known as 15, who began developing the technology as a freshman during their undergraduate research at MIT, it was an early example of an application of generative artificial intelligence during the initial stages of the AI boom. The platform showed that deep neural networks could generate emotionally expressive speech with only 15 seconds of speech; the name "15.ai" references the creator's statement that a voice can be convincingly cloned with just 15 seconds of audio, as opposed to the tens of hours of data previously required. 15.ai became an Internet phenomenon in early 2021 when content utilizing it went viral on social media and quickly gained widespread use among various Internet fandoms. 15 has emphasized that it remain free and non-commercial; it only requires users to give proper credit when using the service for content creation. === NFTs in the video game industry === By early 2022, NFTs had become highly controversial within the gaming industry. Critics raised concerns about their environmental impact due to the significant energy consumption of blockchain technology. In addition, the prevalence of scams, fraud, and potential money laundering associated with NFT sales, as well as fears that NFTs were a new form of predatory monetization following the increasing frequency of loot boxes, caused vocal pushback from the gaming community. Several major gaming companies had begun exploring NFT integration into their products, though fan backlash had already forced some projects to be cancelled. On December 16, 2021, the developers of S.T.A.L.K.E.R. 2: Heart of Chernobyl announced that they would be including NFTs in the game, but cancelled within an hour of the announcement due to immediate universal backlash. Simultaneously, the rise of AI voice technology raised concerns among voice actors about potential job displacement and the devaluation of their work amidst the voice acting industry's ongoing struggles for better compensation and working conditions. == Partnership announcement and backlash == On January 14, 2022, 1:02 a.m. EST, Baker announced on Twitter that he was partnering with Voiceverse "to explore ways where together we might bring new tools to new creators to make new things, and allow everyone a chance to own & invest in the IP's they create." The announcement concluded with the statement "You can hate. Or you can create." Baker's specific role with Voiceverse remained unclear at the time of the announcement. Along with Baker's announcement, Voiceverse promoted their supposed voice AI technology on Twitter by posting animated videos that featured a cat character created by NFT firm Chubbiverse. The videos concluded with text that read "The Voice Powered By Voiceverse"; Voiceverse stated on Twitter that the voices in the animations had been generated using their own AI voice synthesis technology and presented the videos as a technology demonstration of their voice NFT capabilities. The announcement provoked immediate and widespread backlash from the gaming community. Baker's tweet received thousands of replies and quote retweets (the vast majority of which were negative), far more than the number of likes; Michael McWhertor of Polygon described it as a "textbook example of being ratioed" and commented that reactions had been amplified by the final part of Baker's announcement. Michael Beckwith of Metro called Baker's approach "bizarrely aggressive". Later that day, Baker responded to the backlash by apologizing for his choice of words. He said he appreciated people's thoughts and acknowledged that the "hate/create part might have been a bit antagonistic," calling it a "bad attempt to bring levity". Despite the apology, Baker and his fellow voice actors did not distance themselves from Voiceverse at this point. At the same time, Voiceverse attempted to address the criticisms, stating that they were working to move to more environmentally friendly blockchain technology and that voice actors would receive royalties from NFT sales, with actors benefiting from any increase in NFT value. == Plagiarism revelation == On December 13, 2021, amidst the increasingly negative reactions toward NFTs among the general public, the creator of 15.ai (known pseudonymously as 15) announced that they had "no interest in incorporating NFTs into any aspect of [their] work." On January 14, 2022, 11:17 a.m. EST (10 hours after Baker's initial announcement), 15 commented on the Voiceverse venture, stating that it "sounds like a scam". Two hours later, at 1:20 p.m., 15 explicitly accused Voiceverse of "actively attempting to appropriate [15's] work for [Voiceverse's] own benefit." 15 provided evidence through

    Read more →
  • Pseudonymization

    Pseudonymization

    Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing. Pseudonymization (or pseudonymisation, the spelling under European guidelines) is one way to comply with the European Union's General Data Protection Regulation (GDPR) demands for secure data storage of personal information. Pseudonymized data can be restored to its original state with the addition of information which allows individuals to be re-identified. In contrast, anonymization is intended to prevent re-identification of individuals within the dataset. Clause 18, Module Four, footnote 2 of the Adoption by the European Commission of the Implementing Decisions (EU) 2021/914 "requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone ... and that this process is irreversible." == Impact of Schrems II ruling == The European Data Protection Supervisor (EDPS) on 9 December 2021 highlighted pseudonymization as the top technical supplementary measure for Schrems II compliance. Less than two weeks later, the EU Commission highlighted pseudonymization as an essential element of the equivalency decision for South Korea, which is the status that was lost by the United States under the Schrems II ruling by the Court of Justice of the European Union (CJEU). The importance of GDPR-compliant pseudonymization increased dramatically in June 2021 when the European Data Protection Board (EDPB) and the European Commission highlighted GDPR-compliant pseudonymization as the state-of-the-art technical supplementary measure for the ongoing lawful use of EU personal data when using third country (i.e., non-EU) cloud processors or remote service providers under the "Schrems II" ruling by the CJEU. Under the GDPR and final EDPB Schrems II Guidance, the term pseudonymization requires a new protected "state" of data, producing a protected outcome that: Protects direct, indirect, and quasi-identifiers, together with characteristics and behaviors; Protects at the record and data set level versus only the field level so that the protection travels wherever the data goes, including when it is in use; and Protects against unauthorized re-identification via the mosaic effect by generating high entropy (uncertainty) levels by dynamically assigning different tokens at different times for various purposes. The combination of these protections is necessary to prevent the re-identification of data subjects without the use of additional information kept separately, as required under GDPR Article 4(5) and as further underscored by paragraph 85(4) of the final EDPB Schrems II guidance: Article 4(5) "Definitions" of the GDPR defines pseudonymization as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person." "Use Case 2: Transfer of pseudonymised Data Paragraph 85(4)" of the final EDPB Schrems II Guidance requires that “the controller has established by means of a thorough analysis of the data in question – taking into account any information that the public authorities of the recipient country may be expected to possess and use – that the pseudonymised personal data cannot be attributed to an identified or identifiable natural person even if cross-referenced with such information." GDPR-compliant pseudonymization requires that data is "anonymous" in the strictest EU sense of the word – globally anonymous – but for the additional information held separately and made available under controlled conditions as authorized by the data controller for permitted re-identification of individual data subjects. Clause 18, Module Four, footnote 2 of the Adoption by the European Commission of the Implementing Decision (EU) 2021/914 "requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone, in line with recital 26 of Regulation (EU) 2016/679, and that this process is irreversible." Before the Schrems II ruling, pseudonymization was a technique used by security experts or government officials to hide personally identifiable information to maintain data structure and privacy of information. Some common examples of sensitive information include postal code, location of individuals, names of individuals, race and gender, etc. After the Schrems II ruling, GDPR-compliant pseudonymization must satisfy the above-noted elements as an "outcome" versus merely a technique. == Data fields == The choice of which data fields are to be pseudonymized is partly subjective. Less selective fields, such as birth date or postal code are often also included because they are usually available from other sources and therefore make a record easier to identify. Pseudonymizing these less identifying fields removes most of their analytic value and is therefore normally accompanied by the introduction of new derived and less identifying forms, such as year of birth or a larger postal code region. Data fields that are less identifying, such as date of attendance, are usually not pseudonymized. This is because too much statistical utility is lost in doing so, not because the data cannot be identified. For example, given prior knowledge of a few attendance dates it is easy to identify someone's data in a pseudonymized dataset by selecting only those people with that pattern of dates. This is an example of an inference attack. The weakness of pre-GDPR pseudonymized data to inference attacks is commonly overlooked. A famous example is the AOL search data scandal. The AOL example of unauthorized re-identification did not require access to separately kept "additional information" that was under the control of the data controller as is now required for GDPR-compliant pseudonymization, outlined below under the section "New Definition for Pseudonymization Under GDPR". Protecting statistically useful pseudonymized data from re-identification requires: a sound information security base controlling the risk that the analysts, researchers or other data workers cause a privacy breach The pseudonym allows tracking back of data to its origins, which distinguishes pseudonymization from anonymization, where all person-related data that could allow backtracking has been purged. Pseudonymization is an issue in, for example, patient-related data that has to be passed on securely between clinical centers. The application of pseudonymization to e-health intends to preserve the patient's privacy and data confidentiality. It allows primary use of medical records by authorized health care providers and privacy preserving secondary use by researchers. In the US, HIPAA provides guidelines on how health care data must be handled and data de-identification or pseudonymization is one way to simplify HIPAA compliance. However, plain pseudonymization for privacy preservation often reaches its limits when genetic data are involved (see also genetic privacy). Due to the identifying nature of genetic data, depersonalization is often not sufficient to hide the corresponding person. Potential solutions are the combination of pseudonymization with fragmentation and encryption. An example of application of pseudonymization procedure is creation of datasets for de-identification research by replacing identifying words with words from the same category (e.g. replacing a name with a random name from the names dictionary), however, in this case it is in general not possible to track data back to its origins. == New definition under GDPR == Effective as of May 25, 2018, the EU General Data Protection Regulation (GDPR) defines pseudonymization for the very first time at the EU level in Article 4(5). Under Article 4(5) definitional requirements, data is pseudonymized if it cannot be attributed to a specific data subject without the use of separately kept "additional information". Pseudonymized data embodies the state of the art in Data Protection by Design and by Default because it requires protection of both direct and indirect identifiers (not just direct). GDPR Data Protection by Design and by Default principles as embodied in pseudonymization require protection of both direct and indirect identifiers so that personal data is not cross-referenceable (or re-identifiable) via the "mosaic effect" without access to "additional information" that is kept separately by the controller. Because access to separately kept "additional information" is required

    Read more →
  • Traceability

    Traceability

    Traceability is the capability to trace something. In some cases, it is interpreted as the ability to verify the history, location, or application of an item by means of documented recorded identification. Other common definitions include the capability (and implementation) of keeping track of a given set or type of information to a given degree, or the ability to chronologically interrelate uniquely identifiable entities in a way that is verifiable. Traceability is applicable to measurement, supply chain, software development, healthcare and security. == Measurement == The term measurement traceability or metrological traceability is used to refer to an unbroken chain of comparisons relating an instrument's measurements to a known standard. Calibration to a traceable standard can be used to determine an instrument's bias, precision, and accuracy. It may also be used to show a chain of custody—from current interpretation of evidence to the actual evidence in a legal context, or history of handling of any information. In many countries, national standards for weights and measures are maintained by a National Metrological Institute (NMI) which provides the highest level of standards for the calibration / measurement traceability infrastructure in that country. Examples of government agencies include the National Physical Laboratory, UK (NPL) the National Institute of Standards and Technology (NIST) in the USA, the Physikalisch-Technische Bundesanstalt (PTB) in Germany, the Instituto Nazionale di Ricerca Metrologica (INRiM) in Italy, and the National Research Council of Canada (NRC). As defined by NIST, "Traceability of measurement requires the establishment of an unbroken chain of comparisons to stated references each with a stated uncertainty." A clock providing traceable time is traceable to a time standard such as Coordinated Universal Time or International Atomic Time. The Global Positioning System is a source of traceable time. === Food processing === In food processing (meat processing, fresh produce processing), the term traceability refers to the recording through means of barcodes or RFID tags and other tracking media, all movement of product and steps within the production process. One of the key reasons this is such a critical point is in instances where an issue of contamination arises, and a recall is required. Where traceability has been closely adhered to, it is possible to identify, by precise date/time and exact location which goods must be recalled, and which are safe, potentially saving millions of dollars in the recall process. Traceability within the food processing industry is also utilised to identify key high production and quality areas of a business, versus those of low return, and where points in the production process may be improved. In food processing software, traceability systems imply the use of a unique piece of data (e.g., order date/time or a serialized sequence number, generally through the use of a barcode / RFID) which can be traced through the entire production flow, linking all sections of the business, including suppliers and future sales through the supply chain. Messages and files at any point in the system can then be audited for correctness and completeness, using the traceability software to find the particular transaction and/or product within the supply chain. In food systems, ISO 22005, as part of the ISO 22000 family of standards, has been developed to define the principles for food traceability and specifies the basic requirements for the design and implementation of a feed and food traceability system. It can be applied by an organization operating at any step in the feed and food chain. The European Union's General Food Law came into force in 2002, making traceability compulsory for food and feed operators and requiring those businesses to implement traceability systems. The EU introduced its Trade Control and Expert System, or TRACES, in April 2004. The system provides a central database to track movement of animals within the EU and from third countries. Australia has its National Livestock Identification System to keep track of livestock from birth to slaughterhouse. India has started taking initiatives for setting up traceability systems at Government and Corporate levels. Grapenet, an initiative by Agriculture and Processed Food Products Export Development Authority (APEDA), Ministry of Commerce, Government of India is an example in this direction. GrapeNet is an internet based traceability software system for monitoring fresh grapes exported from India to the European Union. GrapeNet is a first of its kind initiative in India that has put in place an end-to-end system for monitoring pesticide residue, achieve product standardization and facilitate tracing back from pallets to the farm of the Indian grower, through the various stages of sampling, testing, certification and packing. Grapenet won the National Award (Gold), in the winners announced for the best e-Governance initiatives undertaken in India in 2007. The Directorate Generate Foreign Trade (DGFT), Government of India, through its notification dated 04.02.2009 relating to Amendment in Foreign Trade Policy (RE2008)has mandated that Export to the European Union is permitted subject to registration with APEDA, thereby making Grapenet mandatory for all exports of fresh grapes from India to Europe. Uruguay has also designed a system called "Traceability & Electronic Information System of the Beef Industry". Traceability in food supply can also refer to practices employed by individual companies, including Ritual and Amway's Nutrilite. In the case of Nutrilite's supplements, ingredients are documented and tested throughout farming, processing, and manufacturing to ensure traceability at each stage of production. == Systems and software development == In systems and software development, the term traceability (or requirements traceability) refers to the ability to link product requirements back to stakeholders' rationales and forward to corresponding design artifacts, code, and test cases. Traceability supports numerous software engineering activities such as change impact analysis, compliance verification or traceback of code, regression test selection, and requirements validation. It is usually accomplished in the form of a matrix created for the verification and validation of the project. Unfortunately, the practice of constructing and maintaining a requirements trace matrix (RTM) can be very arduous and over time the traces tend to erode into an inaccurate state unless date/time stamped. Alternate automated approaches for generating traces using information retrieval methods have been developed. The IEEE defines traceability as "(1)The degree to which a relationship can be established between two or more products of the development process, especially products having a predecessor, successor or master-subordinate relationship to one another. For example, the degree to which the requirements and design of a given software component match. See also: consistency. " and "(2) The degree to which each element in a software development product establishes its reason for existing; for example, the degree to which each element in a bubble chart references the requirement that it satisfies." In transaction processing software, traceability implies use of a unique piece of data (e.g., order date/time or a serialized sequence number) which can be traced through the entire software flow of all relevant application programs. Messages and files at any point in the system can then be audited for correctness and completeness, using the traceability key to find the particular transaction. This is also sometimes referred to as the transaction footprint. == Health care == Patient safety during healthcare service plays an important role in preventing delayed recovery or even mortality, by increasing and improving the quality of life of citizens, and is considered an indicator of the quality status of health services Maintaining patient safety is a complex task and involves factors inherent to the environment and human actions. New technologies facilitate the traceability tools of patients and medications. This is particularly relevant for drugs that are considered high risk and cost. Recent research in the healthcare industry emphasizes the significant impact of Blockchain Technology (BCT) on improving the performance of healthcare supply chain management. It highlights BCT's role in enhancing transparency, data immutability, and efficient management, leading to better cooperation among stakeholders and effective risk mitigation in healthcare services. The World Health Organization has recognized the importance of traceability for medical products of human origin (MPHO) and urged member states "to encourage the implementation of globally consistent coding systems to facilitate national and international traceability". == Security and cri

    Read more →
  • E-Science librarianship

    E-Science librarianship

    E-Science librarianship refers to a role for librarians in e-Science. == Early scholars == Early references to e-Science and librarianship involve information studies scholars researching cyberinfrastructure and emerging networked information and knowledge communities. Notably Christine Borgman, Professor and Presidential Chair in Information Studies at the University of California, Los Angeles (UCLA) was a key player in bringing e-Science, and the idea of networked knowledge communities, to the attention of the library profession. In 2004, as a visiting fellow at the Oxford Internet Institute, she conducted research and lectured publicly on e-Science, Digital Libraries, and Knowledge Communities. In 2007 Anna K. Gold, formerly of MIT and Cal Poly, San Luis Obispo, authored a series of articles in D-Lib Magazine that opened the door for academic libraries to begin exploring roles, skills, and strategies for engaging in e-Science: Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians and Cyberinfrastructure, Data, and Libraries, Part 2: Libraries and the Data Challenge: Roles and Actions for Libraries. == Academic research and health sciences libraries == In 2007, the Association of Research Libraries (ARL) e-Science task force issued its report on e-Science and librarianship. The ARL's report encouraged its member libraries to position themselves to engage with researchers involved in e-Science (eScience) by cultivating new research support strategies and developing their digital scholarship infrastructure. E-Science has multiple attributes; Tony and Jessie Hey framed e-Science for the library community by characterizing it as a research methodology: "e-Science is not a new scientific discipline in its own right: e-Science is shorthand for the set of tools and technologies required to support collaborative, networked science". In addition to academic libraries' interests in providing support for their researchers engaging in e-Science, the health sciences library community also emerged as a major proponent for creating librarian positions for supporting the information needs of large-scale, networked, research collaborations on their campuses. Neil Rambo, current director of NYU's Health Sciences Library and former director of University of Washington Health Sciences Library, was the first to use the term in the Journal of the Medical Library Association, in his 2009 editorial e-Science and the Biomedical Library. Rambo's definition of e-Science highlighted the potential e-Science held for creating data as a research product: "E-science is a new research methodology, fueled by networked capabilities and the practical possibility of gathering and storing vast amounts of data." In response to this article the University of Massachusetts Medical School Lamar Soutter Library and National Network of Libraries of Medicine, New England Region encouraged health sciences libraries to cooperate to identify skills and develop a program for training e-Science Librarians. Then, in 2013, Shannon Bohle, an archivist who was employed in the library at Cold Spring Harbor Laboratory, an NCI-designated basic cancer research facility, used experience gained there and previous papers and presentations about preserving scientific archival materials to expand the traditional definition of e-Science by including the terms, principles, and practices used in archival science. These included in the definition the "long-term storage and accessibility of all materials generated through the scientific process," as well as examples of material types traditionally preserved in archives, like "electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints," as well as library materials ("print and/or electronic publications"). == Roles == Many areas of science are about to be transformed by the availability of vast amounts of new scientific data that can potentially provide insights at a level of detail never before envisaged. However, this new data dominant era brings new challenges for the scientists and they will need the skills and technologies both of computer scientists and of the library community to manage, search and curate these new data resources. Libraries will not be immune from change in this new world of research. Karen Williams identifies roles in the following areas for librarians in the developing world of e-Science. Campus Engagement Content/Collection Development and Management Teaching and Learning Scholarly Communication E-Scholarship and Digital Tools Reference/Help Services Outreach Fund Raising Exhibit and Event Planning Leadership == Challenges for research libraries == E-science tends toward inter- and multidisciplinary approaches that depend on computation and computer science. Research libraries have traditionally been discipline focused and, although increasingly technologically sophisticated, do not have systems of the scale or complexity of the e-science environment. E-science is data intensive, but research libraries have not typically been responsible for scientific data. E-science is frequently conducted in a team context, often distributed across multiple institutions and on a global scale. The primary constituency of libraries generally comprises those affiliated with the local institution. Licenses for electronic content are typically restricted to a particular institutional community, and the infrastructure to move institutional licenses into a multi-institutional environment is not well developed. E-science challenges all these traditional paradigms of research library organization and services. == Skills == Garritano & Carlson were among the first to outline a skill set for librarians seeking to support the data needs of e-Science; they identified five skill categories librarians new to this area should expect to adapt or develop when participating on such projects: Library and information science expertise Subject expertise Partnerships and outreach (both internal and external) Participating in sponsored research Balancing workload An example of librarians reconfiguring traditional librarian skills to meet the needs of researchers engaging in e-Science is Witt & Carlson's adaptation of the traditional reference interview into a "data interview" in order to provide effective data management and e-Science services. This interview consists of ten practical queries necessary for understanding the provenance and expectations for the preservation of datasets typical of e-Science that also help illustrate some of the educational tools and skills needed by a librarian new to e-Science. "What is the story of the data? What form and format are the data in? What is the expected lifespan of the dataset? How could the data be used, reused, and repurposed? How large is the dataset, and what is its rate of growth? Who are the potential audiences for the data? Who owns the data? Does the dataset include any sensitive information? What publications or discoveries have resulted from the data? How should the data be made accessible?" == Resources == In 2009 the Lamar Soutter Library at the University of Massachusetts Medical School (UMMS) and the National Network of Libraries of Medicine, New England Region (NN/LM NER) funded an e-Science program for building the skills highlighted above for librarians. Elaine Russo Martin, Director of Library Services at the Lamar Soutter Library and Director of the NN/LM NER developed this comprehensive e-Science program to build librarians' subject expertise in the sciences, developing their data management skills, and their familiarity with cyberinfrastructure and e-Science. Three major products of this program are the e-Science web portal for librarians, the E-Science Symposium, and the New England Collaborative Data Management Curriculum (NECDMC). This portal includes educational resources for specific tools and subject/discipline tutorials and modules to assist librarians new to e-Science. UMMS and NN/LM NER also publish an open access journal called the Journal of eScience Librarianship.

    Read more →
  • Token-based replay

    Token-based replay

    Token-based replay technique is a conformance checking algorithm that checks how well a process conforms with its model by replaying each trace on the model (in Petri net notation ). Using the four counters produced tokens, consumed tokens, missing tokens, and remaining tokens, it records the situations where a transition is forced to fire and the remaining tokens after the replay ends. Based on the count at each counter, we can compute the fitness value between the trace and the model. == The algorithm == Source: The token-replay technique uses four counters to keep track of a trace during the replaying: p: Produced tokens c: Consumed tokens m: Missing tokens (consumed while not there) r: Remaining tokens (produced but not consumed) Invariants: At any time: p + m ≥ c ≥ m {\displaystyle p+m\geq c\geq m} At the end: r = p + m − c {\displaystyle r=p+m-c} At the beginning, a token is produced for the source place (p = 1) and at the end, a token is consumed from the sink place (c' = c + 1). When the replay ends, the fitness value can be computed as follows: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})} == Example == Suppose there is a process model in Petri net notation as follows: === Example 1: Replay the trace (a, b, c, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity c {\displaystyle \mathbf {c} } consumes 1 token and produces 1 token ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} and c = 2 + 1 = 3 {\displaystyle c=2+1=3} ). Step 5: The activity d {\displaystyle \mathbf {d} } consumes 2 tokens and produces 1 token ( p = 5 + 1 = 6 {\displaystyle p=5+1=6} , c = 3 + 2 = 5 {\displaystyle c=3+2=5} ). Step 6: The token at the end place is consumed ( c = 5 + 1 = 6 {\displaystyle c=5+1=6} ). The trace is complete. The fitness of the trace ( a , b , c , d {\displaystyle \mathbf {a,b,c,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 0 6 ) + 1 2 ( 1 − 0 6 ) = 1 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {0}{6}})+{\frac {1}{2}}(1-{\frac {0}{6}})=1} === Example 2: Replay the trace (a, b, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity d {\displaystyle \mathbf {d} } needs to be fired but there are not enough tokens. One artificial token was produced and the missing token counter is increased by one ( m = 1 {\displaystyle m=1} ). The artificial token and the token at place [ b , d ] {\displaystyle [\mathbf {b,d} ]} are consumed ( c = 2 + 2 = 4 {\displaystyle c=2+2=4} ) and one token is produced at place end ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} ). Step 5: The token in the end place is consumed ( c = 4 + 1 = 5 {\displaystyle c=4+1=5} ). The trace is complete. There is one remaining token at place [ a , c ] {\displaystyle [\mathbf {a,c} ]} ( r = 1 {\displaystyle r=1} ). The fitness of the trace ( a , b , d {\displaystyle \mathbf {a,b,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 1 5 ) + 1 2 ( 1 − 1 5 ) = 0.8 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {1}{5}})+{\frac {1}{2}}(1-{\frac {1}{5}})=0.8}

    Read more →
  • Block swap algorithms

    Block swap algorithms

    In computer algorithms, block swap algorithms swap two regions of elements of an array. It is simple to swap two non-overlapping regions of an array of equal size. However, it is not as simple to swap two contiguous regions of an array of unequal sizes (algorithms that perform such swapping are called rotation algorithms). A few well-known algorithms can accomplish this: Bentley's juggling (also known as the dolphin algorithm), Gries-Mills rotation, triple reversal algorithm, conjoined triple reversal algorithm (also known as the trinity rotation) and Successive rotation. == Triple reversal algorithm == The triple reversal algorithm is the simplest to explain, using rotations. A rotation is an in-place reversal of array elements. This method swaps two elements of an array from outside in within a range. The rotation works for an even or odd number of array elements. The reversal algorithm uses three in-place rotations to accomplish an in-place block swap: Rotate region A Rotate region B Rotate region AB Where A and B are adjacent regions of an array that together form the region AB. Gries-Mills and reversal algorithms perform better than Bentley's juggling, because of their cache-friendly memory access pattern behavior. The triple reversal algorithm parallelizes well, because rotations can be split into sub-regions, which can be rotated independently of others.

    Read more →
  • Sanad (government app)

    Sanad (government app)

    Sanad (Arabic: سند) is the official digital identity and e-government services application of the Hashemite Kingdom of Jordan. Developed and managed by the Ministry of Digital Economy and Entrepreneurship, the app provides a unified platform for accessing a range of public services and personal records digitally. == Overview == Launched in February 2020, Sanad is part of Jordan's broader digital transformation strategy aimed at improving public service delivery and enhancing administrative efficiency. The app allows users to authenticate their identity digitally and access over 550 services from more than 50 government and private sector entities. == Features == Sanad provides a wide array of services, including: Viewing and managing official digital documents Applying for government services (e.g., jordanian passport issuance or renewal, health insurance) Accessing personal records (e.g., pension, property ownership) Digitally signing documents Paying utility bills and traffic fines Receiving and tracking official notifications The app is available on iOS, Android, and HarmonyOS platforms and supports both Arabic and English languages. == Digital Identity == A core feature of Sanad is the digital identity system, which enables secure login and authentication for all integrated services. Users must activate their digital identity at designated Sanad stations across Jordan to access the full suite of services. == Adoption and Impact == As of 2025, more than 1.6 million Jordanians have activated their digital identities through Sanad. The app has played a significant role in streamlining government interactions and reducing the need for in-person visits, especially during the COVID-19 pandemic. == Recent Developments == In 2025, the Ministry launched an updated version of the app with enhanced user experience and new services, including the e-passport issuance feature.

    Read more →
  • Pseudonymization

    Pseudonymization

    Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing. Pseudonymization (or pseudonymisation, the spelling under European guidelines) is one way to comply with the European Union's General Data Protection Regulation (GDPR) demands for secure data storage of personal information. Pseudonymized data can be restored to its original state with the addition of information which allows individuals to be re-identified. In contrast, anonymization is intended to prevent re-identification of individuals within the dataset. Clause 18, Module Four, footnote 2 of the Adoption by the European Commission of the Implementing Decisions (EU) 2021/914 "requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone ... and that this process is irreversible." == Impact of Schrems II ruling == The European Data Protection Supervisor (EDPS) on 9 December 2021 highlighted pseudonymization as the top technical supplementary measure for Schrems II compliance. Less than two weeks later, the EU Commission highlighted pseudonymization as an essential element of the equivalency decision for South Korea, which is the status that was lost by the United States under the Schrems II ruling by the Court of Justice of the European Union (CJEU). The importance of GDPR-compliant pseudonymization increased dramatically in June 2021 when the European Data Protection Board (EDPB) and the European Commission highlighted GDPR-compliant pseudonymization as the state-of-the-art technical supplementary measure for the ongoing lawful use of EU personal data when using third country (i.e., non-EU) cloud processors or remote service providers under the "Schrems II" ruling by the CJEU. Under the GDPR and final EDPB Schrems II Guidance, the term pseudonymization requires a new protected "state" of data, producing a protected outcome that: Protects direct, indirect, and quasi-identifiers, together with characteristics and behaviors; Protects at the record and data set level versus only the field level so that the protection travels wherever the data goes, including when it is in use; and Protects against unauthorized re-identification via the mosaic effect by generating high entropy (uncertainty) levels by dynamically assigning different tokens at different times for various purposes. The combination of these protections is necessary to prevent the re-identification of data subjects without the use of additional information kept separately, as required under GDPR Article 4(5) and as further underscored by paragraph 85(4) of the final EDPB Schrems II guidance: Article 4(5) "Definitions" of the GDPR defines pseudonymization as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person." "Use Case 2: Transfer of pseudonymised Data Paragraph 85(4)" of the final EDPB Schrems II Guidance requires that “the controller has established by means of a thorough analysis of the data in question – taking into account any information that the public authorities of the recipient country may be expected to possess and use – that the pseudonymised personal data cannot be attributed to an identified or identifiable natural person even if cross-referenced with such information." GDPR-compliant pseudonymization requires that data is "anonymous" in the strictest EU sense of the word – globally anonymous – but for the additional information held separately and made available under controlled conditions as authorized by the data controller for permitted re-identification of individual data subjects. Clause 18, Module Four, footnote 2 of the Adoption by the European Commission of the Implementing Decision (EU) 2021/914 "requires rendering the data anonymous in such a way that the individual is no longer identifiable by anyone, in line with recital 26 of Regulation (EU) 2016/679, and that this process is irreversible." Before the Schrems II ruling, pseudonymization was a technique used by security experts or government officials to hide personally identifiable information to maintain data structure and privacy of information. Some common examples of sensitive information include postal code, location of individuals, names of individuals, race and gender, etc. After the Schrems II ruling, GDPR-compliant pseudonymization must satisfy the above-noted elements as an "outcome" versus merely a technique. == Data fields == The choice of which data fields are to be pseudonymized is partly subjective. Less selective fields, such as birth date or postal code are often also included because they are usually available from other sources and therefore make a record easier to identify. Pseudonymizing these less identifying fields removes most of their analytic value and is therefore normally accompanied by the introduction of new derived and less identifying forms, such as year of birth or a larger postal code region. Data fields that are less identifying, such as date of attendance, are usually not pseudonymized. This is because too much statistical utility is lost in doing so, not because the data cannot be identified. For example, given prior knowledge of a few attendance dates it is easy to identify someone's data in a pseudonymized dataset by selecting only those people with that pattern of dates. This is an example of an inference attack. The weakness of pre-GDPR pseudonymized data to inference attacks is commonly overlooked. A famous example is the AOL search data scandal. The AOL example of unauthorized re-identification did not require access to separately kept "additional information" that was under the control of the data controller as is now required for GDPR-compliant pseudonymization, outlined below under the section "New Definition for Pseudonymization Under GDPR". Protecting statistically useful pseudonymized data from re-identification requires: a sound information security base controlling the risk that the analysts, researchers or other data workers cause a privacy breach The pseudonym allows tracking back of data to its origins, which distinguishes pseudonymization from anonymization, where all person-related data that could allow backtracking has been purged. Pseudonymization is an issue in, for example, patient-related data that has to be passed on securely between clinical centers. The application of pseudonymization to e-health intends to preserve the patient's privacy and data confidentiality. It allows primary use of medical records by authorized health care providers and privacy preserving secondary use by researchers. In the US, HIPAA provides guidelines on how health care data must be handled and data de-identification or pseudonymization is one way to simplify HIPAA compliance. However, plain pseudonymization for privacy preservation often reaches its limits when genetic data are involved (see also genetic privacy). Due to the identifying nature of genetic data, depersonalization is often not sufficient to hide the corresponding person. Potential solutions are the combination of pseudonymization with fragmentation and encryption. An example of application of pseudonymization procedure is creation of datasets for de-identification research by replacing identifying words with words from the same category (e.g. replacing a name with a random name from the names dictionary), however, in this case it is in general not possible to track data back to its origins. == New definition under GDPR == Effective as of May 25, 2018, the EU General Data Protection Regulation (GDPR) defines pseudonymization for the very first time at the EU level in Article 4(5). Under Article 4(5) definitional requirements, data is pseudonymized if it cannot be attributed to a specific data subject without the use of separately kept "additional information". Pseudonymized data embodies the state of the art in Data Protection by Design and by Default because it requires protection of both direct and indirect identifiers (not just direct). GDPR Data Protection by Design and by Default principles as embodied in pseudonymization require protection of both direct and indirect identifiers so that personal data is not cross-referenceable (or re-identifiable) via the "mosaic effect" without access to "additional information" that is kept separately by the controller. Because access to separately kept "additional information" is required

    Read more →
  • EdgeRank

    EdgeRank

    EdgeRank is the name commonly given to the algorithm that Facebook uses to determine what articles should be displayed in a user's News Feed. As of 2011, Facebook has stopped using the EdgeRank system and uses a machine learning algorithm that, as of 2013, takes more than 100,000 factors into account. EdgeRank was developed and implemented by Serkan Piantino. == Formula and factors == In 2010, a simplified version of the EdgeRank algorithm was presented as: ∑ e d g e s e u e w e d e {\displaystyle \sum _{\mathrm {edges\,} e}u_{e}w_{e}d_{e}} where: u e {\displaystyle u_{e}} is user affinity. w e {\displaystyle w_{e}} is how the content is weighted. d e {\displaystyle d_{e}} is a time-based decay parameter. User Affinity: The User Affinity part of the algorithm in Facebook's EdgeRank looks at the relationship and proximity of the user and the content (post/status update). Content Weight: What action was taken by the user on the content. Time-Based Decay Parameter: New or old. Newer posts tend to hold a higher place than older posts. Some of the methods that Facebook uses to adjust the parameters are proprietary and not available to the public. A study has shown that it is possible to hypothesize a disadvantage of the "like" reaction and advantages of other interactions (e.g., the "haha" reaction or "comments") in content algorithmic ranking on Facebook. The "like" button can decrease the organic reach as a "brake effect of viral reach". The "haha" reaction, "comments" and the "love" reaction could achieve the highest increase in total organic reach. == Impact == EdgeRank and its successors have a broad impact on what users actually see out of what they ostensibly follow: for instance, the selection can produce a filter bubble (if users are exposed to updates which confirm their opinions etc.) or alter people's mood (if users are shown a disproportionate amount of positive or negative updates). As a result, for Facebook pages, the typical engagement rate is less than 1% (or less than 0.1% for the bigger ones), and organic reach 10% or less for most non-profits. As a consequence, for pages, it may be nearly impossible to reach any significant audience without paying to promote their content.

    Read more →
  • Small data

    Small data

    Small data is data that is 'small' enough for human comprehension. It is data in a volume and format that makes it accessible, informative and actionable. The term "big data" is about machines and "small data" is about people. This is to say that eyewitness observations or five pieces of related data could be small data. Small data is what we used to think of as data. The only way to comprehend Big data is to reduce the data into small, visually-appealing objects representing various aspects of large data sets (such as histogram, charts, and scatter plots). Big Data is all about finding correlations, but Small Data is all about finding the causation, the reason why. A formal definition of small data has been proposed by Allen Bonde, former vice-president of Innovation at Actuate - now part of OpenText: "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks." Another definition of small data is: The small set of specific attributes produced by the Internet of Things. These are typically a small set of sensor data such as temperature, wind speed, vibration and status. It was estimated (2016) that “If one takes the top 100 biggest innovations of our time, perhaps around 60% to 65% percent are really based on Small Data.” as Martin Lindstrom puts it. Small data includes everything from Snapchat to simple objects such as the post-it note. Lindstrom believes we become so focused on Big-Data that we tend to forget about more basic concepts and creativity. Lindstrom defines Small Data "as seemingly insignificant observations you identify in consumers’ homes, is everything from how you place your shoes on how you hang your paintings". He thus considers that one should perfectly master the basic (Small Data) in order to mine and find correlations. == Academic Recognition and Methodology == The growing significance of "small data" as a distinct field of inquiry was highlighted by the 2024 Thematic Einstein Semester (TES) on Small Data Analysis, hosted by the Berlin Mathematics Research Center MATH+. A central focus of this semester was the transition from theoretical analysis to practical decision-making. Because small data sets are primarily used to drive specific actions, the presentation of results becomes an essential methodological step. The semester’s findings emphasized that while small data may lack volume, it often contains a high density of "many possible interpretations." Consequently, the final conference of the TES was structured around the pillars of interpretation, explanation, and knowledge gain. Participants sought to develop new mathematical and methodical representations that could accurately depict this wealth of interpretative possibilities. This work underscores that analyzing small data is not purely a computational task; it requires a robust interface between mathematics and diverse disciplines to ensure that insights are both contextually grounded and scientifically rigorous. == Uses in business == === Marketing === Bonde has written about the topic for Forbes, Direct Marketing News, CMO.com and other publications. According to Martin Lindstrom, in his book, Small Data: "{In customer research, small data is} Seemingly insignificant behavioural observations containing very specific attributes pointing towards an unmet customer need. Small data is the foundation for breakthrough ideas or completely new ways to turnaround brands." His approach is based on the combination of the observation of small samples with intuition. Marketers can obtain market insights from gathering Small Data by engaging with and observing people in their own environments. In comparison to Big Data, Small Data has the power to trigger emotions and to provide insights into the reasons behind the behaviours of customers. It may uncover detailed information on a person's extroversion or introversion, self-confidence, whether one is having problems in his/her relationship, etc. According to Lindstrom, relationships among people and customer segments are organized around four criteria: Climate: It reveals for example how a person's environment affects their diet. Rulership: The power or government in charge Religion: The prevalence of religion in a country, depending on its influence, indicates whether a person's decision making process is impacted by their belief system. Tradition: Cultural norms influence people's behaviors and interactions. Many companies underestimate the power of Small Data, using samples of millions of consumers instead of recognizing the value of closely observing small samples in their market research. In his book, Lindstrom defines "7Cs", which companies should consider in the attempt to derive meaningful customer insights and market trends through small data from their customers: Collecting: Understanding the manner in which observations are translated inside a home. Clues: Uncovering other distinctive emotional reflections that can be observed. Connecting: Identifying the consequences of emotional behaviour. Causation: Understanding what emotions are being evoked. Correlation: Identifying the initial date of appearance of the behaviour or emotion. Compensation: Identifying the unmet or unfulfilled desire. Concept: Defining the “big idea” compensation for the identified consumer need. Some of Lindstrom's clients such as Lowes Foods looked at data in a different way and actually chose to live with the customer. “As you enter their store, they have now created an amazing community where every staff member acts in a character mood, based on Small Data”. The supermarket made everything it can to make the customer feel at home. All the behaviours of employees are inspired by customer feedbacks gathered from interviews directly done at customer’s home. === Healthcare === Researchers at Cornell University started developing applications to monitor health problems in patients, based on small data. This is an initiative of Cornell's Small Data Lab, in close cooperation with Weill Cornell Medicine College, led by Deborah Estrin. The Small Data Lab developed a series of apps, focusing not only on gathering data from patients' pain but also tracking habits in areas such as grocery shopping. In the case of patients with rheumatoid arthritis for example, which has flares and remissions that do not follow a particular cycle, the app gathers information passively, thus allowing to forecast when a flare might be coming up based on small changes in behaviour. Other apps developed also include monitoring online grocery shopping, to use this information from every user to adapt their groceries to the recommendations of nutritionists, or monitoring email language to identify patterns that might indicate "fluctuations in cognitive performance, fatigue, side effects of medication or poor sleep, and other conditions and treatments that are typically self-reported and self-medicated". === Postal Service === The United States Postal Service (USPS) used optical character recognition (OCR) to automatically read and process 98% of all hand-addressed mail and 99.5% of machine-printed mail. By combining this technology with its small data sample of US zip codes, the USPS can now process more than 36,000 pieces of mail per hour. === Aerospace === In 2015, Boeing established the analytics lab for aerospace data in cooperation with the Carnegie Mellon University to leverage the university's leadership in machine learning, language technologies and data analytics. One of the initiatives projects aims to by standardize maintenance logs using AI to dramatically reduce costs. Currently, there is no standardized procedure to document maintenance logs leading to small but highly unstructured data sets. As a result, it becomes highly difficult for maintenance workers to translate these variations in maintenance logs within a short period of time. However, with AI and a narrow data set of common aircraft maintenance terminology, it becomes possible to dynamically translate these logs in real time. By using AI to enhance the speed and accuracy of the airline maintenance workflow, airlines stand to save billions according to the Harvard Business Review.

    Read more →
  • Webull

    Webull

    Webull Corporation, often stylized as simply Webull, is a U.S.-based financial services holding company headquartered in St. Petersburg, Florida. It owns and operates the Webull electronic trading platform for self-directed retail investors. Depending on jurisdiction, the Webull platform offers trading in stocks, exchange-traded funds (ETFs), options, margin, bonds, cryptocurrency and futures, as well as market-data tools. Webull began operations in 2016 under Hunan Fumi Information Technology, a China-based financial technology company founded by Wang Anquan. It launched U.S. brokerage services through Webull Financial LLC in 2018 and expanded during the retail-trading boom of 2020 and 2021. In April 2025, Webull became a publicly traded company on the Nasdaq through a merger with special-purpose acquisition company SK Growth Opportunities Corporation. The company's U.S. brokerage revenue relies substantially on payment for order flow, with options trading accounting for the larger share of its order-flow rebates in 2025. Webull has faced regulatory actions related to options customer approvals, complaint handling, suspicious activity reporting, social-media marketing and customer disclosures. It has also faced scrutiny from U.S. lawmakers and state officials over its historical and operational ties to China and the handling of U.S. customer data. == History == === Founding === Webull was founded in 2016 under Hunan Fumi Information Technology, a China-based financial technology company, by Wang Anquan, a former employee of Alibaba Group and Xiaomi. Hunan Fumi Information Technology received backing from Xiaomi, Shunwei Capital, and other investors in China. Fumi Technology was a Hunan-based fintech start-up incubated by Xiaomi and raised about CNY200 million (approximately US$30 million) in a Series B financing round in 2018. On May 24, 2017, Webull Financial LLC was established as a Delaware limited liability company. It began offering brokerage services in the United States in May 2018. Wang hired Anthony Denier as CEO of the U.S. brokerage that year and the two mapped out their strategy on napkins at a Mexican restaurant in New York City. Webull Corporation was incorporated in the Cayman Islands in September 2019 as the group's holding company. === Retail trading boom === In May 2020, the company received SEC approval to launch a robo-advisor on its platform. By August 2020, the platform had over 11 million registered users, and in October 2020, it had 750,000 daily active users. Webull introduced options trading in 2020 and later added cryptocurrency trading through a separate digital-asset business. In November 2020, Webull began supporting cryptocurrency transactions. In December 2020, Webull launched trading services in Hong Kong. During the GameStop short squeeze in January 2021, Webull gained attention as some retail traders looked for alternatives to Robinhood. On January 27, 2021, Webull recorded its highest-ever number of active daily users, at 952,000, and the Webull app was downloaded across the Apple App and Google Play stores an estimated 100,000 times. That week, approximately 1.2 million people downloaded the Webull mobile app, which the company reported as a 1,548% week-over-week increase. On January 28, 2021, Webull was directed by its clearing house to temporarily halt buy orders for stocks affected by the GameStop short squeeze. In June 2021, Webull was reported to be considering a U.S. initial public offering that could raise up to $400 million. === Restructuring and expansion === Webull restructured its China-related corporate arrangements in 2022 and later stated that Hunan Fumi was no longer affiliated with the group. In 2022 and 2023, Webull expanded in several non-U.S. markets, including Singapore, Australia, South Africa, Japan, the United Kingdom and Indonesia. In June 2023, Webull moved cryptocurrency trading to a separate app called Webull Pay. By the end of 2023, Webull had 4.3 million funded accounts and US$8.2 billion in customer assets. In January 2024, Anthony Denier was promoted to group president of Webull Corporation. In November 2024, Webull launched overnight, or extended-hours, trading, expanding the trading window of U.S. stocks for users inside and outside the United States. === SPAC merger and Nasdaq listing === On February 28, 2024, Webull agreed to go public through a business combination with SK Growth Opportunities Corporation (NASDAQ: SKGR), a special-purpose acquisition company, in a deal that valued the company at approximately US$7.3 billion. The proposed valuation drew scrutiny because of Webull's limited financial disclosure at announcement, reliance on payment for order flow and small expected public float. SK Growth shareholders approved the business combination on March 30, 2025, and the transaction closed on April 10, 2025. Webull's Class A ordinary shares and warrants began trading on the Nasdaq on April 11, 2025 under the ticker symbols BULL and BULLW (incentive warrants traded under BULLZ until their redemption in June 2025). The merger brought Webull to the public market but generated little cash for the company: after shareholder redemptions, Webull disclosed net proceeds of US$430,066 from the transaction. After the listing, Webull's shares experienced extreme volatility, rising as much as 500% to US$79.56 on April 14, 2025, after closing at US$13.25 on the prior trading day. The initial post-listing surge increased the value of Webull holdings owned by earlier investors, including RIT Capital Partners, which had first invested in Webull in 2021. In April 2026, after Webull's shares had fallen about 70% over the previous year, the company authorized a US$100 million share repurchase program. == Business model and financials == Webull provides a self-directed electronic trading platform available through mobile, desktop and web applications. Depending on jurisdiction, the platform offers trading in stocks, exchange-traded funds, options, margin, futures, fixed income products, cryptocurrency, cash management features and market data tools. In the United States, Webull Financial LLC is a registered broker-dealer and member of FINRA and the Securities Investor Protection Corporation, while Webull operates in other markets through locally licensed brokerage subsidiaries. Webull operates a commission-free or low-cost brokerage model for self-directed retail investors. In the United States, a substantial part of its trading-related revenue comes from payment for order flow, while in some non-U.S. markets the company more commonly charges commissions directly to customers. The platform is aimed at more active retail investors, including users seeking options tools, extended-hours trading and real-time market data. For 2025, Webull reported total revenue of US$571.0 million, up from US$390.2 million in 2024. Equity and option order-flow rebates accounted for US$304.1 million, or 53.3% of revenue, making order-flow rebates the company's largest reported revenue category. Interest-related income accounted for US$154.3 million, handling charge income for US$87.3 million and other revenue for US$25.3 million. Options were the larger component of the company's order-flow rebates in 2025, generating US$210.0 million compared with US$94.2 million from equities. Webull also generates revenue from interest-related activities, including margin financing, customer bank deposits, stock lending and corporate bank deposits. The company has stated that its interest-related income is affected by interest rates, customer cash balances, margin balances and demand for stock lending. The company had approximately 20 million registered users worldwide as of February 2024. As of December 31, 2025, it reported 26.8 million registered users, 5.0 million funded accounts and US$24.6 billion in customer assets. As of March 2025, Webull operated in Hong Kong, Singapore, Australia, South Africa, Japan, the United Kingdom, the United States, Indonesia, Canada, Brazil, Thailand, Malaysia and Mexico. == Marketing and sponsorships == Webull has used paid digital advertising, referral incentives, free-stock promotions, affiliate marketing and sports sponsorships to acquire customers and promote its brand. In its 2025 annual filing, the company reported marketing and branding expenses of US$152.3 million in 2023, US$138.7 million in 2024 and US$135.9 million in 2025. Webull said most of its advertising and promotion costs were related to paid search and paid social advertising, and that it had reduced free-stock promotions while shifting toward deposit- and asset-transfer-based incentives. In September 2021, BSE Global, the parent company of the Brooklyn Nets and New York Liberty, entered into a global multi-year agreement with Webull. Under the agreement, Webull became an official sponsor and online brokerage partner of the teams, with branding that included a jersey patch on Brooklyn Nets uniforms. Spo

    Read more →
  • Point-in-time recovery

    Point-in-time recovery

    Point-in-time recovery (PITR) in the context of computers involves systems, often databases, whereby an administrator can restore or recover a set of data or a particular setting from a time in the past. Note for example Windows's capability to restore operating-system settings from a past date (for instance, before data corruption occurred). Time Machine for macOS provides another example of point-in-time recovery. Once PITR logging starts for a PITR-capable database, a database administrator can restore that database from backups to the state that it had at any time since.

    Read more →
  • Data Science Africa

    Data Science Africa

    Data Science Africa (DSA) is a non-profit knowledge sharing professional group that aims at bringing together leading researchers and practitioners working on data science methods or applications relevant to Africa, and providing training on state of the art data science methods to students and others interested in developing practical skills. Since 2013, DSA has been organizing conference, workshops and summer schools on machine learning and data science across East Africa. Facilitators of Summer School and workshops are researchers and practitioners from the academia, private and public institutions across the world. == Summer schools and workshops == The first summer school which started as Gaussian Process Summer School was held at Makerere University in Kampala, Uganda from 6th to 9 August 2013. The First Data Science Summer School and Workshop was held at Dedan Kimathi University of Technology in Nyeri, Kenya from 15th to 19 June 2015. The Second Data Science Summer School was held at Makerere University, Kampala, Uganda from 27th to 29 July 2016, and the workshop was held at Pulse Lab, Kampala, Uganda from 30 July to 1 August 2016. The Third Data Science Summer School and Workshop was held at Nelson Mandela African Institute of Science and Technology, Tanzania from 19th to 21 July 2017. Among the sponsors of the event was ARM

    Read more →