AI Data Quality Tools

AI Data Quality Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Language resource

    Language resource

    In linguistics and language technology, a language resource is a "[composition] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research studies and applications." According to Bird & Simons (2003), this includes data, i.e. "any information that documents or describes a language, such as a published monograph, a computer data file, or even a shoebox full of handwritten index cards. The information could range in content from unanalyzed sound recordings to fully transcribed and annotated texts to a complete descriptive grammar", tools, i.e., "computational resources that facilitate creating, viewing, querying, or otherwise using language data", and advice, i.e., "any information about what data sources are reliable, what tools are appropriate in a given situation, what practices to follow when creating new data". The latter aspect is usually referred to as "best practices" or "(community) standards". In a narrower sense, language resource is specifically applied to resources that are available in digital form, and then, "encompassing (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, and (b) tools/technologies/services used for their processing and management". == Typology == As of May 2020, no widely used standard typology of language resources has been established (current proposals include the LREMap, METASHARE, and, for data, the LLOD classification). Important classes of language resources include data lexical resources, e.g., machine-readable dictionaries, linguistic corpora, i.e., digital collections of natural language data, linguistic data bases such as the Cross-Linguistic Linked Data collection, tools linguistic annotations and tools for creating such annotations in a manual or semiautomated fashion (e.g., tools for annotating interlinear glossed text such as Toolbox and FLEx, or other language documentation tools), applications for search and retrieval over such data (corpus management systems), for automated annotation (part-of-speech tagging, syntactic parsing, semantic parsing, etc.), metadata and vocabularies vocabularies, repositories of linguistic terminology and language metadata, e.g., MetaShare (for language resource metadata), the ISO 12620 data category registry (for linguistic features, data structures and annotations within a language resource), or the Glottolog database (identifiers for language varieties and bibliographical database). == Language resource publication, dissemination and creation == A major concern of the language resource community has been to develop infrastructures and platforms to present, discuss and disseminate language resources. Selected contributions in this regard include: a series of International Conferences on Language Resources and Evaluation (LREC), the European Language Resources Association (ELRA, EU-based), and the Linguistic Data Consortium (LDC, US-based), which represent commercial hosting and dissemination platforms for language resources, the Open Languages Archives Community (OLAC), which provides and aggregates language resource metadata, the Language Resources and Evaluation Journal (LREJ), the European Language Grid is a European platform for language technologies (eg services), data and resources. As for the development of standards and best practices for language resources, these are subject of several community groups and standardization efforts, including ISO Technical Committee 37: Terminology and other language and content resources (ISO/TC 37), developing standards for all aspects of language resources, W3C Community Group Best Practices for Multilingual Linked Open Data (BPMLOD), working on best practice recommendations for publishing language resources as Linked Data or in RDF, W3C Community Group Linked Data for Language Technology (LD4LT), working on linguistic annotations on the web and language resource metadata, W3C Community Group Ontology-Lexica (OntoLex), working on lexical resources, the Open Linguistics working group of the Open Knowledge Foundation, working on conventions for publishing and linking open language resources, developing the Linguistic Linked Open Data cloud, the Text Encoding Initiative (TEI), working on XML-based specifications for language resources and digitally edited text.

    Read more →
  • Mean opinion score

    Mean opinion score

    Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering, representing overall quality of a stimulus or system. It is the arithmetic mean over all individual "values on a predefined scale that a subject assigns to his opinion of the performance of a system quality". Such ratings are usually gathered in a subjective quality evaluation test, but they can also be algorithmically estimated. MOS is a commonly used measure for video, audio, and audiovisual quality evaluation, but not restricted to those modalities. ITU-T has defined several ways of referring to a MOS in Recommendation ITU-T P.800.1, depending on whether the score was obtained from audiovisual, conversational, listening, talking, or video quality tests. == Rating scales and mathematical definition == The MOS is expressed as a single rational number, typically in the range 1–5, where 1 is lowest perceived quality, and 5 is the highest perceived quality. Other MOS ranges are also possible, depending on the rating scale that has been used in the underlying test. The Absolute Category Rating scale is very commonly used, which maps ratings between Bad and Excellent to numbers between 1 and 5, as seen in below table. Other standardized quality rating scales exist in ITU-T Recommendations (such as ITU-T P.800 or ITU-T P.910). For example, one could use a continuous scale ranging between 1–100. Which scale is used depends on the purpose of the test. In certain contexts there are no statistically significant differences between ratings for the same stimuli when they are obtained using different scales. The MOS is calculated as the arithmetic mean over single ratings performed by human subjects for a given stimulus in a subjective quality evaluation test. Thus: M O S = ∑ n = 1 N R n N {\displaystyle MOS={\frac {\sum _{n=1}^{N}{R_{n}}}{N}}} Where R {\displaystyle R} are the individual ratings for a given stimulus by N {\displaystyle N} subjects. == Properties of the MOS == The MOS is subject to certain mathematical properties and biases. In general, there is an ongoing debate on the usefulness of the MOS to quantify Quality of Experience in a single scalar value. When the MOS is acquired using a categorical rating scales, it is based on – similar to Likert scales – an ordinal scale. In this case, the ranking of the scale items is known, but their interval is not. Therefore, it is mathematically incorrect to calculate a mean over individual ratings in order to obtain the central tendency; the median should be used instead. However, in practice and in the definition of MOS, it is considered acceptable to calculate the arithmetic mean. It has been shown that for categorical rating scales (such as ACR), the individual items are not perceived equidistant by subjects. For example, there may be a larger "gap" between Good and Fair than there is between Good and Excellent. The perceived distance may also depend on the language into which the scale is translated. However, there exist studies that could not prove a significant impact of scale translation on the obtained results. Several other biases are present in the way MOS ratings are typically acquired. In addition to the above-mentioned issues with scales that are perceived non-linearly, there is a so-called "range-equalization bias": subjects, over the course of a subjective experiment, tend to give scores that span the entire rating scale. This makes it impossible to compare two different subjective tests if the range of presented quality differs. In other words, the MOS is never an absolute measure of quality, but only relative to the test in which it has been acquired. For the above reasons – and due to several other contextual factors influencing the perceived quality in a subjective test – a MOS value should only be reported if the context in which the values have been collected in is known and reported as well. MOS values gathered from different contexts and test designs therefore should not be directly compared. Recommendation ITU-T P.800.2 prescribes how MOS values should be reported. Specifically, P.800.2 says:it is not meaningful to directly compare MOS values produced from separate experiments, unless those experiments were explicitly designed to be compared, and even then the data should be statistically analysed to ensure that such a comparison is valid. == MOS for speech and audio quality estimation == MOS historically originates from subjective measurements where listeners would sit in a "quiet room" and score a telephone call quality as they perceived it. This kind of test methodology had been in use in the telephony industry for decades and was standardized in Recommendation ITU-T P.800. It specifies that "the talker should be seated in a quiet room with volume between 30 and 120 m³ and a reverberation time less than 500 ms (preferably in the range 200–300 ms). The room noise level must be below 30 dBA with no dominant peaks in the spectrum." Requirements for other modalities were similarly specified in later ITU-T Recommendations. == MOS estimation using quality models == Obtaining MOS ratings may be time-consuming and expensive as it requires the recruitment of human assessors. For various use cases such as codec development or service quality monitoring purposes – where quality should be estimated repeatedly and automatically – MOS scores can also be predicted by objective quality models, which typically have been developed and trained using human MOS ratings. A question that arises from using such models is whether the MOS differences produced are noticeable to the users. For example, when rating images on a five point MOS scale, an image with a MOS equal to 5 is expected to be noticeably better in quality than one with a MOS equal to 1. Contrary to that, it is not evident whether an image with a MOS equal to 3.8 is noticeably better in quality than one with a MOS equal to 3.6. Research conducted on determining the smallest MOS difference that is perceptible to users for digital photographs showed that a MOS difference of approximately 0.46 is required in order for 75% of the users to be able to detect the higher quality image. Nevertheless, image quality expectation, and hence MOS, changes over time with the change of user expectations. As a result, minimum noticeable MOS differences determined using analytical methods such as in may change over time.

    Read more →
  • List of video games using NFC

    List of video games using NFC

    This is a list of video games that use near field communication (NFC) technology. Currently, games have leveraged NFC in unlocking additional features through payment. This takes the form of a direct transaction over NFC or by purchasing a physical item, which signals to the platform that a certain set of features has been purchased (e.g. Skylanders). This list catalogues gaming NFC platforms by device. == Mobile == === Android === Gun Bros. Near Field Ninja NFC Cards Skylanders, with an NFC base. The Haunted House: Soul Fighters, with an NFC base. === iOS === ==== As item-triggered game enhancement ==== Skylanders, with an NFC base. ==== As payment ==== In-App Purchases Here, games that leverage Apple's In-App Purchase framework use information stored in the NFC Secure Element to process the purchase through Apple Pay. While an NFC radio is not used here, the NFC protocol is used nonetheless. == Console == === Nintendo Wii, Wii U, Switch, Switch 2, 3DS and 2DS === ==== As item-triggered game enhancement ==== Pokémon Rumble U NFC Figure Amiibo, built into Nintendo consoles since 2014. Works with Wii U, New Nintendo 3DS/3DS XL, New Nintendo 2DS XL, Nintendo Switch, Nintendo Switch 2 and older Nintendo 3DS/Nintendo 2DS systems via a peripheral device. Disney Infinity, with an NFC base. Works with Wii, Nintendo 3DS, Nintendo 2DS and Wii U. Lego Dimensions, with an NFC base. Works with Wii U. Skylanders, with an NFC base. Works with Wii, Nintendo 3DS, Nintendo 2DS and Wii U. The Nintendo Switch version of Skylanders: Imaginators uses the NFC built into the game controller, it is also has full backward compatibility with Nintendo Switch 2. Some functionalities are missing compared to the other versions. ==== As payment ==== The Wii U GamePad controller, Joy-Con R, Joy-Con 2 R, Nintendo Switch Pro Controller and Nintendo Switch 2 Pro Controller can read information from an NFC data source. === PlayStation === Disney Infinity, with an NFC base. Works with PlayStation 3, PlayStation Vita, PlayStation 4 and PlayStation 5. Lego Dimensions, with an NFC base. Works with PlayStation 3, PlayStation 4 and PlayStation 5. Skylanders, with an NFC base. Works with PlayStation 3, PlayStation 4 and PlayStation 5. === Xbox === While NFC bases are normally interoperable between all platforms, the Xbox 360, Xbox One and Xbox Series X require specific bases that are compatible only with the respective platform. Disney Infinity, with an NFC base. Lego Dimensions, with an NFC base. Skylanders, with an NFC base.

    Read more →
  • FutureMedia

    FutureMedia

    FutureMedia is a program that analyzes the state and future of digital, social, and mobile media. It functions as a collaborative initiative at Georgia Tech and the Georgia Tech Research Institute. FutureMedia consults approximately 500 faculty members working in those fields. == History == In 2019, Future Media expanded into the Direct-To-Consumer market by acquiring Australian watchmaker Oak & Jackal. == Programs == === FutureMedia Fest === The organization most recently hosted FutureMedia Fest 2010, a four-day conference (Oct 4–7, 2010) with a keynote addresses from Michael Jones, the chief technology advocate at Google. The event featured panels, workshops, and technology demonstrations. === FutureMedia Outlook === Contemporaneous with FutureMedia Fest 2010, the organization released the FutureMedia Outlook, an analysis of the future of media, concentrating on six major trends in those fields, including information overload, personalization, data integrity, an expectation of multimedia, augmented reality, and collaborative software.

    Read more →
  • Semantic space

    Semantic space

    Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis and Hyperspace Analogue to Language. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural network techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google, GloVe from Stanford University, and fastText from Facebook AI Research (FAIR) labs.

    Read more →
  • Bare machine

    Bare machine

    In information technology, a bare machine (or bare-metal computer) is a computer which has no operating system. The software executed by a bare machine, commonly called a bare metal program or bare metal application, is designed to interact directly with hardware. Bare machines are widely used in embedded systems, particularly in cases where resources are limited or high performance is required. == Bare machine computing == Bare Machine Computing is a computing paradigm in which application software runs directly on a bare machine as a single, stand-alone executable, without an operating system or device drivers. The application software has direct access to hardware resources, and there is typically no distinction between user and kernel mode. It is self-managed software that boots, loads and runs without using any other software components. Bare metal programs are typically written in a close-to-hardware language such as C or assembly language. == Advantages == Typically, a bare-metal application will run faster, use less memory and be more power efficient than an equivalent program that relies on an operating system, due to the inherent overhead imposed by system calls. For example, hardware inputs and outputs are directly accessible to bare metal software, whereas they must usually be accessed through system calls when using an OS. It has no OS and therefore has no OS-related vulnerabilities. == Disadvantages == Bare metal applications typically require more effort to develop because operating system services such as memory management and task scheduling are not available. Debugging a bare-metal program may be complicated by factors such as: Lack of a standard output. The target machine may differ from the hardware used for program development (e.g., emulator, simulator). This forces setting up a way to load the bare-metal program onto the target (flashing), start the program execution and access the target resources. == Examples == === Early computers === Early computers, such as the PDP-11, allowed programmers to load a program, supplied in machine code, to RAM. The resulting operation of the program could be monitored by lights, and output derived from magnetic tape, print devices, or storage. Amdahl UTS's performance improves by 25% when run on bare metal without VM, the company said in 1986. === Embedded systems === Bare machine programming is a common practice in embedded systems, in which microcontrollers or microprocessors boot directly into monolithic, single-purpose software without loading an operating system. Such embedded software can vary in structure. For example, one such program paradigm, known as foreground-background or superloop architecture, consists of an infinite main loop in which each task is executed sequentially and must voluntarily return control back to the loop. The loop runs these cooperative background processes that are not time-critical, while interrupt service routines momentarily interrupt the loop to handle time-critical foreground tasks.

    Read more →
  • Visopsys

    Visopsys

    Visopsys (Visual Operating System), is an operating system, written by Andy McLaughlin. Development of the operating system began in 1997. The operating system is licensed under the GNU GPL, with the headers and libraries under the less restrictive LGPL license. It runs on the 32-bit IA-32 architecture. It features a multitasking kernel, supports asynchronous I/O and the FAT line of file systems. It requires a Pentium processor. == History == The development of Visopsys began in 1997, being written by Andy McLaughlin. The first public release of the Operating System was on 2 March 2001, with version 0.1. In this release, Visopsys was a 32 bit operating system, supporting preemptive multitasking and virtual memory. == System overview == Visopsys uses a monolithic kernel, written in the C programming language, with elements of assembly language for certain interactions with the hardware. The operating system supports a graphical user interface, with a small C library.

    Read more →
  • Alt TikTok

    Alt TikTok

    Alt TikTok (or 2020 Alt) was an online youth subculture and internet community that emerged on TikTok in 2020. Alt TikTok users (also known as alt girls, alt boys, or alt kids) emerged as primarily LGBTQ+ individuals who were in contrast to "Straight TikTok" which was seen as the mainstream and heteronormative side of the platform. The subculture became closely associated with music surrounding the hyperpop scene, particularly 100 gecs and also led to a short-lived fashion style and Internet aesthetic adopted by Generation Z during the COVID-19 lockdowns. Notable artists associated with the movement included Girl in Red, Freddie Dredd, David Shawty, WHOKILLEDXIX, and 645AR. While "alt kid" might imply a general association with traditional alternative fashion, the subculture was more an offshoot of e-girls and e-boys. In 2023, the hashtag #altfashion on TikTok amassed over 1.8 billion views. == History == Around mid-2020, users on TikTok began to group different content on the site into labels like "elite TikTok", "deep TikTok", and "floptok". These categories acted as different "sides of TikTok", deviating from mainstream lip syncing, online trends, and dance videos. Alt TikTok became one of the many subcultural communities to emerge during this period, initially referred to interchangeably with "elite TikTok". The movement quickly identified itself with alternative and queer users, in contrast to "Straight TikTok", also known as the "straight side of TikTok", which was seen as the mainstream and heteronormative side of the platform. Alt TikTok was accompanied by memes with surrealist or supernatural themes (sometimes being described as cursed), such as videos with heavy saturation and humanoid animals. One of the popular videos from Alt TikTok, gaining 18 million likes, shows a llama dancing to a cover of a song from a Russian commercial by the cereal brand Miel Pops, later becoming a viral audio. Some Alt TikTok users personified brands and products in what was referred to as Retail TikTok. In 2020, Rolling Stone described Alt TikTok as "one of the primary countercultures on the app." In 2020, American journalist Taylor Lorenz stated in an article of The New York Times, "Every pop sensation needs its ironic counterpoints. Alt Tiktok gets it done. [...] alt TikTok stars like Mooptopia are mainstays on the more indie side of the app. They aren't the popular crowd, but their cool, quirky content still attracts millions." === Trump rally trolling === In June 2020, alt TikTok and K-pop twitter users coordinated a strategy to ruin a Trump rally in Tulsa, Oklahoma. American politician and activist Alexandria Ocasio-Cortez later saluted the individuals for their "Trump troll". == Alt subculture == In 2020, Alt TikTok was one of many subcultural communities to emerge on TikTok, alongside Deep TikTok (aka DeepTok) and Flop TikTok (aka Floptok). The alt kid subculture emerged from Alt TikTok primarily among young Gen Z women, influenced by online fashion and aesthetics shaped by e-girls and e-boys. The movement was accelerated by the COVID-19 lockdowns, while the subculture itself stood in opposition to mainstream "Straight TikTok" and the VSCO girl movement, primarily adopting aspects of queer and alternative culture. While the phrase might imply a general association with alternative fashion or alternative culture, it is more accurately understood as a specific internet-driven outgrowth of online aesthetic youth subcultures like e-girls and e-boys. The alt subculture's visual style blended influences from goth, punk, emo, and grunge, often expressed through fashion, music taste, and online presence. === Style and music === The style of alt-girls is reminiscent of a myriad of previous alternative fashion trends, often blending these influences with online aesthetics. In 2020, TikTok alt-girls were teens ranging from ages 13 to 16, who tended to wear friendship bracelets, goth boots, Dr. Martens, bunny and frog hats, piercings, and split-dyed hair, as well as iconography lifted from Monster Energy and Hello Kitty. Some alt-girls displayed a love of cosplay, while drawing from Japanese anime and manga, particularly Danganronpa and Haikyu!!, which originally gained traction on the app through Anime TikTok (aka Anitok). Alt TikTok has been noted for being primarily influenced by queer and alternative culture, positioning itself in contrast to "Straight TikTok", which focused on mainstream dances and music. Alt kids frequently intersected with the e-girls and e-boys subculture, in terms of music, style, visual media, and aesthetics. Several musicians and artists were closely associated with the alt subculture, particularly those in the hyperpop scene, while alt tiktok users became important in the wider popularization of artists like 100 gecs. Notable prominent artists associated with Alt Tiktok included Girl in Red, Freddie Dredd, David Shawty, WHOKILLEDXIX, and 645AR, alongside music by YouTubers turned musicians such as Wilbur Soot's "I'm in Love With an E‐Girl" and Corpse Husband's "E-Girls Are Ruining My Life!". == Legacy == In 2020, Pitchfork claimed Alt TikTok as having an influence on wider music trends, stating: "Alt TikTok's music is now a hot zone for major record labels, pushing it even further into the mainstream". After the COVID-19 lockdowns, Alt TikTok, alongside its subculture, fell out of prominence and was taken over by other Gen Z-related internet aesthetics, developments, and online trends.

    Read more →
  • National Cyber Security Policy 2013

    National Cyber Security Policy 2013

    National Cyber Security Policy is a policy framework by Department of Electronics and Information Technology (DeitY) It aims at protecting the public and private infrastructure from cyber attacks. The policy also intends to safeguard "information, such as personal information (of web users), financial and banking information and sovereign data". This was particularly relevant in the wake of US National Security Agency (NSA) leaks that suggested the US government agencies are spying on Indian users, who have no legal or technical safeguards against it. Ministry of Communications and Information Technology (India) defines Cyberspace as a complex environment consisting of interactions between people, software services supported by worldwide distribution of information and communication technology. == Reason for Cyber Security policies == India had no Cyber security policy before 2013. In 2013, The Hindu newspaper, citing documents leaked by NSA whistle-blower Edward Snowden, has alleged that much of the NSA surveillance was focused on India's domestic politics and its strategic and commercial interests. This sparked a furore among people. Under pressure, the government unveiled a National Cyber Security Policy 2013 on 2 July 2013. == Vision == To build a secure and resilient cyberspace for citizens, business, and government and also to protect anyone from intervening in user's privacy.It mentioned a five year target of training five lakh cyber security personnel by 2018. == Mission == To protect information and information infrastructure in cyberspace, build capabilities to prevent and respond to cyber threat, reduce vulnerabilities and minimize damage from cyber incidents through a combination of institutional structures, people, processes, technology, and cooperation. == Objective == Ministry of Communications and Information Technology (India) define objectives as follows: To create a secure cyber ecosystem in the country, generate adequate trust and confidence in IT system and transactions in cyberspace and thereby enhance adoption of IT in all sectors of the economy. To create an assurance framework for the design of security policies and promotion and enabling actions for compliance to global security standards and best practices by way of conformity assessment (Product, process, technology & people). To strengthen the Regulatory Framework for ensuring a SECURE CYBERSPACE ECOSYSTEM. To enhance and create National and Sectoral level 24x7 mechanism for obtaining strategic information regarding threats to ICT infrastructure, creating scenarios for response, resolution and crisis management through effective predictive, preventive, protective response and recovery actions. -To improve visibility of integrity of ICT products and services by establishing infrastructure for testing & validation of security of such product. To create workforce for 500,000 professionals skilled in next 5 years through capacity building skill development and training. To provide fiscal benefit to businesses for adoption of standard security practices and processes. To enable Protection of information while in process, handling, storage & transit so as to safeguard privacy of citizen's data and reducing economic losses due to cyber crime or data theft. To enable effective prevention, investigation and prosecution of cybercrime and enhancement of law enforcement capabilities through appropriate legislative intervention. == Strategies == Creating a secured Ecosystem. Creating an assurance framework. Encouraging Open Standards. Strengthening The regulatory Framework. Creating a mechanism for Security Threats Early Warning, Vulnerability management, and response to security threats. Securing E-Governance services. Protection and resilience of Critical Information Infrastructure. Promotion of Research and Development in cyber security. Reducing supply chain risks Human Resource Development (fostering education and training programs both in formal and informal sectors to Support the Nation's cyber security needs and build capacity. Creating cyber security awareness. Developing effective Public-Private partnerships. To develop bilateral and multilateral relationships in the area of cyber security with another country. (Information sharing and cooperation) a Prioritized approach for implementation.

    Read more →
  • AMiner (database)

    AMiner (database)

    AMiner (formerly ArnetMiner) is a free online service used to index, search, and mine big scientific data. == Overview == AMiner (ArnetMiner) is designed to search and perform data mining operations against academic publications on the Internet, using social network analysis to identify connections between researchers, conferences, and publications. This allows it to provide services such as expert finding, geographic search, trend analysis, reviewer recommendation, association search, course search, academic performance evaluation, and topic modeling. AMiner was created as a research project in social influence analysis, social network ranking, and social network extraction. A number of peer-reviewed papers have been published arising from the development of the system. It has been in operation for more than three years, and has indexed 130,000,000 researchers and more than 265 million publications. The research was funded by the Chinese National High-tech R&D Program and the National Science Foundation of China. AMiner is commonly used in academia to identify relationships between and draw statistical correlations about research and researchers. It has attracted more than 10 million independent IP accesses from 220 countries and regions. The product has been used in Elsevier's SciVerse platform, and academic conferences such as SIGKDD, ICDM, PKDD, WSDM. == Operation == AMiner automatically extracts the researcher profile from the web. It collects and identifies the relevant pages, then uses a unified approach to extract data from the identified documents. It also extracts publications from online digital libraries using heuristic rules. It integrates the extracted researchers’ profiles and the extracted publications. It employs the researcher name as the identifier. A probabilistic framework has been proposed to deal with the name ambiguity problem in the integration. The integrated data is stored into a researcher network knowledge base (RNKB). The principal other product in the area are Google Scholar, Elsevier's Scirus, and the open source project CiteSeer. == History == It was initiated and created by professor Jie Tang from Tsinghua University, China. It was first launched in March 2006. The following provide a list of updates in the past years: March 2006, Version 0.1, Functions include researcher profiling, expert search, conference search, and publication search. The system was developed in Perl; August 2006, Version 1.0, The system was re-implemented in Java; July 2007, Version 2.0, New functions include researcher interest mining, association search, survey paper finding (unavailable now); April 2008, Version 3.0, New functions include query understanding, new GUI, and search log analysis; November 2008, Version 4.0, New functions include graph search, topic modeling, NSF/NSFC funding information extraction; April 2009, Version 5.0, New functions include Profile edition, open API service, Bole search, course search (unavailable now); December 2009, Version 6.0, New functions include academic performance evaluation, user feedback, conference analysis; May 2010, Version 7.0, New functions include name disambiguation, paper-reviewer recommendation, ArnetPage creation; March 2012, Version II, renamed as AMiner, rewrote all the codes and redesign the GUI. New functions include: geographic search, ArnetAPP platform. June 2014, Version II, renamed as AMiner, rewrote all the codes and redesign the GUI. New functions include: geographic search, ArnetAPP platform. December 2015, a completely new version got online. May 2017, professional version got online. April 2018, New functions include Trend Analysis, a deep learning based Name Disambiguation == Resources == AMiner published several datasets for academic research purpose, including Open Academic Graph, DBLP+citation (a data set augmenting citations into the DBLP data from Digital Bibliography & Library Project), Name Disambiguation, Social Tie Analysis. For more available datasets and source codes for research, please refer to.

    Read more →
  • COVFEFE Act

    COVFEFE Act

    The Communications Over Various Feeds Electronically for Engagement Act (COVFEFE Act), House Bill H.R. 2884, was introduced in the United States House of Representatives on June 12, 2017, during the 115th United States Congress. The bill was intended to amend the Presidential Records Act to preserve Twitter posts and other social media interactions of the President of the United States and require the National Archives to store such items. H.R. 2884 was assigned to the House Oversight and Reform Committee for consideration. While in committee, there were no roll call votes related to the bill. The bill died in committee. U.S. Representative Mike Quigley, Democrat of Illinois, introduced the legislation due to Donald Trump's routine use of Twitter, stating "In order to maintain public trust in government, elected officials must answer for what they do and say; this includes 140-character tweets. If the president is going to take to social media to make sudden public policy proclamations, we must ensure that these statements are documented and preserved for future reference". If enacted, the bill "would bar the prolifically tweeting president from deleting his posts, as he has sometimes done". The COVFEFE Act would have also treated a president's personal social media accounts (e.g., Trump's "@realDonaldTrump" Twitter account) the same as official social media accounts (e.g., the "@POTUS" Twitter account). == Background == The bill title refers to "covfefe", a word in a May 31, 2017 tweet that Trump sent at 12:06 AM EDT, reading "Despite the constant negative press covfefe". This incomplete tweet was liked and retweeted hundreds of thousands of times, making it one of the most popular tweets of 2017, as people speculated on its meaning. The tweet was deleted at 5:48 AM EDT. At 6:09 AM EDT, Trump's account tweeted "Who can figure out the true meaning of 'covfefe' ??? Enjoy!" During the May 31 White House press briefing, Hunter Walker of Yahoo! News asked White House press secretary Sean Spicer about the tweet and if there was any concern about the president sending out incoherent tweets that stay up for hours. Spicer responded, "I think the president and a small group of people know exactly what he meant" and offered no other explanation. This unexpected response spawned additional media attention and criticism for its cryptic meaning, with commentators unsure whether or not Spicer was joking. Callum Borchers of The Washington Post's The Fix noted that the Trump administration deliberately responded in a way that encouraged the media and the public to focus on covfefe instead of other controversies like the Russia investigation, resignation of White House communications director Michael Dubke, or U.S.-Germany relations. == Legal significance of Trump's tweeting == Trump's tweets have been legally significant in the past. White House Press Secretary Sean Spicer stated that Trump's tweets are "considered official statements by the President of the United States". Some of his tweets have contradicted his agenda by undercutting or contradicting statements of public officials as well as the arguments of U.S. Department of Justice attorneys seeking to defend Trump's decisions in court. A federal appellate court cited one of Trump's tweets in upholding a lower court's order blocking Trump's Executive Order 13780 from going into effect in 2017. Courts have been clear that Twitter statements can be used as evidence of intent. Before Trump's "@realDonaldTrump" Twitter account was suspended, he blocked a number of users, preventing them from viewing his tweets or posting public replies. A group associated with Columbia University filed a lawsuit on behalf of blocked users, called Knight First Amendment Institute v. Trump. Plaintiffs successfully argued that @realDonaldTrump reply threads constituted a "designated public forum" akin to a public meeting, and therefore blocking users based on their political viewpoints violated their constitutional right to freedom of speech. The Second Circuit upheld this ruling on July 9, 2019. Regardless of the failure of the bill, Trump's tweets have been archived in accordance with the Presidential and Federal Records Act Amendments of 2014.

    Read more →
  • Digital scrapbooking

    Digital scrapbooking

    Digital scrapbooking is the term for the creation of a new 2D artwork by re-combining various graphic elements. It is a form of scrapbooking that is done using a personal computer, digital or scanned photos and computer graphics software. It is a relatively new form of the traditional print scrapbooking. Recent advances in technology now enable the craft to be pursued on tablets and smart devices utilising imaging apps as well as hobby specific apps, some of which have been created specifically by brands for use with their own image products. Digital scrapbooking kits are available to purchase and download at many websites that specialize in the craft. Kits contain graphics and word-art and are usually themed and color-coordinated. They usually consist of a mix of background images and "cut out" [extracted] images containing alpha channels. Once a kit has been downloaded to the computer or device, it can then be used over and over again to make new scrapbook pages (scrapbook layouts) within the software program that one chooses to use, often in combination with the users's own family photographs, scanned keepsakes and other unique personal elements scanned on a flatbed scanner. Scanning is usually done at 300dpi, to make the resulting images suitable for print. == Licensing and Copyright == Kits are sometimes licensed differently from other forms of traditional royalty-free stock images that may be purchased per-item or in sets at online stock photography sites. Some kit packs will be wholly royalty-free, but some kit makers may restrict usage to non-commercial work only. Some may specifically forbid the use of their work in projects for commercial gain, for example greetings cards and gift tags that may be made with their kits. Licensing often varies from kit to kit, even from the same maker. Some kits include derivative works of public domain material. In contrast to stock, creators of digital scrapbooking kits often require a credit or byline to indicate that their image elements have been used in a new creation. == Uses == Some artistic individuals combine digital scrapbooking with traditional scrapbooking to create what's known as hybrid scrapbooking projects. Hybrid scrapbooking involves creating layouts on the computer using digital supplies that will then be printed and combined with traditional supplies such as buttons, ribbons and other elements. Conversely, a hybrid scrapbook project may also be created using traditional paper supplies and augmented with digital elements that have been printed and cut out specifically for use on the project. Journaling may be done within the software programs to accompany images and to create digital storybooks, or scrapbooks, which are then published in photo books via various popular print-on-demand services, printed and added to traditional scrapbooks, burned to CDs or posted on the Web. Digital Scrapbooking may also be done online by uploading photos to a specialist scrapbooking website and utilising their custom built platforms and decorative image elements to complete the projects for print to finished products, for example photo books and holiday greeting cards. == Market Size == The traditional scrapbooking market appeared to decline somewhat in the USA since 2010, probably due to the 2008 financial crisis, and the digital scrapbooking market (being potentially a much cheaper form of scrapbooking) may have increased accordingly. Both markets currently appear to have recovered lost ground and expanded since the beginning of the COVID-19 pandemic as many people sought to productively fill their time during lockdowns, quarantines and self-isolation / stay at home directions. == Digital scrapbooking software == The main software programs that are typically used are Adobe Photoshop, Adobe Photoshop Elements, paint.net (freeware), Filter Forge, Corel Paintshop Pro, and GIMP. Additionally Adobe offer the Photoshop iOS product using the same code base as the desktop version to drive the app version. == Digital scrapbooking supplies == Digital scrapbooking supplies are downloaded from the Internet and then stored on a computer or external hardrive, DVD or CD media, SD cards, or in the cloud, to be used as needed. Both paid and free digital scrapbooking supplies available from numerous designers on their blogs or in e-commerce stores either as solo designers or as part of a wide cohort of designers working cooperatively in large full service e-commerce websites. Usually designed at 300ppi image resolution, digital scrapbooking product offerings and supplies often include: Full coordinated kits containing digital background “papers”, decorative alphabets, and diverse embellishments generally containing a mixture of .JPG and .PNG files; "Quick pages", flattened files containing a completed page layout with transparent photo windows in .PNG file format; Digital templates, fully layered layouts i.e. pages that have had the composition pre-designed ready for use in an imaging program or app, fully customizable for color schemes, kit choices, photographs and other embellishments, generally supplied in either .PSD or .TIF file format; Hybrid “quick pages”, i.e. layouts that are both fully designed and fully layered for customization, generally supplied in either .PSD or .TIF file format; Adobe Photoshop actions, brushes, custom shapes, paths and styles, saved in their respective native Photoshop file formats; and Corel PaintShop Pro equivalent tools.

    Read more →
  • Text-to-image personalization

    Text-to-image personalization

    Text-to-Image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data (usually a foundation model), is adapted such that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects (such as the user's pet) or more abstract categories (new artistic style or object relations). Text-to-Image personalization methods typically bind the novel (personal) concept to new words in the vocabulary of the model. These words can then be used in future prompts to invoke the concept for subject-driven generation, inpainting, style transfer and even to correct biases in the model. To do so, models either optimize word-embeddings, fine-tune the generative model itself, or employ a mixture of both approaches. == Technology == Text-to-Image personalization was first proposed during August 2022 by two concurrent works, Textual Inversion and DreamBooth. In both cases, a user provides a few images (typically 3–5) of a concept, like their own dog, together with a coarse descriptor of the concept class (like the word "dog"). The model then learns to represent the subject through a reconstruction based objective, where prompts referring to the subject are expected to reconstruct images from the training set. In Textual Inversion, the personalized concepts are introduced into the text-to-image model by adding new words to the vocabulary of the model. Typical text-to-image models represent words (and sometimes parts-of-words) as tokens, or indices in a predefined dictionary. During generation, an input prompt is converted into such tokens, each of which is converted into a ‘word-embedding’: a continuous vector representation which is learned for each token as part of the model's training. Textual Inversion proposes to optimize a new word-embedding vector for representing the novel concept. This new embedding vector can then be assigned to a user-chosen string, and invoked whenever the user's prompt contains this string. In DreamBooth, rather than optimizing a new word vector, the full generative model itself is fine-tuned. The user first selects an existing token, typically one which rarely appears in prompts. The subject itself is then represented by a string containing this token, followed by a coarse descriptor of the subject's class. A prompt describing the subject will then take the form: "A photo of " (e.g. "a photo of sks cat" when learning to represent a specific cat). The text-to-image model is then tuned so that prompts of this form will generate images of the subject. == Textual Inversion == The key idea in Textual Inversion is to add a new term to the vocabulary of the diffusion model that corresponds to the new (personalized) concept. Textual Inversion operates by inverting the concepts into new pseudo-words within the textual embedding space of a pre-trained text-to-image model. These pseudo-words can be injected into new scenes using simple natural language descriptions, allowing for simple and intuitive modifications. The method allows a user to leverage multi-modal information — using a text-driven interface for ease of editing, but providing visual cues when approaching the limits of natural language. The resulting model is extremely light-weight per concept: only 1K long, but succeeds to encode detailed visual properties of the concept. == Extensions == Several approaches were proposed to refine and improve over the original methods. These include the following. Low-rank Adaptation (LoRA) - an adapter-based technique for efficient finetuning of models. In the case of text-to-image models, LoRA is typically used to modify the cross-attention layers of a diffusion model. Perfusion - a low rank update method that also locks the activations of the key matrix in the diffusion model's cross attention layers to the concept's coarse class. Extended Textual Inversion - a technique that learns an individual word embedding for each layer in the diffusion model's denoising network. Encoder-based methods that use another neural network to quickly personalize a model == Challenges and limitations == Text-to-image personalization methods must contend with several challenges. At their core is the goal of achieving high-fidelity to the personal concept while maintaining high alignment between novel prompts containing the subject, and the generated images (typically referred to as ‘editability’). Another challenge that personalization methods must contend with is memory requirements. Initial implementations of personalization methods required more than 20 Gigabytes of GPU memory, and more recent approaches have reported requirements of more than 40 Gigabytes. However, optimizations such as Flash Attention have since reduced this requirement considerably. Approaches that tune the entire generative model may also create checkpoints that are several gigabytes in size, making it difficult to share or store many models. Embedding based approaches require only a few kilobytes, but typically struggle to preserve identity while maintaining editability. More recent approaches have proposed hybrid tuning goals which optimize both an embedding and a subset of network weights. These can reduce storage requirements to as little as 100 Kilobytes while achieving quality comparable to full tuning methods. Finally, optimization processes can be lengthy, requiring several minutes of tuning for each novel concept. Encoder and quick-tuning methods aim to reduce this to seconds or less.

    Read more →
  • Packingham v. North Carolina

    Packingham v. North Carolina

    Packingham v. North Carolina, 582 U.S. 98 (2017), is a case in which the Supreme Court of the United States held that a North Carolina statute that prohibited registered sex offenders from using social media websites was unconstitutional because it violated the First Amendment to the U.S. Constitution, which protects freedom of speech. In 2010, Lester Gerard Packingham, a registered sex offender, posted on Facebook under a pseudonym to comment favorably on a recent traffic court experience. Police then identified Packingham and charged him with violating North Carolina's law. Packingham moved to dismiss the charges, arguing that the state's law violated the First Amendment. The trial court dismissed this motion and ultimately convicted Packingham. A state appellate court initially reversed the trial court, holding that the law did violate the First Amendment, but the North Carolina Supreme Court, the state's highest court, disagreed and reinstated the conviction. In June 2017, the U.S. Supreme Court unanimously reversed the North Carolina Supreme Court's judgment. In the majority opinion authored by Justice Anthony Kennedy, the Court held that social media—defined broadly to include Facebook, Amazon.com, The Washington Post, and WebMD, among many others—is a "protected space" under the First Amendment for lawful speech. The Court offered that North Carolina could protect children through less restrictive means, such as prohibiting "conduct that often presages a sexual crime, like contacting a minor or using a website to gather information about a minor". == Background == === North Carolina statute === In 2008, the state of North Carolina passed a law that made it a felony for a registered sex offender "to access a commercial social networking Web site where the sex offender knows that the site permits minor children to become members or to create or maintain personal Web pages". The law defined a "commercial social networking Web site" using four criteria. Specifically, the website must: be "operated by a person who derives revenue from membership fees, advertising, or other sources related to the operation of the Web site". facilitate "the social introduction between two or more persons for the purposes of friendship, meeting other persons, or information exchanges". allow "users to create Web pages or personal profiles that contain information such as the name or nickname of the user, photographs placed on the personal Web page by the user, other personal information about the user, and links to other personal Web pages on the commercial social networking Web site of friends or associates of the user that may be accessed by other users or visitors to the Web site". provide "users or visitors... mechanisms to communicate with other users, such as a message board, chat room, electronic mail, or instant messenger". The law exempted websites that "Provid[e] only one of the following discrete services: photo-sharing, electronic mail, instant messenger, or chat room or message board platform", as well as websites that have as their primary purpose "the facilitation of commercial transactions involving goods or services between [their] members or visitors". === Facts of the case === In 2002, Lester Gerard Packingham was convicted of taking "indecent liberties with a child", a felony that required him to register as a sex offender. A North Carolina court sentenced him to 10–12 months in prison with 24 months of supervised release. He was given no other special instructions on his behavior outside of prison other than to "remain away from" the minor. In 2010, after a state court dismissed a traffic ticket against Packingham, he submitted a post on Facebook under the name "J. R. Gerrard", stating: "Man God is Good! How about I got so much favor they dismissed the ticket before court even started? No fine, no court cost, no nothing spent. . . . . .Praise be to GOD, WOW! Thanks JESUS!" The Durham Police Department identified Packingham as the author of the post after cross-checking the time of the post with recently dismissed traffic tickets, and a grand jury indicted him for violating the North Carolina statute. === Lower court proceedings === Initially, Packingham moved to dismiss his indictment, arguing that it violated the First Amendment. A North Carolina Superior Court judge denied this motion, and he was convicted of violating the North Carolina social media law. Packingham appealed his conviction to the North Carolina Court of Appeals, which reversed the trial court's decision in 2013. Applying intermediate scrutiny, the court of appeals determined that North Carolina's law violated the First Amendment because it was too broad, applying to all registered sex offenders regardless of whether the offender had committed a crime involving a minor or whether the offender was a continuing threat to minors. The appeals court also stated that the law had been defined broadly enough to prohibit a registered sex offender from conducting a wide array of Internet activity, such as "conducting a 'Google' search, purchasing items on Amazon.com, or accessing a plethora of Web sites unrelated to online communication with minors". In 2015, the North Carolina Supreme Court, the state's highest court, reversed the court of appeals, holding that the law was "constitutional in all respects". The North Carolina Supreme Court found that the statute was a "limitation on conduct" and did not impede any free speech. The state had a vested interest in “forestalling the illicit lurking and contact of minors” by registered sex offenders and potential future victims, and upheld Packingham's conviction. == Supreme Court ruling == Packingham filed a petition for a writ of certiorari with the Supreme Court of the United States. The federal government also filed a brief recommending that the Supreme Court grant certiorari, arguing that the North Carolina Supreme Court incorrectly decided the case in favor of the state. The U.S. Supreme Court granted certiorari in October 2016. Amicus briefs in support of Packingham were filed by the libertarian Cato Institute and the American Civil Liberties Union. The North Carolina Supreme Court filed a brief supporting its prior decision, urging the importance of protecting minors from being stalked online. === Oral argument === The oral argument took place in February 2017. Packingham’s lawyer, David T. Goldberg, argued that the law banned “vast swaths of First Amendment activity”, went too far in restricting which Internet sites could be accessed, and forbade use of the Internet in general. The law targeted speech on some of the platforms that Americans use most often, Goldberg noted, and that under the law Packingham could not even use Twitter to read the myriad messages discussing his own case. He further noted that the law imposes punishment without regard to whether the offender actually did anything wrong. North Carolina’s senior deputy Attorney General, Robert C. Montgomery, argued for the state, and claimed that communication through social media sites is a “crucial channel”. Justice Sonia Sotomayor asked Montgomery to provide evidence as to the claim that by giving Packingham Internet privileges, he would commit another crime. Justice Stephen Breyer added that “It seems to be well-settled law that the state can’t (bar usage) unless there is a 'clear and present danger'." === Opinion of the Court === In June 2017 the Supreme Court delivered a judgment in favor of Packingham, unanimously voting to reverse the state court's ruling. Justice Anthony Kennedy authored the decision, joined by Justice Ginsburg, Justice Breyer, Justice Sotomayor, and Justice Kagan. Kennedy explained the decision: "A fundamental principle of the First Amendment is that all persons have access to places where they can speak and listen, and then, after reflection, speak and listen once more." He continued that "By prohibiting sex offenders from using those websites, North Carolina with one broad stroke bars access to what for many are the principal sources for knowing current events, checking ads for employment, speaking and listening in the modern public square, and otherwise exploring the vast realms of human thought and knowledge." Citing Ashcroft v. Free Speech Coalition as a precedent, Kennedy also wrote: "It is well established that, as a general rule, the Government 'may not suppress lawful speech as the means to suppress unlawful speech'." === Concurring opinion === Justice Samuel Alito wrote an opinion concurring in the judgment, joined by John Roberts and Clarence Thomas. While Alito agreed that the state statute at issue violated the First Amendment, he noted that there are reasonable scenarios for which legal bans for sex offenders can be placed, such as for sites targeted at teenagers. Justice Gorsuch took no part in the decision of the case. == Impact == Packingham v. North Carolina was one of the first U.S. Supreme Court cases to ana

    Read more →
  • Signal-to-crosstalk ratio

    Signal-to-crosstalk ratio

    The signal-to-crosstalk ratio at a specified point in a circuit is the ratio of the power of the wanted signal to the power of the unwanted signal from another channel. The signals are adjusted in each channel so that they are of equal power at the zero transmission level point in their respective channels. The signal-to-crosstalk ratio is usually expressed in dB.

    Read more →