AI Data Quality Analyst Jobs

AI Data Quality Analyst Jobs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis is a report authored by James van Geelen and Alap Shah and published by Citrini Research in February 2026, on the impact of artificial intelligence on humanity's future. Written in the form of a scenario analysis, it was viewed millions of times online and reportedly caused a fall in the stock market prices of major tech and financial firms. It also received criticism among others, for its allegedly flawed economic logic. The 'thought exercise', as the authors called it, painted a gloomy picture for the near future, where outputs keep growing while consumer's ability to spend collapses. "...driven by ai agents that don’t sleep, take sick days or require health insurance”, "outputs that are shown in national accounts increases, "but never circulates through the real economy"(which the report calls 'Ghost GDP'), the authors argued. In other words, the authors predict a scenario where the owners of the AI firms will accumulate a vast fortune but there will be scant demand from consumers as AI would cause massive unemployment. The authors caution the reader that what they make is a scenario and not a prediction. In the scenario they visualise, any service whose value proposition is “I will navigate complexity that you find tedious” is getting disrupted. The reports argues that the unique ability of human beings to analyse, decide, create, persuade, and coordinate was “the thing that could not be replicated at scale,” and call the historical scarcity of this precious entity 'friction'. When this friction becomes zero, a gamut of changes occur which then triggers a cascading of changes across the economy. ”Travel booking platforms are an early casualty; Financial advice. tax prep., and routine legal work follow suit. National unemployment rate go as high 10.2% and the S&P 500 goes for a massive 38% peak-to-trough crash. In contrast to the previous technological revolutions the high-earning professionals suffers more and get forced to take up roles in the gig economy. Labour supply becomes abundant and this cuts wages all across the economy. The dent in income for the employees then affects other sectors of the economy such as the residential mortgage market. The losses for the software companies triggers loan defaults and heralds peril for the private credit sector.

    Read more →
  • Whitehead's algorithm

    Whitehead's algorithm

    Whitehead's algorithm is a mathematical algorithm in group theory for solving the automorphic equivalence problem in the finite rank free group Fn. The algorithm is based on a classic 1936 paper of J. H. C. Whitehead. It is still unknown (except for the case n = 2) if Whitehead's algorithm has polynomial time complexity. == Statement of the problem == Let F n = F ( x 1 , … , x n ) {\displaystyle F_{n}=F(x_{1},\dots ,x_{n})} be a free group of rank n ≥ 2 {\displaystyle n\geq 2} with a free basis X = { x 1 , … , x n } {\displaystyle X=\{x_{1},\dots ,x_{n}\}} . The automorphism problem, or the automorphic equivalence problem for F n {\displaystyle F_{n}} asks, given two freely reduced words w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} whether there exists an automorphism φ ∈ Aut ⁡ ( F n ) {\displaystyle \varphi \in \operatorname {Aut} (F_{n})} such that φ ( w ) = w ′ {\displaystyle \varphi (w)=w'} . Thus the automorphism problem asks, for w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} whether Aut ⁡ ( F n ) w = Aut ⁡ ( F n ) w ′ {\displaystyle \operatorname {Aut} (F_{n})w=\operatorname {Aut} (F_{n})w'} . For w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} one has Aut ⁡ ( F n ) w = Aut ⁡ ( F n ) w ′ {\displaystyle \operatorname {Aut} (F_{n})w=\operatorname {Aut} (F_{n})w'} if and only if Out ⁡ ( F n ) [ w ] = Out ⁡ ( F n ) [ w ′ ] {\displaystyle \operatorname {Out} (F_{n})[w]=\operatorname {Out} (F_{n})[w']} , where [ w ] , [ w ′ ] {\displaystyle [w],[w']} are conjugacy classes in F n {\displaystyle F_{n}} of w , w ′ {\displaystyle w,w'} accordingly. Therefore, the automorphism problem for F n {\displaystyle F_{n}} is often formulated in terms of Out ⁡ ( F n ) {\displaystyle \operatorname {Out} (F_{n})} -equivalence of conjugacy classes of elements of F n {\displaystyle F_{n}} . For an element w ∈ F n {\displaystyle w\in F_{n}} , | w | X {\displaystyle |w|_{X}} denotes the freely reduced length of w {\displaystyle w} with respect to X {\displaystyle X} , and ‖ w ‖ X {\displaystyle \|w\|_{X}} denotes the cyclically reduced length of w {\displaystyle w} with respect to X {\displaystyle X} . For the automorphism problem, the length of an input w {\displaystyle w} is measured as | w | X {\displaystyle |w|_{X}} or as ‖ w ‖ X {\displaystyle \|w\|_{X}} , depending on whether one views w {\displaystyle w} as an element of F n {\displaystyle F_{n}} or as defining the corresponding conjugacy class [ w ] {\displaystyle [w]} in F n {\displaystyle F_{n}} . == History == The automorphism problem for F n {\displaystyle F_{n}} was algorithmically solved by J. H. C. Whitehead in a classic 1936 paper, and his solution came to be known as Whitehead's algorithm. Whitehead used a topological approach in his paper. Namely, consider the 3-manifold M n = # i = 1 n S 2 × S 1 {\displaystyle M_{n}=\#_{i=1}^{n}\mathbb {S} ^{2}\times \mathbb {S} ^{1}} , the connected sum of n {\displaystyle n} copies of S 2 × S 1 {\displaystyle \mathbb {S} ^{2}\times \mathbb {S} ^{1}} . Then π 1 ( M n ) ≅ F n {\displaystyle \pi _{1}(M_{n})\cong F_{n}} , and, moreover, up to a quotient by a finite normal subgroup isomorphic to Z 2 n {\displaystyle \mathbb {Z} _{2}^{n}} , the mapping class group of M n {\displaystyle M_{n}} is equal to Out ⁡ ( F n ) {\displaystyle \operatorname {Out} (F_{n})} ; see. Different free bases of F n {\displaystyle F_{n}} can be represented by isotopy classes of "sphere systems" in M n {\displaystyle M_{n}} , and the cyclically reduced form of an element w ∈ F n {\displaystyle w\in F_{n}} , as well as the Whitehead graph of [ w ] {\displaystyle [w]} , can be "read-off" from how a loop in general position representing [ w ] {\displaystyle [w]} intersects the spheres in the system. Whitehead moves can be represented by certain kinds of topological "swapping" moves modifying the sphere system. Subsequently, Rapaport, and later, based on her work, Higgins and Lyndon, gave a purely combinatorial and algebraic re-interpretation of Whitehead's work and of Whitehead's algorithm. The exposition of Whitehead's algorithm in the book of Lyndon and Schupp is based on this combinatorial approach. Culler and Vogtmann, in their 1986 paper that introduced the Outer space, gave a hybrid approach to Whitehead's algorithm, presented in combinatorial terms but closely following Whitehead's original ideas. == Whitehead's algorithm == Our exposition regarding Whitehead's algorithm mostly follows Ch.I.4 in the book of Lyndon and Schupp, as well as. === Overview === The automorphism group Aut ⁡ ( F n ) {\displaystyle \operatorname {Aut} (F_{n})} has a particularly useful finite generating set W {\displaystyle {\mathcal {W}}} of Whitehead automorphisms or Whitehead moves. Given w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} the first part of Whitehead's algorithm consists of iteratively applying Whitehead moves to w , w ′ {\displaystyle w,w'} to take each of them to an "automorphically minimal" form, where the cyclically reduced length strictly decreases at each step. Once we find automorphically these minimal forms u , u ′ {\displaystyle u,u'} of w , w ′ {\displaystyle w,w'} , we check if ‖ u ‖ X = ‖ u ′ ‖ X {\displaystyle \|u\|_{X}=\|u'\|_{X}} . If ‖ u ‖ X ≠ ‖ u ′ ‖ X {\displaystyle \|u\|_{X}\neq \|u'\|_{X}} then w , w ′ {\displaystyle w,w'} are not automorphically equivalent in F n {\displaystyle F_{n}} . If ‖ u ‖ X = ‖ u ′ ‖ X {\displaystyle \|u\|_{X}=\|u'\|_{X}} , we check if there exists a finite chain of Whitehead moves taking u {\displaystyle u} to u ′ {\displaystyle u'} so that the cyclically reduced length remains constant throughout this chain. The elements w , w ′ {\displaystyle w,w'} are not automorphically equivalent in F n {\displaystyle F_{n}} if and only if such a chain exists. Whitehead's algorithm also solves the search automorphism problem for F n {\displaystyle F_{n}} . Namely, given w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} , if Whitehead's algorithm concludes that Aut ⁡ ( F n ) w = Aut ⁡ ( F n ) w ′ {\displaystyle \operatorname {Aut} (F_{n})w=\operatorname {Aut} (F_{n})w'} , the algorithm also outputs an automorphism φ ∈ Aut ⁡ ( F n ) {\displaystyle \varphi \in \operatorname {Aut} (F_{n})} such that φ ( w ) = w ′ {\displaystyle \varphi (w)=w'} . Such an element φ ∈ Aut ⁡ ( F n ) {\displaystyle \varphi \in \operatorname {Aut} (F_{n})} is produced as the composition of a chain of Whitehead moves arising from the above procedure and taking w {\displaystyle w} to w ′ {\displaystyle w'} . === Whitehead automorphisms === A Whitehead automorphism, or Whitehead move, of F n {\displaystyle F_{n}} is an automorphism τ ∈ Aut ⁡ ( F n ) {\displaystyle \tau \in \operatorname {Aut} (F_{n})} of F n {\displaystyle F_{n}} of one of the following two types: There is a permutation σ ∈ S n {\displaystyle \sigma \in S_{n}} of { 1 , 2 , … , n } {\displaystyle \{1,2,\dots ,n\}} such that for i = 1 , … , n {\displaystyle i=1,\dots ,n} τ ( x i ) = x σ ( i ) ± 1 {\displaystyle \tau (x_{i})=x_{\sigma (i)}^{\pm 1}} Such τ {\displaystyle \tau } is called a Whitehead automorphism of the first kind. There is an element a ∈ X ± 1 {\displaystyle a\in X^{\pm 1}} , called the multiplier, such that for every x ∈ X ± 1 {\displaystyle x\in X^{\pm 1}} τ ( x ) ∈ { x , x a , a − 1 x , a − 1 x a } . {\displaystyle \tau (x)\in \{x,xa,a^{-1}x,a^{-1}xa\}.} Such τ {\displaystyle \tau } is called a Whitehead automorphism of the second kind. Since τ {\displaystyle \tau } is an automorphism of F n {\displaystyle F_{n}} , it follows that τ ( a ) = a {\displaystyle \tau (a)=a} in this case. Often, for a Whitehead automorphism τ ∈ Aut ⁡ ( F n ) {\displaystyle \tau \in \operatorname {Aut} (F_{n})} , the corresponding outer automorphism in Out ⁡ ( F n ) {\displaystyle \operatorname {Out} (F_{n})} is also called a Whitehead automorphism or a Whitehead move. ==== Examples ==== Let F 4 = F ( x 1 , x 2 , x 3 , x 4 ) {\displaystyle F_{4}=F(x_{1},x_{2},x_{3},x_{4})} . Let τ : F 4 → F 4 {\displaystyle \tau :F_{4}\to F_{4}} be a homomorphism such that τ ( x 1 ) = x 2 x 1 , τ ( x 2 ) = x 2 , τ ( x 3 ) = x 2 x 3 x 2 − 1 , τ ( x 4 ) = x 4 {\displaystyle \tau (x_{1})=x_{2}x_{1},\quad \tau (x_{2})=x_{2},\quad \tau (x_{3})=x_{2}x_{3}x_{2}^{-1},\quad \tau (x_{4})=x_{4}} Then τ {\displaystyle \tau } is actually an automorphism of F 4 {\displaystyle F_{4}} , and, moreover, τ {\displaystyle \tau } is a Whitehead automorphism of the second kind, with the multiplier a = x 2 − 1 {\displaystyle a=x_{2}^{-1}} . Let τ ′ : F 4 → F 4 {\displaystyle \tau ':F_{4}\to F_{4}} be a homomorphism such that τ ′ ( x 1 ) = x 1 , τ ′ ( x 2 ) = x 1 − 1 x 2 x 1 , τ ′ ( x 3 ) = x 1 − 1 x 3 x 1 , τ ′ ( x 4 ) = x 1 − 1 x 4 x 1 {\displaystyle \tau '(x_{1})=x_{1},\quad \tau '(x_{2})=x_{1}^{-1}x_{2}x_{1},\quad \tau '(x_{3})=x_{1}^{-1}x_{3}x_{1},\quad \tau '(x_{4})=x_{1}^{-1}x_{4}x_{1}} Then τ ′ {\displaystyle \tau '} is actually an inner automorphism of F 4 {\displaystyle F_{4}} given by conjugation by x 1 {\displaystyle x_{1}} , and, moreover, τ ′ {\displaystyle \

    Read more →
  • Education by algorithm

    Education by algorithm

    Education by algorithm refers to automated solutions that algorithmic agents or social bots offer to education, to assist with mundane educational tasks. These are often instrumentalist “educational reforms” or “curriculum transformations”, which have been implemented by policy makers and are supported by proprietary education technologies. New educational policies, mandated by transnational governance forums (like the OECD), have manufactured a connection between economies and education. Governments, schools and universities are expected to introduce or prepare students for an “unknown future”, to “future proof” them against an identified issue or to mitigate a national crisis. Technologies are seen as a catalyst to effect these changes. However, these policies mask a deeper problem, which include the assetization of education and the use of technologies as a means for surveillance and behavior modification. The traces that students and leave, through cookies, logins learning activities, assignments and tests, are collected, facetted, and shared with commercial organizations by these agents, to both predict future behavior and shape it. Techno solutionist thinking has led to managers adopting educational policies and reforms, and looking towards technologies to act as disrupters, liberators or agents to improve efficiency. During the COVID-19 pandemic, many more students had to modify their learning and working circumstances to protect themselves. Academics shifted their assessment practices from the dominant assessment of learning paradigm to an orientation that saw value in "assessment for learning". Big tech assisted, and teaching infrastructure became further privatized, and unbundling of education provision went a step further. Following the return to class, this assessment paradigm became rationalised in education. Leaving the space for algorithmic agents to step in. Academics work was increasingly driven by learning experience platforms and student understanding was extended through interleaving, behavior modification nudges and rewards and scheduled high stakes assessments. This data collection may also be construed as surveillance., or perceived as evidence of a Fourth Industrial Revolution

    Read more →
  • Anyword

    Anyword

    Anyword is a technology company that offers an artificial intelligence platform, using natural language processing to generate and optimize marketing text for websites, social media, email, and ads. The company also offers a complete managed service to publishers and brands to help them increase their revenue through social ads. It is used by National Geographic, Red Bull, The New York Times, BBC, Ted Baker, etc. The company has an office in New York, and Tel Aviv. == History == It was founded in 2013 — its original name was Keywee Inc. In March 2015, Anyword received $9.1 million in the Series A funding round led by a notable group of investors. In July 2016, the company was selected as an official Facebook Marketing Partner. In August 2019, Anyword was named Best Content Marketing Platform in the Digiday Technology Award winners. In November 2021, it raised $21 million in its Series B funding round.

    Read more →
  • ActivTrak

    ActivTrak

    ActivTrak is an American company that produces workforce analytics and productivity software. The company was founded in 2009 by Birch Grove Software and is headquartered in Austin, Texas. The company has raised US$77.5 million in funding and is backed by Sapphire Ventures and Elsewhere Partners. == History == ActivTrak was founded in 2009 by Herb Axilrod and Anton Seidler in Dallas, Texas. ActivTrak's first on-demand software product launched in 2012, and the workforce analytics platform launched in 2015. It uses data sourced from more than 9,500 customers and 900,000 users. In 2019, ActivTrak raised $20 million in a Series A round of funding with Elsewhere Partners, a growth-stage venture capital firm that principally invests in B2B startups. Rita Selvaggi assumed the role of CEO. In 2020, ActivTrak raised $50M in a Series B round of funding with Sapphire Ventures and Elsewhere Partners. The company also introduced the ActivTrak Productivity Lab, an online resource about workforce productivity research, industry benchmark data, and best practices. == Product == ActivTrak is a workforce analytics and productivity platform that uses reports, dashboards, and data analysis. The platform uses machine learning (AI) to collect and analyze user activity data and produce reports about workforce productivity. The software runs on Microsoft Windows, Mac, Chrome, Terminal Services, and VDI. It includes the ActivTrak Agent, which runs in the background and collects data. It responds to user activity, sensing mouse and keyboard movement in the active window(s) of the user's device. This data is collected and stored in a database that aggregates the data based on the user's request. ActivTrak does not utilize keystroke logging, content scraping, camera access, video recording or mobile device monitoring. The database leverages data analytics to generate account and team benchmarks, and identify productivity patterns and outliers. == Awards == Built In, 100 Best Midsize Places to Work in Austin, 2025 G2, Winter: Best Estimated ROI, High Performer, Best Relationship, Best Support, Users Most Likely to Recommend, Easiest Setup, Easiest Admin, Best Meets Requirements, Users Love Us, 2025 TrustRadius, Buyer’s Choice, 2025 Deloitte Technology Fast 500, No. 468 Fastest-Growing Company, 2024 Product Marketing Alliance, AI Marketing Innovation, 2024 Fortune Best Workplaces in Technology™, 2024 Inc. 5000, No. 2335 of America’s Fastest-Growing Private Companies, 2024 Fortune Best Workplaces in Texas™, 2024 Reworked IMPACT Gold Award: Most Innovative Workplace Productivity Solution, 2024 TrustRadius, Most Loved, 2024 Great Place To Work-Certified™, 2024 Inc. 5000 Regionals: Southwest, 2024 Brandon Hall Group, Best Advance in HR Predictive Analytics Technology, 2024

    Read more →
  • Physical schema

    Physical schema

    A physical data model (or database design) is a representation of a data design as implemented, or intended to be implemented, in a database management system. In the lifecycle of a project it typically derives from a logical data model, though it may be reverse-engineered from a given database implementation. A complete physical data model will include all the database artifacts required to create relationships between tables or to achieve performance goals, such as indexes, constraint definitions, linking tables, partitioned tables or clusters. Analysts can usually use a physical data model to calculate storage estimates; it may include specific storage allocation details for a given database system. As of 2012 seven main databases dominate the commercial marketplace: Informix, Oracle, Postgres, SQL Server, Sybase, IBM Db2 and MySQL. Other RDBMS systems tend either to be legacy databases or used within academia such as universities or further education colleges. Physical data models for each implementation would differ significantly, not least due to underlying operating-system requirements that may sit underneath them. For example: SQL Server runs only on Microsoft Windows operating-systems (Starting with SQL Server 2017, SQL Server runs on Linux. It's the same SQL Server database engine, with many similar features and services regardless of your operating system), while Oracle and MySQL can run on Solaris, Linux and other UNIX-based operating-systems as well as on Windows. This means that the disk requirements, security requirements and many other aspects of a physical data model will be influenced by the RDBMS that a database administrator (or an organization) chooses to use. == Physical schema == Physical schema is a term used in data management to describe how data is to be represented and stored (files, indices, etc.) in secondary storage using a particular database management system (DBMS) (e.g., Oracle RDBMS, Sybase SQL Server, etc.). In the ANSI/SPARC Architecture three schema approach, the internal schema is the view of data that involved data management technology. This is as opposed to an external schema that reflects an individual's view of the data, or the conceptual schema that is the integration of a set of external schemas. The logical schema was the way data were represented to conform to the constraints of a particular approach to database management. At that time the choices were hierarchical and network. Describing the logical schema, however, still did not describe how physically data would be stored on disk drives. That is the domain of the physical schema. Now logical schemas describe data in terms of relational tables and columns, object-oriented classes, and XML tags. A single set of tables, for example, can be implemented in numerous ways, up to and including an architecture where table rows are maintained on computers in different countries.

    Read more →
  • Operational system

    Operational system

    An operational system is a term used in data warehousing to refer to a system that is used to process the day-to-day transactions of an organization. These systems are designed in a manner that processing of day-to-day transactions is performed efficiently and the integrity of the transactional data is preserved. == Synonyms == Sometimes operational systems are referred to as operational databases, transaction processing systems, or online transaction processing systems (OLTP). However, the use of the last two terms as synonyms may be confusing, because operational systems can be batch processing systems as well. Any enterprise must necessarily maintain a lot of data about its operation.

    Read more →
  • Semantic translation

    Semantic translation

    Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system. An example of semantic translation is the conversion of XML data from one data model to a second data model using formal ontologies for each system such as the Web Ontology Language (OWL). This is frequently required by intelligent agents that wish to perform searches on remote computer systems that use different data models to store their data elements. The process of allowing a single user to search multiple systems with a single search request is also known as federated search. Semantic translation should be differentiated from data mapping tools that do simple one-to-one translation of data from one system to another without actually associating meaning with each data element. Semantic translation requires that data elements in the source and destination systems have "semantic mappings" to a central registry or registries of data elements. The simplest mapping is of course where there is equivalence. There are three types of Semantic equivalence: Class Equivalence - indicating that class or "concepts" are equivalent. For example: "Person" is the same as "Individual" Property Equivalence - indicating that two properties are equivalent. For example: "PersonGivenName" is the same as "FirstName" Instance Equivalence - indicating that two individual instances of objects are equivalent. For example: "Dan Smith" is the same person as "Daniel Smith" Semantic translation is very difficult if the terms in a particular data model do not have direct one-to-one mappings to data elements in a foreign data model. In that situation, an alternative approach must be used to find mappings from the original data to the foreign data elements. This problem can be alleviated by centralized metadata registries that use the ISO-11179 standards such as the National Information Exchange Model (NIEM).

    Read more →
  • Frameserver

    Frameserver

    A frameserver is any program that acts as a media source in the process called frameserving, which transfers digital video data from one computer program to another without intermediate files. The program that receives the data – the frameclient – could be any type of video application. The process is controlled by the frameclient: the frameclient requests audio/video frames and the frameserver serves them. The client can request frames in any order, allowing it to pause or jump to an arbitrary frame, just as a media player does with a file on disk. The client is most commonly a media encoder, a non-linear editing system, or a media player. == Frameservers == AviSynth VirtualDub VapourSynth Debugmode FrameServer

    Read more →
  • Information literacy

    Information literacy

    The Association of College and Research Libraries defines information literacy as a "set of integrated abilities encompassing the reflective discovery of information, the understanding of how information is produced and valued and the use of information in creating new knowledge and participating ethically in communities of learning". In the United Kingdom, the Chartered Institute of Library and Information Professionals' definition also makes reference to knowing both "when" and "why" information is needed. The 1989 American Library Association (ALA) Presidential Committee on Information Literacy formally defined information literacy (IL) as attributes of an individual, stating that "to be information literate, a person must be able to recognize when information is needed and have the ability to locate, evaluate and use effectively the needed information". In 1990, academic Lori Arp published a paper asking, "Are information literacy instruction and bibliographic instruction the same?" Arp argued that neither term was particularly well defined by theoreticians or practitioners in the field. Further studies were needed to lessen the confusion and continue to articulate the parameters of the question. The Alexandria Proclamation of 2005 defined the term as a human rights issue: "Information literacy empowers people in all walks of life to seek, evaluate, use and create information effectively to achieve their personal, social, occupational and educational goals. It is a basic human right in a digital world and promotes social inclusion in all nations." The United States National Forum on Information Literacy defined information literacy as "the ability to know when there is a need for information, to be able to identify, locate, evaluate, and effectively use that information for the issue or problem at hand." Meanwhile, in the UK, the library professional body CILIP, define information literacy as "the ability to think critically and make balanced judgements about any information we find and use. It empowers us as citizens to develop informed views and to engage fully with society." A number of other efforts have been made to better define the concept and its relationship to other skills and forms of literacy. Other pedagogical outcomes related to information literacy include traditional literacy, computer literacy, research skills and critical thinking skills. Information literacy as a sub-discipline is an emerging topic of interest and counter measure among educators and librarians with the prevalence of misinformation, fake news, and disinformation. Scholars have argued that in order to maximize people's contributions to a democratic and pluralistic society, educators should be challenging governments and the business sector to support and fund educational initiatives in information literacy. == History == The phrase "information literacy" first appeared in print in a 1974 report written on behalf of the National Commission on Libraries and Information Science by Paul G. Zurkowski, who was at the time president of the Information Industry Association (now the Software and Information Industry Association). Zurkowski used the phrase to describe the "techniques and skills" learned by the information literate "for utilizing the wide range of information tools as well as primary sources in molding information solutions to their problems" and drew a relatively firm line between the "literates" and "information illiterates." The concept of information literacy appeared again in a 1976 paper by Lee Burchina presented at the Texas A&M University library's symposium. Burchina identified a set of skills needed to locate and use information for problem solving and decision making. In another 1976 article in Library Journal, M.R. Owens applied the concept to political information literacy and civic responsibility, stating, "All [people] are created equal but voters with information resources are in a position to make more intelligent decisions than citizens who are information illiterates. The application of information resources to the process of decision-making to fulfill civic responsibilities is a vital necessity." In a literature review published in an academic journal in 2020, Oral Roberts University professor Angela Sample cites several conceptual waves of information literacy definitions as defining information as a way of thinking, a set of skills, and a social practice. The introduction of these concepts led to the adoption of a mechanism called metaliteracy and the creation of threshold concepts and knowledge dispositions, which led to the creation of the ALA's Information Literacy Framework. The American Library Association's Presidential Committee on Information Literacy released a report on January 10, 1989. Titled as the Presidential Committee on Information Literacy: Final Report, the article outlines the importance of information literacy, opportunities to develop it, and the idea of an Information Age School. The recommendations of the Committee led to establishment of the National Forum on Information Literacy, a coalition of more than 90 national and international organizations. In 1998, the American Association of School Librarians and the Association for Educational Communications and Technology published Information Power: Building Partnerships for Learning, which further established specific goals for information literacy education, defining some nine standards in the categories of "information literacy," "independent learning," and "social responsibility." Also in 1998, the Presidential Committee on Information Literacy updated its final report. The report outlined six recommendations from the original report, and examined areas of challenge and progress. In 1999, the Society of College, National and University Libraries (SCONUL) in the UK published The Seven Pillars of Information Literacy to model the relationship between information skills and IT skills, and the idea of the progression of information literacy into the curriculum of higher education. In 2003, the National Forum on Information Literacy, along with UNESCO and the National Commission on Libraries and Information Science, sponsored an international conference in Prague. Representatives from twenty-three countries gathered to discuss the importance of information literacy in a global context. The resulting Prague Declaration described information literacy as a "key to social, cultural, and economic development of nations and communities, institutions and individuals in the 21st century" and declared its acquisition as "part of the basic human right of lifelong learning". In the United States specifically, information literacy was prioritized in 2009 during President Barack Obama's first term. In effort to stress the value information literacy has on everyday communication, he designated October as National Information Literacy Awareness Month in his released proclamation. In 2015, the Association of College and Research Libraries (ACRL) adopted the Framework for Information Literacy for Higher Education, which defines information literacy as "the set of integrated abilities encompassing the reflective discovery of information, the understanding of how information is produced and valued, and the use of information in creating new knowledge and participating ethically in communities of learning".Association of College and Research Libraries (2015-02-09). "Framework for Information Literacy for Higher Education". Association of College and Research Libraries. American Library Association. Retrieved 2026-02-17. == Presidential Committee on Information Literacy == The American Library Association's Presidential Committee on Information Literacy defined information literacy as the ability "to recognize when information is needed and have the ability to locate, evaluate, and use effectively the needed information" and highlighted information literacy as a skill essential for lifelong learning and the production of an informed and prosperous citizenry. The committee outlined six principal recommendations. Included were recommendations like "Reconsider the ways we have organized information institutionally, structured information access, and defined information's role in our lives at home in the community, and in the work place"; to promote "public awareness of the problems created by information illiteracy"; to develop a national research agenda related to information and its use; to ensure the existence of "a climate conducive to students' becoming information literate"; to include information literacy concerns in teacher education democracy. In the updated report, the committee ended with an invitation, asking the National Forum and regular citizens to recognize that "the result of these combined efforts will be a citizenry which is made up of effective lifelong learners who can always find the information needed for the issue or decision at hand. This new

    Read more →
  • Social information architecture

    Social information architecture

    Social information architecture, also known as social iA, is a sub-domain of information architecture which deals with the social aspects of conceptualizing, modeling and organizing information. It has become more relevant because of the rise of social media and Web 2.0 in recent times. == Approach == There are different approaches to the explanation of social information architecture. === Architecture model (internal space) === Architects designing a physical community space, have to consider how the architecture will shape social interactions. A long hallway of offices creates an utterly different dynamic than desks with arranged in an open space. One might foster individuality, privacy, propriety; the other: collaboration, distraction, communalism. Still, physical spaces can be flexibly repurposed and worked around if the inhabitants desire a social dynamic not instantly afforded by the space. Office doors can be left open to invite easier interaction. Partitions can be raised between adjacent desks to limit distraction and increase privacy. That's physical architecture. The information architectures of online communities are far more deterministic and far less flexible. They literally define the social architecture by pre-specifying in immutable computer code what information you have access to, who you can talk to, where you can go. In the online world, information architecture = social architecture. === Social dialogue and information model (external space) === All major brands use information architecture to market their products online, it is then commonly wrapped under the umbrella phrase 'digital strategy'. Information architecture used for strategic purposes encompasses brand SEO, strategic placement of virals, social media presence etc. Charities, news outlets and social dialogue forums can make a much more specific use of the same tools for positive and important social purposes. Social Information Architecture is perceived as the socially conscious wing of commercial information architecture and function to exchange information and ideas between people and groups. Social iA can pick up on conflicting issues that are treated with misunderstanding between cultures and leaves individuals and societies vulnerable to exploitation and manipulation. Since the net has such a far reach it is obvious to use it for meaningful and coordinated social dialogue. Example of such issues are faith, environment, politics, climate change, war, injustice and other social challenges. Information architecture can help create frameworks in which sharing information brings people together, inspires and encourages them to participate in a forward thinking and unfragmented way. One of its core activities is to spread messages that bring people from opposite sites of social and cultural spectrums together and to confront uncomfortable subject head on. == How does social information architecture work? == Social iA utilizes a variety of Web2.0 applications to filter relevant or valuable information and weave them in appropriate information repository or provide feedback to interesting channels. Social iA makes strategic use of Search Engines, Social Media, Google Algorithms, as well as websites, video & news channels. It ‘reads’ or 'listens' to social conversations and search engine queries and engages with the net actively to gather clues about the world's pulse on the internet. It assesses data, social & political trends, and respond with targeted campaigns to give people ideas, as well as help people with making sense of information. == Principals == Dan Brown in his paper 8 Principals of Social Information Architecture enlists the following principals: 1. The principle of objects: Treat content as a living, breathing thing, with a lifecycle, behaviors and attributes. 2. The principle of choices: Create pages that offer meaningful choices to users, keeping the range of choices available focused on a particular task. 3. The principle of disclosure: Show only enough information to help people understand what kinds of information they'll find as they dig deeper. 4. The principle of exemplars: Describe the contents of categories by showing examples of the contents. 5. The principle of front doors: Assume at least half of the website's visitors will come through some page other than the home page. 6. The principle of multiple classification: Offer users several different classification schemes to browse the site's content. 7. The principle of focused navigation: Don't mix apples and oranges in your navigation scheme. 8. The principle of growth: Assume the content you have today is a small fraction of the content you will have tomorrow. == What can social information architecture achieve? == Social information architecture has many potentials in terms of fostering social connections and how information is shared in social spaces on the web.

    Read more →
  • Digital artifact

    Digital artifact

    Digital artifact in information science, is any undesired or unintended alteration in data introduced in a digital process by an involved technique and/or technology. Digital artifact can be of any content types including text, audio, video, image, animation or a combination. == Information science == In information science, digital artifacts result from: Hardware malfunction: In computer graphics, visual artifacts may be generated whenever a hardware component such as the processor, memory chip, cabling malfunctions, etc., corrupts data. Examples of malfunctions include physical damage, overheating, insufficient voltage and GPU overclocking. Common types of hardware artifacts are texture corruption and T-vertices in 3D graphics, and pixelization in MPEG compressed video. Software malfunction: Artifacts may be caused by algorithm flaws such as decoding/encoding audio or video, or a poor pseudo-random number generator that would introduce artifacts distinguishable from the desired noise into statistical models. Compression: Controlled amounts of unwanted information may be generated as a result of the use of lossy compression techniques. One example is the artifacts seen in JPEG and MPEG compression algorithms that produce compression artifacts. Quantization: Digital imprecision generated in the process of converting analog information into digital space, is due to the limited granularity of digital numbering space. In computer graphics, quantization is seen as pixelation. Aliasing: As a consequence of sampling or sample-rate conversion, energy from frequencies outside of the signal frequency band of interest are folded across multiples of the Nyquist frequency. This is typically mitigated by using an anti-aliasing filter. Filtering: The process of filtering a signal, such as using an anti-aliasing filter, causes undesired alterations to the signal due to imperfections in the frequency response magnitude and phase, and due to the time domain impulse response. Rolling shutter, the line scanning of an object that is moving too fast for the image sensor to capture a unitary image. Error diffusion: poorly-weighted kernel coefficients result in undesirable visual artifacts.

    Read more →
  • Domain adaptation

    Domain adaptation

    Domain adaptation is a field associated with machine learning and transfer learning. It addresses the challenge of training a model on one data distribution (the source domain) and applying it to a related but different data distribution (the target domain). A common example is spam filtering, where a model trained on emails from one user (source domain) is adapted to handle emails for another user with significantly different patterns (target domain). Domain adaptation techniques can also leverage unrelated data sources to improve learning. When multiple source distributions are involved, the problem extends to multi-source domain adaptation. Domain adaptation is a specific type of transfer learning. According to the taxonomy laid out by Pan and Yang (2010), it falls into the category of transductive transfer learning. In this setting, the source and target tasks are the same (e.g., both are object recognition), but the domains differ (different marginal distributions). This distinguishes it from inductive transfer learning (where labeled data is available for the target task) and unsupervised transfer learning (where labels are unavailable in both domains). == Classification of domain adaptation problems == Domain adaptation setups are classified in two different ways: according to the distribution shift between the domains, and according to the available data from the target domain. === Distribution shifts === Common distribution shifts are classified as follows: Covariate Shift occurs when the input distributions of the source and destination change, but the relationship between inputs and labels remains unchanged. The above-mentioned spam filtering example typically falls in this category. Namely, the distributions (patterns) of emails may differ between the domains, but emails labeled as spam in the one domain should similarly be labeled in another. Prior Shift (Label Shift) occurs when the label distribution differs between the source and target datasets, while the conditional distribution of features given labels remains the same. An example is a classifier of hair color in images from Italy (source domain) and Norway (target domain). The proportions of hair colors (labels) differ, but images within classes like blond and black-haired populations remain consistent across domains. A classifier for the Norway population can exploit this prior knowledge of class proportions to improve its estimates. Concept Shift (Conditional Shift) refers to changes in the relationship between features and labels, even if the input distribution remains the same. For instance, in medical diagnosis, the same symptoms (inputs) may indicate entirely different diseases (labels) in different populations (domains). === Data available during training === Domain adaptation problems typically assume that some data from the target domain is available during training. Problems can be classified according to the type of this available data: Unsupervised: Unlabeled data from the target domain is available, but no labeled data. In the above-mentioned example of spam filtering, this corresponds to the case where emails from the target domain (user) are available, but they are not labeled as spam. Domain adaptation methods can benefit from such unlabeled data, by comparing its distribution (patterns) with the labeled source domain data. Semi-supervised: Most data that is available from the target domain is unlabelled, but some labeled data is also available. In the above-mentioned case of spam filter design, this corresponds to the case that the target user has labeled some emails as being spam or not. Supervised: All data that is available from the target domain is labeled. In this case, domain adaptation reduces to refinement of the source domain predictor. In the above-mentioned example classification of hair-color from images, this could correspond to the refinement of a network already trained on a large dataset of labeled images from Italy, using newly available labeled images from Norway. == Formalization == Let X {\displaystyle X} be the input space (or description space) and let Y {\displaystyle Y} be the output space (or label space). The objective of a machine learning algorithm is to learn a mathematical model (a hypothesis) h : X → Y {\displaystyle h:X\to Y} able to attach a label from Y {\displaystyle Y} to an example from X {\displaystyle X} . This model is learned from a learning sample S = { ( x i , y i ) ∈ ( X × Y ) } i = 1 m {\displaystyle S=\{(x_{i},y_{i})\in (X\times Y)\}_{i=1}^{m}} . Usually in supervised learning (without domain adaptation), we suppose that the examples ( x i , y i ) ∈ S {\displaystyle (x_{i},y_{i})\in S} are drawn i.i.d. from a distribution D S {\displaystyle D_{S}} of support X × Y {\displaystyle X\times Y} (unknown and fixed). The objective is then to learn h {\displaystyle h} (from S {\displaystyle S} ) such that it commits the least error possible for labelling new examples coming from the distribution D S {\displaystyle D_{S}} . The main difference between supervised learning and domain adaptation is that in the latter situation we study two different (but related) distributions D S {\displaystyle D_{S}} and D T {\displaystyle D_{T}} on X × Y {\displaystyle X\times Y} . The domain adaptation task then consists of the transfer of knowledge from the source domain D S {\displaystyle D_{S}} to the target one D T {\displaystyle D_{T}} . The goal is then to learn h {\displaystyle h} (from labeled or unlabelled samples coming from the two domains) such that it commits as little error as possible on the target domain D T {\displaystyle D_{T}} . The major issue is the following: if a model is learned from a source domain, what is its capacity to correctly label data coming from the target domain? == Four algorithmic principles == === Reweighting algorithms === The objective is to reweight the source labeled sample such that it "looks like" the target sample (in terms of the error measure considered). === Iterative algorithms === A method for adapting consists in iteratively "auto-labeling" the target examples. The principle is simple: a model h {\displaystyle h} is learned from the labeled examples; h {\displaystyle h} automatically labels some target examples; a new model is learned from the new labeled examples. Note that there exist other iterative approaches, but they usually need target labeled examples. === Search of a common representation space === The goal is to find or construct a common representation space for the two domains. The objective is to obtain a space in which the domains are close to each other while keeping good performances on the source labeling task. This can be achieved through the use of Adversarial machine learning techniques where feature representations from samples in different domains are encouraged to be indistinguishable. === Hierarchical Bayesian Model === The goal is to construct a Bayesian hierarchical model p ( n ) {\displaystyle p(n)} , which is essentially a factorization model for counts n {\displaystyle n} , to derive domain-dependent latent representations allowing both domain-specific and globally shared latent factors. == Software packages == Several compilations of domain adaptation and transfer learning algorithms have been implemented over the past decades: SKADA (Python) ADAPT (Python) TLlib (Python) Domain-Adaptation-Toolbox (MATLAB)

    Read more →
  • Concordance (publishing)

    Concordance (publishing)

    A concordance is an alphabetical list of the principal words used in a book or body of work, listing every instance of each word with its immediate context. Historically, concordances have been compiled only for works of special importance, such as the Vedas, Bible, Qur'an or the works of Shakespeare, James Joyce or classical Latin and Greek authors, because of the time, difficulty, and expense involved in creating a concordance in the pre-computer era. A concordance is more than an index, with additional material such as commentary, definitions and topical cross-indexing which makes producing one a labor-intensive process even when assisted by computers. In the precomputing era, search technology was unavailable, and a concordance offered readers of long works such as the Bible something comparable to search results for every word that they would have been likely to search for. Today, the ability to combine the result of queries concerning multiple terms (such as searching for words near other words) has reduced interest in concordance publishing. In addition, mathematical techniques such as latent semantic indexing have been proposed as a means of automatically identifying linguistic information based on word context. A bilingual concordance is a concordance based on aligned parallel text. A topical concordance is a list of subjects that a book covers (usually The Bible), with the immediate context of the coverage of those subjects. Unlike a traditional concordance, the indexed word does not have to appear in the verse. The best-known topical concordance is Nave's Topical Bible. The first Bible concordance was compiled for the Vulgate Bible by Hugh of St Cher (d.1262), who employed 500 friars to assist him. In 1448, Rabbi Mordecai Nathan completed a concordance to the Hebrew Bible. It took him ten years. A concordance to the Greek New Testament was published in 1546 by Sixt Birck, and the Septuagint was done a by Conrad Kircher in 1602. The first concordance to the English Bible was published in 1550 by John Merbecke. According to Cruden, it did not employ the verse numbers devised by Robert Stephens in 1545, but "the pretty large concordance" of Mr Cotton did. Then followed Cruden's Concordance and Strong's Concordance. == Use in linguistics == Concordances are frequently used in linguistics, when studying a text. For example: comparing different usages of the same word analysing keywords analysing word frequencies finding and analysing phrases and idioms finding translations of subsentential elements, e.g. terminology, in bitexts and translation memories creating indexes and word lists (also useful for publishing) Concordancing techniques are widely used in national text corpora such as American National Corpus (ANC), British National Corpus (BNC), and Corpus of Contemporary American English (COCA) available on-line. Stand-alone applications that employ concordancing techniques are known as concordancers or more advanced corpus managers. Some of them have integrated part-of-speech taggers (POS taggers) and enable the user to create their own POS-annotated corpora to conduct various types of searches adopted in corpus linguistics. == Inversion == The reconstruction of the text of some of the Dead Sea Scrolls involved a concordance. Access to some of the scrolls was governed by a "secrecy rule" that allowed only the original International Team or their designates to view the original materials. After the death of Roland de Vaux in 1971, his successors repeatedly refused to even allow the publication of photographs to other scholars. This restriction was circumvented by Martin Abegg in 1991, who used a computer to "invert" a concordance of the missing documents made in the 1950s which had come into the hands of scholars outside of the International Team, to obtain an approximate reconstruction of the original text of 17 of the documents. This was soon followed by the release of the original text of the scrolls.

    Read more →
  • NewSQL

    NewSQL

    NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) are too large for conventional relational databases, but have transactional and consistency requirements that are not practical for NoSQL systems. The only options previously available for these organizations were to either purchase more powerful computers or to develop custom middleware that distributes requests over conventional DBMS. Both approaches feature high infrastructure costs and/or development costs. NewSQL systems attempt to reconcile the conflicts. == History == The term was first used by 451 Group analyst Matthew Aslett in a 2011 research paper discussing the rise of a new generation of database management systems. One of the first NewSQL systems was the H-Store parallel database system. == Applications == Typical applications are characterized by heavy OLTP transaction volumes. OLTP transactions; are short-lived (i.e., no user stalls) touch small amounts of data per transaction use indexed lookups (no table scans) have a small number of forms (a small number of queries with different arguments). However, some support hybrid transactional/analytical processing (HTAP) applications. Such systems improve performance and scalability by omitting heavyweight recovery or concurrency control. == List of NewSQL-databases == Apache Trafodion Clustrix CockroachDB Couchbase CrateDB Google Spanner MySQL Cluster NuoDB OceanBase Pivotal GemFire XD SequoiaDB SingleStore was formerly known as MemSQL. TIBCO Active Spaces TiDB TokuDB TransLattice Elastic Database VoltDB YDB YugabyteDB == Features == The two common distinguishing features of NewSQL database solutions are that they support online scalability of NoSQL databases and the relational data model (including ACID consistency) using SQL as their primary interface. NewSQL systems can be loosely grouped into three categories: === New architectures === NewSQL systems adopt various internal architectures. Some systems employ a cluster of shared-nothing nodes, in which each node manages a subset of the data. They include components such as distributed concurrency control, flow control, and distributed query processing. === SQL engines === The second category are optimized storage engines for SQL. These systems provide the same programming interface as SQL, but scale better than built-in engines. === Transparent sharding === These systems automatically split databases across multiple nodes using Raft or Paxos consensus algorithm.

    Read more →