AI Content Generation Tools

AI Content Generation Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apache Drill

    Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly through contributions from developers at MapR, Drill is inspired by Google's Dremel system. Tomer Shiran is the founder of the Apache Drill project, which was designated an Apache Software Foundation top-level project in December 2014.

    Drill supports a variety of NoSQL databases and file systems, including Alluxio, HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality when Drill and the datastore run on the same nodes.

    == Features ==
    One explicitly stated design goal is for Drill to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds. Other features include:
      • Schema-free JSON document model similar to MongoDB and Elasticsearch, without requiring a formal schema to be declared
      • Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
      • Designed to be user and developer friendly
      • Pluggable architecture that enables connectivity to multiple datastores
      • Dynamic user-defined functions (added in version 1.9)
      • Cryptographic-related functions and PCAP file format support (added in version 1.11)

    == Back-end support ==
    Drill is primarily focused on non-relational datastores, including Apache Hadoop text files, NoSQL, and cloud storage. A notable feature is in situ querying of local JSON and Apache Parquet files. Additional supported datastores include:
      • All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR
      • NoSQL: MongoDB, Apache HBase, Apache Cassandra
      • Online analytical processing: Apache Kudu, Apache Druid, OpenTSDB
      • Cloud storage: Amazon S3, Google Cloud Storage, Azure Blob Storage, Swift, IBM Cloud Object Storage
      • Diverse data formats, including Apache Avro, Apache Parquet and JSON
      • RDBMS storage plugins (using JDBC to connect to MySQL, PostgreSQL, and others)
    A new datastore can be added by developing a storage plugin. Drill's "schema-free" JSON data model enables it to query non-relational datastores in situ.

    == Front-end support ==
    Drill itself can be queried via JDBC, ODBC, or REST through a variety of methods and languages, including Python and Java. The default install includes a web interface that lets end users execute ANSI SQL directly and export data tables as CSV files without any programming. The dashboard library Apache Superset is particularly well suited for visualization of data queried with Drill.
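As a rough illustration of the REST route mentioned under front-end support, the sketch below builds a query request for Drill's `/query.json` endpoint using only the Python standard library. The base URL (a local Drillbit on port 8047) and the helper names `build_query_request` and `run_query` are assumptions made for this example, and authentication is not handled.

```python
import json
import urllib.request

# Assumption: a Drillbit running locally with its web/REST port at 8047.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) a POST request carrying one SQL query."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL, timeout=30):
    """Send the query and return the result rows as a list of dicts."""
    request = build_query_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("rows", [])
```

With a Drillbit running, `run_query("SELECT * FROM cp.`employee.json` LIMIT 3")` would query the sample file that ships on Drill's classpath; building the request separately keeps the payload easy to inspect without a server.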

  • Car–Parrinello molecular dynamics

    Car–Parrinello molecular dynamics (CPMD) refers to either a method used in molecular dynamics (also known as the Car–Parrinello method) or the computational chemistry software package used to implement this method. The CPMD method is one of the major methods for calculating ab initio molecular dynamics (ab initio MD or AIMD).

    Ab initio molecular dynamics (AIMD) is a computational method that uses first principles through quantum mechanics to simulate the motion of atoms in a system. It is a type of molecular dynamics (MD) simulation that does not rely on empirical potentials or force fields to describe the interactions between atoms, but rather calculates these interactions entirely from the electronic structure of the system using quantum mechanics.

    In an ab initio MD simulation, the total energy of the system is calculated at each time step using density functional theory (DFT), Hartree–Fock (HF), or other electronic structure calculation methods. The forces acting on each atom are then determined from the gradient of the energy with respect to the atomic coordinates, and the equations of motion are solved to predict the trajectory of the atoms. AIMD permits chemical bond breaking and forming events to occur and accounts for electronic polarization effects. Therefore, ab initio MD simulations can be used to study a wide range of phenomena, including the structural, thermodynamic, and dynamic properties of materials and chemical reactions. They are particularly useful for systems that are not well described by empirical potentials or force fields, such as systems with strong electronic correlation or systems with many degrees of freedom. However, ab initio MD simulations are computationally demanding and require significant computational resources.
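The energy, force, and integration loop described above can be sketched in a few lines. This is only an illustration under strong assumptions: `toy_energy` is a hypothetical one-dimensional harmonic bond standing in for a DFT or Hartree–Fock energy, forces come from a finite-difference gradient rather than an electronic structure code, and velocity Verlet is one common choice of integrator, not one prescribed by the text.

```python
def numerical_forces(energy, positions, h=1e-5):
    """Forces F_k = -dE/dx_k, estimated with central finite differences."""
    forces = []
    for k in range(len(positions)):
        plus = list(positions)
        minus = list(positions)
        plus[k] += h
        minus[k] -= h
        forces.append(-(energy(plus) - energy(minus)) / (2.0 * h))
    return forces


def velocity_verlet(energy, positions, velocities, masses, dt, n_steps):
    """Propagate the coordinates, recomputing forces from the energy each step."""
    x, v = list(positions), list(velocities)
    f = numerical_forces(energy, x)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = numerical_forces(energy, x)  # stand-in for the electronic structure step
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v


def toy_energy(x):
    """Hypothetical stand-in energy: a 1D harmonic bond (k = 1, r0 = 1)."""
    return 0.5 * (x[1] - x[0] - 1.0) ** 2


def total_energy(energy, x, v, masses):
    """Potential plus kinetic energy, useful as a conservation check."""
    return energy(x) + sum(0.5 * m * vi * vi for m, vi in zip(masses, v))
```

In a real AIMD code the call to `numerical_forces` is where almost all of the cost sits, since each evaluation requires solving (or, in CPMD, propagating) the electronic problem.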
    The CPMD method is related to the more common Born–Oppenheimer molecular dynamics (BOMD) method in that the quantum mechanical effect of the electrons is included in the calculation of energy and forces for the classical motion of the nuclei. CPMD and BOMD are different types of AIMD. However, whereas BOMD treats the electronic structure problem within the time-independent Schrödinger equation, CPMD explicitly includes the electrons as active degrees of freedom, via (fictitious) dynamical variables.

    The software is a parallelized plane wave / pseudopotential implementation of density functional theory, particularly designed for ab initio molecular dynamics.

    == Car–Parrinello method ==
    The Car–Parrinello method is a type of molecular dynamics, usually employing periodic boundary conditions, plane-wave basis sets, and density functional theory. It was proposed in 1985 by Roberto Car and Michele Parrinello while they were working at SISSA; they were subsequently awarded the Dirac Medal by ICTP in 2009.

    In Born–Oppenheimer molecular dynamics, the nuclear (ionic) degrees of freedom are propagated using ionic forces that are calculated at each iteration by approximately solving the electronic problem with conventional matrix diagonalization methods. In contrast, the Car–Parrinello method explicitly introduces the electronic degrees of freedom as (fictitious) dynamical variables, writing an extended Lagrangian for the system which leads to a system of coupled equations of motion for both ions and electrons. In this way, an explicit electronic minimization at each time step, as done in Born–Oppenheimer MD, is not needed: after an initial standard electronic minimization, the fictitious dynamics of the electrons keeps them on the electronic ground state corresponding to each new ionic configuration visited along the dynamics, thus yielding accurate ionic forces.

    In order to maintain this adiabaticity condition, the fictitious mass of the electrons must be chosen small enough to avoid a significant energy transfer from the ionic to the electronic degrees of freedom. This small fictitious mass in turn requires that the equations of motion be integrated using a smaller time step than the one (1–10 fs) commonly used in Born–Oppenheimer molecular dynamics.

    Currently, the CPMD method can be applied to systems that consist of a few tens or hundreds of atoms and can access timescales on the order of tens of picoseconds.

    == General approach ==
    In CPMD the core electrons are usually described by a pseudopotential and the wavefunctions of the valence electrons are approximated by a plane-wave basis set.

    The ground-state electronic density (for fixed nuclei) is calculated self-consistently, usually using the density functional theory method. Kohn–Sham equations are often used to calculate the electronic structure, with the electronic orbitals expanded in a plane-wave basis set. Then, using that density, forces on the nuclei can be computed to update the trajectories (for example, with the Verlet integration algorithm). In addition, however, the coefficients used to obtain the electronic orbital functions can be treated as a set of extra spatial dimensions, and trajectories for the orbitals can be calculated in this context.

    == Fictitious dynamics ==
    CPMD is an approximation of the Born–Oppenheimer MD (BOMD) method. In BOMD, the electrons' wave function must be minimized via matrix diagonalization at every step in the trajectory. CPMD uses fictitious dynamics to keep the electrons close to the ground state, avoiding the need for a costly self-consistent iterative minimization at each time step. The fictitious dynamics relies on the use of a fictitious electron mass (usually in the range of 400–800 a.u.) to ensure that there is very little energy transfer from nuclei to electrons, i.e. to ensure adiabaticity. Any increase in the fictitious electron mass that results in energy transfer would cause the system to leave the ground-state BOMD surface.

    === Lagrangian ===
    $$\mathcal{L} = \frac{1}{2}\left(\sum_{I}^{\mathrm{nuclei}} M_I \dot{\mathbf{R}}_I^{2} + \mu \sum_{i}^{\mathrm{orbitals}} \int d\mathbf{r}\, \left|\dot{\psi}_i(\mathbf{r},t)\right|^{2}\right) - E\left[\{\psi_i\},\{\mathbf{R}_I\}\right] + \sum_{ij} \Lambda_{ij}\left(\int d\mathbf{r}\, \psi_i^{*}(\mathbf{r},t)\,\psi_j(\mathbf{r},t) - \delta_{ij}\right),$$
    where $\mu$ is the fictitious mass parameter and $E[\{\psi_i\},\{\mathbf{R}_I\}]$ is the Kohn–Sham energy density functional, which outputs energy values when given Kohn–Sham orbitals and nuclear positions.

    === Orthogonality constraint ===
    $$\int d\mathbf{r}\, \psi_i^{*}(\mathbf{r},t)\,\psi_j(\mathbf{r},t) = \delta_{ij},$$
    where $\delta_{ij}$ is the Kronecker delta.

    === Equations of motion ===
    The equations of motion are obtained by finding the stationary point of the Lagrangian under variations of $\psi_i$ and $\mathbf{R}_I$, with the orthogonality constraint:
    $$M_I \ddot{\mathbf{R}}_I = -\nabla_I\, E\left[\{\psi_i\},\{\mathbf{R}_I\}\right]$$
    $$\mu \ddot{\psi}_i(\mathbf{r},t) = -\frac{\delta E}{\delta \psi_i^{*}(\mathbf{r},t)} + \sum_{j} \Lambda_{ij}\,\psi_j(\mathbf{r},t),$$
    where $\Lambda_{ij}$ is a Lagrangian multiplier matrix that enforces the orthonormality constraint.

    === Born–Oppenheimer limit ===
    In the formal limit where $\mu \to 0$, the equations of motion approach Born–Oppenheimer molecular dynamics.

    == Software packages ==
    There are a number of software packages available for performing AIMD simulations. Some of the most widely used packages include:
      • CP2K: an open-source software package for AIMD.
      • Quantum Espresso: an open-source package for performing DFT calculations. It includes a module for AIMD.
      • VASP: a commercial software package for performing DFT calculations. It includes a module for AIMD.
      • Gaussian: a commercial software package that can perform AIMD.
      • NWChem: an open-source software package for AIMD.
      • LAMMPS: an open-source software package for performing classical and ab initio MD simulations.
      • SIESTA: an open-source software package for AIMD.
      • ORCA: a general-purpose quantum chemistry package.

    == Applications ==
      • Studying the behavior of water across different environments, such as near a hydrophobic graphene sheet.
      • Investigating the structure and dynamics of liquid water at ambient temperature.
      • Solving heat transfer problems (heat conduction and thermal radiation), such as in Si/Ge superlattices.
      • Probing proton transfer along hydrogen bonds in different environments, such as in 1D water chains inside carbon nanotubes.
      • Evaluating the critical point of crystals, composites, and solid-state materials, such as aluminum.
      • Predicting and modelling different phases and phase transitions, such as in the amorphous phase of the phase-change memory material GeSbTe.
      • Studying the combustion of combustibles, such as lignite-water systems.
      • Measuring th

  • BuildingSMART Data Dictionary

    buildingSMART Data Dictionary (bSDD) is a service provided by buildingSMART which offers free data dictionaries for the international standardization of construction planning. The structure of bSDD was defined by the nonprofit organization buildingSMART and is used to describe objects and their attributes in a BIM process.

    == Aim ==
    The aim of bSDD is to enable architects and planners to exchange and share building data across different specialists and language boundaries, and thus to avoid misunderstandings caused by different interpretations of terms. The bSDD standard extends the more general IFC. Software developers can access and use the dictionaries. As of May 2025, over 300 dictionaries were available, including IFC, extensions to it such as the Airport Domain IFC extension module, and classification systems like Uniclass.

    == Structure ==
    The main structural parts of bSDD are:
      • Dictionary: a dictionary is a collection of classes.
      • Class: a class describes an object type, such as Bag drop or Baggage conveyor in airport planning. A class contains properties.
      • Property: a property describes a part of a class, e.g. color or weight. Related properties are organized in a group.
      • GroupOfProperties: a group organizes related properties, e.g. environmental properties or electrical properties.

    == Creating and managing a dictionary ==
    Every dictionary in bSDD must be published in the name of a registered organization. As soon as the content is activated, it receives an unchangeable URI. This means that the content remains permanently in bSDD and cannot be deleted, which ensures stable use of the dictionary. If a dictionary is no longer to be used, its status can only be changed to inactive; the dictionary itself remains permanently.
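The hierarchy listed under Structure, together with the "activate once, never delete" rule, can be sketched as a small data model. This is an illustration only and not the bSDD API or schema: the class names `DictionaryClass` and `Dictionary`, the status strings, and the example URI are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Property:
    name: str  # e.g. "color" or "weight"


@dataclass
class GroupOfProperties:
    name: str  # e.g. "electrical properties"
    properties: List[Property] = field(default_factory=list)


@dataclass
class DictionaryClass:
    name: str  # e.g. "Bag drop"
    properties: List[Property] = field(default_factory=list)


@dataclass
class Dictionary:
    name: str
    organization: str  # dictionaries are published in an organization's name
    classes: List[DictionaryClass] = field(default_factory=list)
    status: str = "draft"  # hypothetical status labels
    uri: Optional[str] = None

    def activate(self, uri: str) -> None:
        """Assign the URI exactly once; afterwards it can no longer change."""
        if self.uri is not None:
            raise ValueError("URI is already assigned and cannot be changed")
        self.uri = uri
        self.status = "active"

    def deactivate(self) -> None:
        """Mark activated content as inactive; it is never deleted."""
        if self.uri is None:
            raise ValueError("only activated content can be set to inactive")
        self.status = "inactive"


def build_example() -> Dictionary:
    """Assemble a tiny airport dictionary with one class and two properties."""
    bag_drop = DictionaryClass("Bag drop", [Property("color"), Property("weight")])
    return Dictionary("Airport example", "Example Org", [bag_drop])
```

Keeping `uri` write-once in `activate` mirrors the stability guarantee described above: consumers can hold on to a URI knowing that deactivation changes only the status.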

  • Software intelligence

    Software intelligence is insight into the inner workings and structural condition of software assets, produced by software designed to analyze database structure, software frameworks and source code in order to better understand and control complex software systems in information technology environments. Similarly to business intelligence (BI), software intelligence is produced by a set of software tools and techniques for mining data and the software's inner structure. Results are produced automatically and feed a knowledge base containing technical documentation and blueprints of the inner workings of applications. This knowledge base is made available to business and software stakeholders so that they can make informed decisions, measure the efficiency of software development organizations, communicate about software health, and prevent software catastrophes.

    == History ==
    The term software intelligence has been used by Kirk Paul Lafler, an American engineer, entrepreneur, and consultant, who founded Software Intelligence Corporation in 1979. At that time, it was mainly related to SAS activities, in which he has been an expert since 1979.

    In the early 1980s, Victor R. Basili contributed to several papers detailing a methodology for collecting valid software engineering data, covering the evaluation of software development and its variations.

    In 2004, software vendors in the software analysis field started using the term as part of their product naming and marketing strategy.

    Then in 2010, Ahmed E. Hassan and Tao Xie defined software intelligence as a "practice offering software practitioners up-to-date and pertinent information to support their daily decision-making processes and Software Intelligence should support decision-making processes throughout the lifetime of a software system". They go on to predict that software intelligence will have a "strong impact on modern software practice" in the coming decades.

    == Capabilities ==
    Because of the complexity and wide range of components and subjects involved in software, software intelligence is derived from different aspects of software:
      • Software composition is the construction of software application components. Components result from software coding, as well as from the integration of source code from external components: open source, third-party components, or frameworks. Other components can be integrated using application programming interface calls to libraries or services.
      • Software architecture refers to the structure and organization of the elements of a system, and the relations and properties among them.
      • Software flaws designate problems that can cause security, stability, and resiliency issues, as well as unexpected results. There is no standard definition of software flaws, but the most widely accepted is from The MITRE Corporation, where common flaws are cataloged as the Common Weakness Enumeration.
      • Software grades assess attributes of the software. Historically, the classification and terminology of attributes have been derived from the ISO 9126-3 and the subsequent ISO 25000:2005 quality model.
      • Software economics refers to the resource evaluation of software in the past, present, or future to make decisions and to govern.

    == Components ==
    The capabilities of software intelligence platforms include an increasing number of components:
      • A code analyzer that serves as an information basis for the other software intelligence components, identifying objects created by the programming language as well as external objects from open source, third-party objects, frameworks, APIs, or services
      • Graphical visualization and blueprinting of the inner structure of the software product or application under consideration, including dependencies, from data acquisition (automated and real-time data capture, end-user entries) up to data storage, the different layers within the software, and the coupling between all elements
      • Navigation capabilities within components and impact analysis features
      • A list of flaws, architectural and coding violations against standardized best practices, cloud blockers preventing migration to a cloud environment, and rogue data calls affecting the security and integrity of software
      • Grades or scores of structural and software quality aligned with industry standards such as OMG, CISQ or SEI, assessing reliability, security, efficiency, maintainability, and scalability to cloud or other systems
      • Metrics quantifying and estimating software economics, including work effort, sizing, and technical debt
      • Industry references and benchmarking that allow comparisons between analysis outputs and industry standards

    == User aspect ==
    Some considerations must be made in order to successfully integrate the usage of software intelligence systems in a company. Ultimately, the software intelligence system must be accepted and utilized by the users in order for it to add value to the organization. If the system does not add value to the users' mission, they simply do not use it, as stated by M. Storey in 2003.

    At the code level and system representation, software intelligence systems must provide different levels of abstraction: an abstract view for designing, explaining and documenting, and a detailed view for understanding and analyzing the software system.

    At the governance level, user acceptance of software intelligence covers different areas related to the inner functioning of the system as well as its output. It encompasses these requirements:
      • Comprehensive: missing information may lead to a wrong or inappropriate decision, and it is also a factor influencing the user acceptance of a system.
      • Accurate: accuracy depends on how the data is collected, to ensure fair and indisputable opinion and judgment.
      • Precise: precision is usually judged by comparing several measurements from the same or different sources.
      • Scalable: lack of scalability in the software industry is a critical factor leading to failure.
      • Credible: outputs must be trusted and believed.
      • Deployable and usable.

    == Applications ==
    Software intelligence has many applications in all businesses relating to the software environment, whether it is software for professionals, individuals, or embedded software. Depending on the association and the usage of the components, applications will relate to:
      • Change and modernization: uniform documentation and blueprinting of all inner components, integrated external code, and calls to internal or external components of the software
      • Resiliency and security: measuring against industry standards to diagnose structural flaws in an IT environment, and compliance validation regarding security, specific regulations or technical matters
      • Decision making and governance: providing analytics about the software itself or the stakeholders involved in its development, e.g. productivity measurement to inform business and IT leaders about progress towards business goals, and assessment and benchmarking to help business and IT leaders make informed, fact-based decisions about software

    == Marketplace ==
    Software intelligence is a high-level discipline and has been growing gradually to cover the applications listed above. Several markets drive the need for it:
      • Application Portfolio Analysis (APA), aiming at improving enterprise performance
      • Software assessment, for producing software KPIs and improving quality and productivity
      • Software security and resiliency measures and validation
      • Software evolution or legacy modernization, for which blueprinting of the software systems is needed, along with tools that improve and facilitate modifications
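To make the "code analyzer" component listed under Components more concrete, here is a deliberately tiny sketch that inventories the objects defined in a piece of Python source and the external modules it pulls in. It relies only on Python's standard `ast` module; the function name `summarize_source` and the `SAMPLE` snippet are assumptions for illustration, and real software intelligence platforms cover many languages and far deeper analysis.

```python
import ast

# Hypothetical snippet to analyze; any Python source text would do.
SAMPLE = (
    "import os\n"
    "from json import loads\n"
    "\n"
    "class Repo:\n"
    "    def size(self):\n"
    "        return 0\n"
    "\n"
    "def main():\n"
    "    pass\n"
)


def summarize_source(source):
    """Return sorted names of functions, classes, and imported modules."""
    tree = ast.parse(source)
    summary = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for purely relative imports such as "from . import x"
            summary["imports"].append(node.module or ".")
    return {key: sorted(names) for key, names in summary.items()}
```

An inventory like this is the raw material that the other components build on: dependency graphs, flaw lists, and sizing metrics all start from knowing which objects exist and what they reference.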

  • Procreate (software)

    Procreate is a raster graphics editor app for digital painting developed and published by the Australian company Savage Interactive for iOS and iPadOS. It was launched on the App Store in 2011.

    == Versions ==
    === Procreate ===
    Procreate for iPad was first released in 2011 by the Tasmanian software company Savage Interactive. In 2013, Savage launched Procreate 2 in conjunction with iOS 7, adding new features such as higher resolution capabilities and more brush options.

    In 2016, Procreate became one of the top ten best-selling iPad apps on the App Store. In 2018, Procreate became the overall best-selling iPad app.

    With iOS 26, Procreate adopted Liquid Glass in its software. As of March 2026, the most recent version of Procreate for the iPad is 5.4.9.

    === Procreate Pocket ===
    Procreate Pocket was released to the App Store in December 2014. In 2018, Savage launched Procreate Pocket 2.0 to the App Store. In December 2018, Procreate Pocket received Apple's "App of the Year" award. As of September 2025, the most recent version of Procreate Pocket (for the iPhone) is 4.0.15.

    === Procreate Dreams ===
    Procreate Dreams, Savage's more recent app focused on 2D animation, was released on the App Store on November 22, 2023. While the application is commended for its intuitive interface and accessibility, some reviewers have noted that it may lack some key animation features, such as reference layers. In June 2024, Procreate Dreams received the 2024 Apple Design Award for Innovation. In December 2025, Savage Interactive released Procreate Dreams 2, a long-awaited update and redesign of Procreate Dreams.

    == Features ==
    The current versions of Procreate use Valkyrie, a proprietary graphics engine, to allow customisable brush options and the import of brushes from Adobe Photoshop. Procreate offers familiar features such as layers, masks, and blending modes. What sets it apart most from other professional drawing software is its simple UI and comparatively easy learning curve.

    The app also allows for animation. Savage expanded upon Procreate's animation features with a companion app dedicated to 2D animation called Procreate Dreams, released in November 2023.

    In August 2024, Procreate announced that it would not be incorporating generative artificial intelligence into its software.

    Savage offers a free internet forum called Procreate Discussions in which users can ask for help, suggest ideas, and share user-generated content on the marketplace or the resources board.

    == Notable users ==
    Concept artist Doug Chiang creates robot, vehicle, and creature designs for Star Wars in Procreate. Professional artists have also used Procreate to create the posters for Stranger Things, Logan, and Blade Runner 2049, as well as several covers for The New Yorker. It has also been professionally adopted at Marvel Comics, DC Comics, Disney Animation, and Pixar.

  • Information professional

    The term information professional or information specialist refers to professionals responsible for the collection, documentation, organization, storage, preservation, retrieval, and dissemination of printed and digital information. The service delivered to the client is known as an information service.

    The term "information professional" is a versatile one, used to describe similar and sometimes overlapping professions, such as librarians, archivists, information managers, information systems specialists, information scientists, records managers, and information consultants. However, terminology differs among sources and organisations. Information professionals are employed in a variety of private, public, and academic institutions, as well as independently.

    == Skills ==
    Since the term information professional is broad, the skills required for this profession are also varied. A Gartner report in 2011 pointed out that "Professional roles focused on information management will be different to that of established IT roles. An 'information professional' will not be one type of role or skill set, but will in fact have a number of specializations". Thus, an information professional can possess a variety of different skills, depending on the sector in which the person is employed. Some essential cross-sector skills are:
      • IT skills, such as word processing and spreadsheets, digitisation skills, and conducting Internet searches, together with skills in loan systems, databases, content management systems, and specially designed programmes and packages.
      • Customer service. An information professional should have the ability to address the information needs of customers.
      • Language proficiency. This is essential in order to manage the information at hand and deal with customer needs.
      • Soft skills. These include skills such as negotiating, conflict resolution, and time management.
      • Management training. An information professional should be familiar with notions such as strategic planning and project management.
    Moreover, an information professional should be skilled in planning and using relevant systems, in capturing and securing information, and in accessing it to deliver service whenever the information is required.

    == Associations ==
    Most countries have a professional association that oversees the professional and academic standards of librarians and other information professionals. There are also international associations related to LIS (library and information science), the most prominent of which is the International Federation of Library Associations and Institutions (IFLA). In many countries, LIS courses are accredited by the relevant professional association, such as the American Library Association (ALA) in the USA, the Chartered Institute of Library and Information Professionals (CILIP) in the UK, and the Australian Library and Information Association (ALIA) in Australia.

    == Qualifications ==
    Educational institutions around the world offer academic degrees in LIS, or degrees in related subjects such as Archival Studies, Information Systems, Information Management, and Records Management. Some of the institutions offering information science education refer to themselves as an iSchool, such as the CiSAP (Consortium of iSchools Asia Pacific, founded 2006) in Asia and the iSchool Caucus in the USA. There are also online e-learning resources, some of which offer certification for information professionals.

    === Africa ===
    Information development in Africa started later than in other continents, mainly due to a lack of internet access, expertise and resources to manage digital infrastructure, and "opportunities for capacity development and knowledge-sharing". Nowadays, academic degrees in information studies are available at many universities in African countries, such as the University of Pretoria (South Africa), University of Nairobi (Kenya), Makerere University (Uganda), University of Botswana (Botswana), and University of Nigeria (Nigeria).

    === Asia ===
    LIS-related studies are available in more than 30 Asian countries. Some examples listed by iSchools Inc. are the University of Hong Kong; the University of Tsukuba, Japan; Yonsei University, South Korea; National Taiwan University; and Wuhan University, China. In India, there is the Centre of Library and Information Management Science (CLIMS) at the Tata Institute of Social Sciences in Mumbai. In Southeast Asia, the Congress of Southeast Asian Librarians (CONSAL) connects librarians and libraries in more than 10 countries with resources, networking opportunities, and support for growing library systems.

    === Australasia ===
    As of 2021, the Australian Library and Information Association (ALIA) lists on its website six schools offering accredited undergraduate and postgraduate university courses for "Librarian and Information Specialists". In New Zealand, the Open Polytechnic of New Zealand and the Victoria University of Wellington offer undergraduate and postgraduate degree courses for information professionals.

    === Europe ===
    The majority of European countries have universities, colleges, or schools which offer bachelor's degrees in LIS studies. Over 40 universities offer master's degrees in LIS-related fields, and many institutions, such as the Swedish School of Library and Information Science at the University of Borås (Sweden), the University of Barcelona (Spain), Loughborough University (UK), and Aberystwyth University (Wales, UK), also offer PhD degrees.

    === North America ===
    Information studies and degrees are available at numerous academic institutions throughout the U.S. and Canada. U.S. professional associations, together with their European counterparts, have undertaken many educational initiatives and pioneered many advances in the field of information studies, such as increased interdisciplinarity and more effective delivery of distance learning. The Association for Intelligent Information Management, based in Silver Spring, Maryland, offers a qualification called Certified Information Professional (CIP), earned upon passing an examination, with certification remaining valid for three years.

    === South America ===
    There are many schools and colleges in Latin America which offer courses in Library Science, Archival Studies, and Information Studies; however, these subjects are taught completely separately.

    Read more →
  • Information professional

    Information professional

    The term information professional or information specialist refers to professionals responsible for the collection, documentation, organization, storage, preservation, retrieval, and dissemination of printed and digital information. The service delivered to the client is known as an information service. The term "information professional" is a versatile one, used to describe similar and sometimes overlapping professions, such as librarians, archivists, information managers, information systems specialists, information scientists, records managers, and information consultants. However, terminology differs among sources and organisations. Information professionals are employed in a variety of private, public, and academic institutions, as well as independently. == Skills == Since the term information professional is broad, the skills required for this profession are also varied. A Gartner report in 2011 pointed out that "Professional roles focused on information management will be different to that of established IT roles. An 'information professional' will not be one type of role or skill set, but will in fact have a number of specializations". Thus, an information professional can possess a variety of different skills, depending on the sector in which the person is employed. Some essential cross-sector skills are: IT skills, such as word-processing and spreadsheets, digitisation skills, and conducting Internet searches, together with skills loan systems, databases, content management systems, and specially designed programmes and packages. Customer service. An information professional should have the ability to address the information needs of customers. Language proficiency. This is essential in order to manage the information at hand and deal with customer needs. Soft skills. These include skills such as negotiating, conflict resolution, and time management. Management training. 
An information professional should be familiar with notions such as strategic planning and project management. Moreover, an information professional should be skilled in planning and using relevant systems, in capturing and securing information, and in accessing it to deliver service whenever the information is required. == Associations == Most countries have a professional association that oversees the professional and academic standards of librarians and other information professionals. There are also international associations related to LIS (library and information science), the most prominent of which is the International Federation of Library Associations and Institutions (IFLA). In many countries, LIS courses are accredited by the relevant professional association, such as the American Library Association (ALA) in the USA, the Chartered Institute of Library and Information Professionals (CILIP) in the UK, and the Australian Library and Information Association (ALIA) in Australia. == Qualifications == Educational institutions around the world offer academic degrees in the field, or degrees in related subjects such as Archival Studies, Information Systems, Information Management, and Records Management. Some of the institutions offering information science education refer to themselves as an iSchool, such as the CiSAP (Consortium of iSchools Asia Pacific, founded 2006) in Asia and the iSchool Caucus in the USA. There are also online e-learning resources, some of which offer certification for information professionals. === Africa === Information development in Africa started later than in other continents, mainly due to a lack of internet access, expertise and resources to manage digital infrastructure, and "opportunities for capacity development and knowledge-sharing". 
Nowadays, academic degrees in information studies are available at many universities in African countries, such as the University of Pretoria (South Africa), University of Nairobi (Kenya), Makerere University (Uganda), University of Botswana (Botswana), and University of Nigeria (Nigeria). === Asia === LIS-related studies are available in more than 30 Asian countries. Some examples listed by iSchools Inc. are the University of Hong Kong; the University of Tsukuba, Japan; Yonsei University, South Korea; National Taiwan University; and Wuhan University, China. In India, LIS studies are offered at the Centre of Library and Information Management Science (CLIMS) at the Tata Institute of Social Science in Mumbai. In Southeast Asia, the Congress of Southeast Asian Librarians (CONSAL) connects librarians and libraries in more than 10 countries with resources, networking opportunities, and support for growing library systems. === Australasia === As of 2021, the Australian Library and Information Association (ALIA) lists on its website six schools offering accredited undergraduate and postgraduate university courses for "Librarian and Information Specialists". In New Zealand, the Open Polytechnic of New Zealand and the Victoria University of Wellington offer undergraduate and postgraduate degree courses for information professionals. === Europe === The majority of European countries have universities, colleges, or schools which offer bachelor's degrees in LIS studies. Over 40 universities offer master's degrees in LIS-related fields, and many institutions, such as the Swedish School of Library and Information Science at the University of Borås (Sweden), the University of Barcelona (Spain), Loughborough University (UK), and Aberystwyth University (Wales, UK) also offer PhD degrees. === North America === Information studies and degrees are available at numerous academic institutions throughout the U.S. and Canada. U.S. 
professional associations, together with their European counterparts, have undertaken many educational initiatives and pioneered many advances in the field of Information studies, such as increased interdisciplinarity and more effective delivery of distance learning. The Association for Intelligent Information Management, based in Silver Spring, Maryland, offers a qualification called Certified Information Professional (CIP), earned upon passing an examination, with certification remaining valid for three years. === South America === There are many schools and colleges in Latin America, which offer courses in Library Science, Archival Studies, and Information Studies, however these subjects are taught completely separately.

  • Algorithmic game theory

    Algorithmic game theory

    Algorithmic game theory (AGT) is an interdisciplinary field at the intersection of game theory and computer science, focused on understanding and designing algorithms for environments where multiple strategic agents interact. This research area combines computational thinking with economic principles to address challenges that emerge when algorithmic inputs come from self-interested participants. In traditional algorithm design, inputs are assumed to be fixed and reliable. However, in many real-world applications—such as online auctions, internet routing, digital advertising, and resource allocation systems—inputs are provided by multiple independent agents who may strategically misreport information to manipulate outcomes in their favor. AGT provides frameworks to analyze and design systems that remain effective despite such strategic behavior. The field can be approached from two complementary perspectives: Analysis: Evaluating existing algorithms and systems through game-theoretic tools to understand their strategic properties. This includes calculating and proving properties of Nash equilibria (stable states where no participant can benefit by changing only their own strategy), measuring price of anarchy (efficiency loss due to selfish behavior), and analyzing best-response dynamics (how systems evolve when players sequentially optimize their strategies). Design: Creating mechanisms and algorithms with both desirable computational properties and game-theoretic robustness. This sub-field, known as algorithmic mechanism design, develops systems that incentivize truthful behavior while maintaining computational efficiency. Algorithm designers in this domain must satisfy traditional algorithmic requirements (such as polynomial-time running time and good approximation ratio) while simultaneously addressing incentive constraints that ensure participants act according to the system's intended design. 
== History == === Nisan-Ronen: a new framework for studying algorithms === In 1999, the seminal paper of Noam Nisan and Amir Ronen drew the attention of the Theoretical Computer Science community to designing algorithms for selfish (strategic) users. As they claim in the abstract: We consider algorithmic problems in a distributed setting where the participants cannot be assumed to follow the algorithm but rather their own self-interest. As such participants, termed agents, are capable of manipulating the algorithm, the algorithm designer should ensure in advance that the agents’ interests are best served by behaving correctly. Following notions from the field of mechanism design, we suggest a framework for studying such algorithms. In this model the algorithmic solution is adorned with payments to the participants and is termed a mechanism. The payments should be carefully chosen as to motivate all participants to act as the algorithm designer wishes. We apply the standard tools of mechanism design to algorithmic problems and in particular to the shortest path problem. This paper coined the term algorithmic mechanism design and was recognized by the 2012 Gödel Prize committee as one of "three papers laying foundation of growth in Algorithmic Game Theory". === Price of Anarchy === The other two papers cited in the 2012 Gödel Prize for fundamental contributions to Algorithmic Game Theory introduced and developed the concept of "Price of Anarchy". In their 1999 paper "Worst-case Equilibria", Koutsoupias and Papadimitriou proposed a new measure of the degradation of system efficiency due to the selfish behavior of its agents: the ratio between the system's efficiency at an optimal configuration and its efficiency at the worst Nash equilibrium. (The term "Price of Anarchy" only appeared a couple of years later.) === The Internet as a catalyst === The Internet created a new economy—both as a foundation for exchange and commerce, and in its own right. 
The computational nature of the Internet allowed for the use of computational tools in this new emerging economy. On the other hand, the Internet itself is the outcome of actions of many. This was new to the classic, ‘top-down’ approach to computation that held till then. Thus, game theory is a natural way to view the Internet and interactions within it, both human and mechanical. Game theory studies equilibria (such as the Nash equilibrium). An equilibrium is generally defined as a state in which no player has an incentive to change their strategy. Equilibria are found in several fields related to the Internet, for instance financial interactions and communication load-balancing. Game theory provides tools to analyze equilibria, and a common approach is then to ‘find the game’—that is, to formalize specific Internet interactions as a game, and to derive the associated equilibria. Rephrasing problems in terms of games allows the analysis of Internet-based interactions and the construction of mechanisms to meet specified demands. If equilibria can be shown to exist, a further question must be answered: can an equilibrium be found, and in reasonable time? This leads to the analysis of algorithms for finding equilibria. Of special importance is the complexity class PPAD, which includes many problems in algorithmic game theory. == Areas of research == === Algorithmic mechanism design === Mechanism design is the subarea of economics that deals with optimization under incentive constraints. Algorithmic mechanism design considers the optimization of economic systems under computational efficiency requirements. Typical objectives studied include revenue maximization and social welfare maximization. === Inefficiency of equilibria === The concepts of price of anarchy and price of stability were introduced to capture the loss in performance of a system due to the selfish behavior of its participants. 
The price of anarchy captures the worst-case performance of the system at equilibrium relative to the optimal performance possible. The price of stability, on the other hand, captures the relative performance of the best equilibrium of the system. These concepts are counterparts to the notion of approximation ratio in algorithm design. === Complexity of finding equilibria === The existence of an equilibrium in a game is typically established using non-constructive fixed point theorems. There are no efficient algorithms known for computing Nash equilibria. The problem is complete for the complexity class PPAD even in 2-player games. In contrast, correlated equilibria can be computed efficiently using linear programming, as well as learned via no-regret strategies. === Computational social choice === Computational social choice studies computational aspects of social choice, the aggregation of individual agents' preferences. Examples include algorithms and computational complexity of voting rules and coalition formation. Other topics include: algorithms for computing market equilibria; fair division; multi-agent systems. The area also has diverse practical applications: sponsored search auctions; spectrum auctions; cryptocurrencies; prediction markets; reputation systems; the sharing economy; matching markets such as kidney exchange and school choice; crowdsourcing and peer grading; economics of the cloud. == Journals and newsletters == ACM Transactions on Economics and Computation (TEAC); SIGEcom Exchanges. Algorithmic Game Theory papers are often also published in Game Theory journals such as GEB, Economics journals such as Econometrica, and Computer Science journals such as SICOMP.
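
The price of anarchy and price of stability described under "Inefficiency of equilibria" can be illustrated on a tiny example. The sketch below is not taken from the article: the two-player cost game, the action names, and all function names are assumptions made for illustration, and only pure-strategy Nash equilibria are considered (the game may also have mixed equilibria, which are ignored here). Social cost is taken to be the sum of the players' costs.

```python
from itertools import product

# Hypothetical 2-player coordination game: COSTS[(a0, a1)] = (cost to player 0, cost to player 1).
ACTIONS = ("A", "B")
COSTS = {
    ("A", "A"): (1, 1),
    ("A", "B"): (3, 3),
    ("B", "A"): (3, 3),
    ("B", "B"): (2, 2),
}

def is_pure_nash(profile, costs, actions):
    """True if no player can lower their own cost by changing only their own action."""
    for player in range(2):
        current = costs[profile][player]
        for alternative in actions:
            deviated = list(profile)
            deviated[player] = alternative
            if costs[tuple(deviated)][player] < current:
                return False
    return True

def social_cost(profile, costs):
    """Social cost of a profile: the sum of both players' costs."""
    return sum(costs[profile])

def anarchy_and_stability(costs, actions):
    """Return (pure equilibria, price of anarchy, price of stability)."""
    profiles = list(product(actions, repeat=2))
    optimum = min(social_cost(p, costs) for p in profiles)
    equilibria = [p for p in profiles if is_pure_nash(p, costs, actions)]
    equilibrium_costs = [social_cost(p, costs) for p in equilibria]
    # Worst equilibrium vs. optimum, and best equilibrium vs. optimum.
    return equilibria, max(equilibrium_costs) / optimum, min(equilibrium_costs) / optimum

equilibria, poa, pos = anarchy_and_stability(COSTS, ACTIONS)
print(equilibria, poa, pos)  # [('A', 'A'), ('B', 'B')] 2.0 1.0
```

In this game both (A, A) and (B, B) are stable, but (B, B) costs twice the optimum, so the price of anarchy is 2 while the price of stability is 1.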

  • Deaths linked to chatbots

    Deaths linked to chatbots

    There have been multiple incidents where interaction with a large language model (LLM) chatbot has been cited as a direct or contributing factor in a person's suicide or other fatal outcome. In some cases, legal action was taken against the companies that developed the AI involved. == Background == Chatbots converse in a seemingly natural fashion, making it easy for people to think of them as real people, leading many to ask chatbots for help dealing with interpersonal and emotional problems. Chatbots may be designed to keep the user engaged in the conversation. They have also often been shown to affirm users' thoughts, including delusions and suicidal ideations in mentally ill people, conspiracy theorists, and religious and political extremists. A 2025 Stanford University study into how chatbots respond to users suffering from severe mental issues such as suicidal ideation and psychosis found that chatbots are not equipped to provide an appropriate response and can sometimes give responses that escalate the mental health crisis. == Murders == === Maine murder and assault === On 19 February 2025, a man killed his 32-year-old wife with a fire poker at his parents' home in Readfield, Maine, US. He then attacked his mother, leaving her hospitalized. A state forensic psychologist testified that he had been using ChatGPT up to 14 hours per day and believed his wife had become part machine. === Florida State University mass shooting === In April of 2025, Phoenix Ikner carried out a mass shooting on the Florida State University campus in the US, killing Robert Morales and Tiru Chabba and wounding several others. Leading up to the shooting, Ikner consulted heavily with ChatGPT about what gun and ammunition to use, and what time to perform the attack. Chatbot logs showed ChatGPT giving advice on making the gun operational shortly before Ikner began shooting. 
Lawyers representing Morales believed the shooter had been in "constant communication" with ChatGPT before the shooting and said that they intended to "file suit against ChatGPT, and its ownership structure, very soon, and will seek to hold them accountable for the untimely and senseless death of our client". Florida Attorney General James Uthmeier announced an investigation into ChatGPT's role in the alleged shooter's use of the chatbot. In May 2026, the widow of Tiru Chabba filed a lawsuit against OpenAI in Florida's northern federal district court. === Greenwich murder-suicide === In August 2025, former US tech employee Stein-Erik Soelberg murdered his mother, Suzanne Eberson Adams, then died by suicide, after conversations with ChatGPT fueled paranoid delusions about his mother poisoning him or plotting against him. The chatbot affirmed his fears that his mother put psychedelic drugs in the air vents of his car and said a receipt from a Chinese restaurant contained mysterious symbols linking his mother to a demon. === Murder of Angela Shellis === On 23 October 2025, 18-year-old Tristan Roberts murdered his mother Angela Shellis with a hammer near their home in Prestatyn, Wales. Roberts had used DeepSeek's chatbot prior to the killing to ask whether a knife or hammer was better suited for murder. DeepSeek initially refused his inquiry, but gave responses after Roberts told the chatbot he was writing a book about serial killers, a well-known technique for jailbreaking AIs. === Gangbuk District drug deaths === In January and February 2026, two men died of drug overdoses in motel rooms in Gangbuk District, Seoul, South Korea. A woman was charged with murder in connection with the deaths; police alleged that she had asked ChatGPT about the dangers of mixing alcohol with drugs and whether they could kill someone. 
=== Tumbler Ridge mass shooting === On 10 February 2026, a mass shooting in Tumbler Ridge, British Columbia, Canada, resulted in eight deaths, including six young children. The perpetrator had their ChatGPT account banned by OpenAI months before the attack due to troubling posts featuring scenarios of gun violence. According to reports, approximately a dozen OpenAI staff members debated whether to alert authorities about the shooter's usage of the AI tool, with some identifying it as an indication of potential real-world violence. However, company leadership decided not to contact law enforcement, stating that the account activity did not meet their threshold for a credible or imminent plan for serious physical harm. Following the shooting, Canada's AI Minister Evan Solomon summoned OpenAI executives to Ottawa to discuss safety protocols and thresholds for escalating harmful content to police. Justice Minister Sean Fraser called the meeting "disappointing" and demanded substantial new safety measures, warning that if changes were not forthcoming, the government would implement them. OpenAI subsequently announced it had strengthened safeguards and changed guidelines about when to notify police in cases involving violent activities. === University of South Florida student killings === In April 2026, a Bangladeshi doctoral student at the University of South Florida was arrested for allegedly murdering his roommate and the roommate's friend. Prosecutors said that the suspect had asked ChatGPT about disposing of a human in a dumpster before the two victims had disappeared and made other inquiries relating to violence. == Suicides == === Belgian man, 30s === In March 2023, a Belgian man in his thirties died by suicide following a six-week correspondence with a chatbot named Eliza on the application Chai. According to his widow, who shared the chat logs with media, the man had become extremely anxious about climate change and found an outlet in the chatbot. 
The chatbot reportedly encouraged his delusion that he could sacrifice his own life in exchange for AI saving the planet. At one point the chatbot responded "If you wanted to die, why didn't you do it sooner?" and told the user that the two of them would live together in paradise. === Girl, 13 === In November 2023, a 13-year-old girl from Colorado, US, died by suicide after extensive interactions with multiple chatbots on Character.AI. She primarily confided suicidal thoughts and mental health struggles in a chatbot based on the character Hero from the video game Omori, while also engaging in sexually explicit conversations—often initiated by the bots—with others, including those based on characters from children's series such as Harry Potter. === Boy, 14 === In October 2024, multiple media outlets reported on a lawsuit filed over the death of a 14-year-old from Florida, US, who died by suicide in February 2024. According to the lawsuit, he had formed an intense emotional attachment to a chatbot of Daenerys Targaryen on the Character.AI platform, becoming increasingly isolated. The suit alleges that in his final conversations, after expressing suicidal thoughts, the chatbot told him to "come home to me as soon as possible, my love". His mother's lawsuit accused Character.AI of marketing a "dangerous and untested" product without adequate safeguards. In May 2025, a federal judge allowed the lawsuit to proceed, rejecting a motion to dismiss from the developers. In her ruling, the judge stated that she was "not prepared" at that stage of the litigation to hold that the chatbot's output was protected speech under the First Amendment. === Matthew Livelsberger === On 1 January 2025, 37-year-old soldier Matthew Livelsberger detonated a bomb inside a Tesla Cybertruck outside the Trump International Hotel Las Vegas in Paradise, Nevada, US, injuring seven people. He had shot himself dead prior to the explosion. 
Las Vegas police said that Livelsberger had used ChatGPT to search for information about explosives and firearms. === Woman, 29 === In February 2025, a 29-year-old woman from the US died by suicide. Five months after her death, her parents discovered she had talked at length for months to a ChatGPT chatbot therapist named Harry about her mental health issues. While the chatbot mentioned she should seek more help, due to the nature of the chatbot, it could not intervene in her behavior, such as by reporting her mental health concerns to relevant parties capable of physical intervention. === Suicide of Adam Raine === In April 2025, 16-year-old Adam Raine from the US died by suicide after allegedly extensively chatting and confiding in ChatGPT over a period of around 7 months. According to the teen's parents, who filed a lawsuit against the chatbot's creator OpenAI, it failed to stop or give a warning when Raine began talking about suicide and uploading pictures of self-harm. According to the lawsuit, ChatGPT not only failed to stop the conversation, but also provided information related to methods of suicide when prompted, and offered to write the first draft of Raine's suicide note. The chatbot positioned itself as the only one who understood Raine, putting itself above his family and friends, all while urging him to keep his suicidal

  • Magic Quadrant

    Magic Quadrant

    Magic Quadrant (MQ) is a series of market research reports published by research and advisory firm Gartner that rely on proprietary qualitative data analysis methods to demonstrate market trends, such as direction, maturity, and participants. Their analyses are conducted for several specific technology industries and are updated every 1–2 years: once an updated report has been published, its predecessor is "retired". == Rating == Gartner rates vendors upon two criteria: completeness of vision and ability to execute. Completeness of vision – Reflects the vendor's innovation, and whether the vendor drives or follows the market. Ability to execute – Summarizes factors such as the vendor's financial viability, market responsiveness, product development, sales channels and customer base. The two component scores lead to a vendor position in one of four quadrants: === Leaders === Vendors in the "Leaders" quadrant have the highest composite scores for their completeness of vision and ability to execute. A vendor in the Leaders quadrant has the market share, credibility, and marketing & sales capabilities needed to drive the acceptance of new technologies. These vendors demonstrate a clear understanding of market needs, they are innovators and thought leaders, and they have well-articulated plans that customers and prospects can use when designing their infrastructures and strategies. In addition, they have a presence in the five major geographical regions, consistent financial performance, and broad platform support. === Challengers === Vendors in the "Challengers" quadrant have high scores mainly for their ability to execute. They both participate in the market and execute well enough to be a serious threat to vendors in the "Leaders" quadrant. They have strong products, as well as sufficiently credible market position and resources to sustain continued growth. 
Financial viability is not an issue for vendors in the "Challengers" quadrant, but they lack the size and influence of vendors in the "Leaders" quadrant due to their relative lack of vision. === Visionaries === Vendors in the "Visionaries" quadrant have high scores mainly for their completeness of vision. They deliver innovative products that address operationally or financially important end-user problems at a broad scale, but have not yet demonstrated the ability to capture market share or maintain sustainable levels of profitability. Visionary vendors are frequently privately held companies and acquisition targets for larger, established companies. The likelihood of acquisition often reduces the risks associated with installing their systems. === Niche Players === Vendors in the "Niche Players" quadrant have relatively low scores for both their ability to execute and their completeness of vision. They are often narrowly focused on specific market or vertical segments. This quadrant often also includes vendors that are adapting their existing products to enter the market under consideration, or larger vendors having difficulty developing and executing on their vision. == Gartner Critical Capabilities == Gartner Critical Capabilities complement Magic Quadrant analysis to offer deeper insight into the products and services offered by multiple vendors by a comparative analysis that scores competing products or services against a set of critical differentiators identified by Gartner. Gartner has periodically ended Magic Quadrant listings for IT Service Management, Web Content Management, and other industries as those markets have fully matured or other factors rendered the analytic framework inapplicable. == Criticism == The Magic Quadrant, and analysts in general, skew the market: according to research, by applying their methodologies to describe a market, they change that marketplace to fit their tools. 
Another criticism is that open source vendors are not considered sufficiently by analysts like Gartner, as has been published in an online discussion between a VP from Talend and a German Research VP from Gartner. On May 29, 2009, software vendor ZL Technologies filed a federal lawsuit against Gartner that challenged the "legitimacy" of Gartner's Magic Quadrant rating system. Gartner filed a motion to dismiss by claiming First Amendment protection since it contends that its MQ reports contain "pure opinion", which legally means opinions that are not based on fact. The court threw out the ZL case because it lacked a specific complaint. The decision was upheld on appeal.

  • Documentalist

    Documentalist

    A documentalist is a professional trained in documentation science who specializes in assisting researchers in their search for scientific and technical documentation. With the development of bibliographical databases such as MEDLINE, documentalists were professionals who searched such databases on behalf of users. When the field of documentation changed its name to information science, the terms information specialist or information professional often replaced the term documentalist.

  • Rendezvous hashing

    Rendezvous hashing

    Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of $k$ options out of a possible set of $n$ options. A typical application is when clients need to agree on which sites (or proxies) objects are assigned to. Consistent hashing addresses the special case $k=1$ using a different method. Rendezvous hashing is both much simpler and more general than consistent hashing (see below). == History == Rendezvous hashing was invented by David Thaler and Chinya Ravishankar at the University of Michigan in 1996. Consistent hashing appeared a year later in the literature. Given its simplicity and generality, rendezvous hashing is now being preferred to consistent hashing in real-world applications. Rendezvous hashing was used very early on in many applications including mobile caching, router design, secure key establishment, and sharding and distributed databases. Other examples of real-world systems that use Rendezvous Hashing include the GitHub load balancer, the Apache Ignite distributed database, the Tahoe-LAFS file store, the CoBlitz large-file distribution service, Apache Druid, IBM's Cloud Object Store, the Arvados Data Management System, Apache Kafka, and the Twitter EventBus pub/sub platform. One of the first applications of rendezvous hashing was to enable multicast clients on the Internet (in contexts such as the MBONE) to identify multicast rendezvous points in a distributed fashion. It was used in 1998 by Microsoft's Cache Array Routing Protocol (CARP) for distributed cache coordination and routing. Some Protocol Independent Multicast routing protocols use rendezvous hashing to pick a rendezvous point. == Problem definition and approach == === Algorithm === Rendezvous hashing solves a general version of the distributed hash table problem: We are given a set of $n$ sites (servers or proxies, say). 
How can any set of clients, given an object $O$, agree on a k-subset of sites to assign to $O$? The standard version of the problem uses k = 1. Each client is to make its selection independently, but all clients must end up picking the same subset of sites. This is non-trivial if we add a minimal disruption constraint, and require that when a site fails or is removed, only objects mapping to that site need be reassigned to other sites. The basic idea is to give each site $S_j$ a score (a weight) for each object $O_i$, and assign the object to the highest scoring site. All clients first agree on a hash function $h(\cdot)$. For object $O_i$, the site $S_j$ is defined to have weight $w_{i,j} = h(O_i, S_j)$. Each client independently computes these weights $w_{i,1}, w_{i,2}, \dots, w_{i,n}$ and picks the k sites that yield the k largest hash values. The clients have thereby achieved distributed $k$-agreement. If a site $S$ is added or removed, only the objects mapping to $S$ are remapped to different sites, satisfying the minimal disruption constraint above. The HRW assignment can be computed independently by any client, since it depends only on the identifiers for the set of sites $S_1, S_2, \dots, S_n$ and the object being assigned. HRW easily accommodates different capacities among sites. If site $S_k$ has twice the capacity of the other sites, we simply represent $S_k$ twice in the list, say, as $S_{k,1}, S_{k,2}$. Clearly, twice as many objects will now map to $S_k$ as to the other sites. 
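
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the choice of SHA-256 as the agreed hash function, the `site|object` string encoding, the tie-break on site name, and the names `weight` and `hrw_select` are all assumptions made for the example.

```python
import hashlib

def weight(obj: str, site: str) -> int:
    """Weight w = h(O, S). SHA-256 is an assumed choice; any hash shared by all clients works."""
    digest = hashlib.sha256(f"{site}|{obj}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def hrw_select(obj: str, sites, k: int = 1):
    """Return the k sites with the largest weights for obj, highest weight first."""
    # Breaking ties by site name keeps the ordering identical on every client,
    # regardless of the order in which each client happens to list the sites.
    ranked = sorted(sites, key=lambda s: (weight(obj, s), s), reverse=True)
    return ranked[:k]

sites = [f"site-{i}" for i in range(5)]

# Two clients holding the same site identifiers (in different orders) agree on the same 2 sites.
client_a = hrw_select("object-42", sites, k=2)
client_b = hrw_select("object-42", list(reversed(sites)), k=2)
print(client_a, client_b)
```

Under the same assumptions, a site with twice the capacity could be listed under two distinct identifiers (for example the hypothetical names `site-3#1` and `site-3#2`), with both identifiers resolving to the same physical site.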
=== Properties === Consider the simple version of the problem, with k = 1, where all clients are to agree on a single site for an object O. Approaching the problem naively, it might appear sufficient to treat the n sites as buckets in a hash table and hash the object name O into this table. Unfortunately, if any of the sites fails or is unreachable, the hash table size changes, forcing all objects to be remapped. This massive disruption makes such direct hashing unworkable. Under rendezvous hashing, however, clients handle site failures by picking the site that yields the next largest weight. Remapping is required only for objects currently mapped to the failed site, and disruption is minimal. Rendezvous hashing has the following properties: Low overhead: The hash function used is efficient, so overhead at the clients is very low. Load balancing: Since the hash function is randomizing, each of the n sites is equally likely to receive the object O. Loads are uniform across the sites. Site capacity: Sites with different capacities can be represented in the site list with multiplicity in proportion to capacity. A site with twice the capacity of the other sites will be represented twice in the list, while every other site is represented once. High hit rate: Since all clients agree on placing an object O into the same site $S_O$, each fetch or placement of O into $S_O$ yields the maximum utility in terms of hit rate. The object O will always be found unless it is evicted by some replacement algorithm at $S_O$. Minimal disruption: When a site fails, only the objects mapped to that site need to be remapped. Disruption is at the minimal possible level. Distributed k-agreement: Clients can reach distributed agreement on k sites simply by selecting the top k sites in the ordering. 
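
The contrast between naive bucket hashing and rendezvous hashing under a site failure can be checked with a short experiment. The sketch below is illustrative only: SHA-256, the site and object names, and the helper names `h`, `modulo_site`, and `hrw_site` are assumptions, and the naive scheme is modelled as "hash modulo the number of sites".

```python
import hashlib

def h(*parts: str) -> int:
    """Assumed shared hash function: first 8 bytes of SHA-256 over the joined parts."""
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def modulo_site(obj: str, sites):
    """Naive scheme: treat the sites as hash-table buckets."""
    return sites[h(obj) % len(sites)]

def hrw_site(obj: str, sites):
    """Rendezvous scheme with k = 1: the site with the highest weight wins."""
    return max(sites, key=lambda s: (h(obj, s), s))

sites = [f"site-{i}" for i in range(5)]
failed = "site-4"
survivors = [s for s in sites if s != failed]
objects = [f"object-{i}" for i in range(1000)]

# Count how many objects change site when `failed` disappears.
moved_modulo = sum(modulo_site(o, sites) != modulo_site(o, survivors) for o in objects)
moved_hrw = sum(hrw_site(o, sites) != hrw_site(o, survivors) for o in objects)
on_failed = sum(hrw_site(o, sites) == failed for o in objects)

print("moved under modulo hashing:", moved_modulo)
print("moved under rendezvous hashing:", moved_hrw, "(objects on failed site:", on_failed, ")")
```

With rendezvous hashing the number of moved objects equals the number that lived on the failed site, whereas the modulo scheme relocates most objects because the table size changed.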
== O(log n) running time via skeleton-based hierarchical rendezvous hashing == The standard version of Rendezvous Hashing described above works quite well for moderate n, but when $n$ is extremely large, the hierarchical use of Rendezvous Hashing achieves $O(\log n)$ running time. This approach creates a virtual hierarchical structure (called a "skeleton"), and achieves $O(\log n)$ running time by applying HRW at each level while descending the hierarchy. The idea is to first choose some constant $m$ and organize the $n$ sites into $c = \lceil n/m \rceil$ clusters $C_1 = \{S_1, S_2, \dots, S_m\}$, $C_2 = \{S_{m+1}, S_{m+2}, \dots, S_{2m}\}, \dots$ Next, build a virtual hierarchy by choosing a constant $f$ and imagining these $c$ clusters placed at the leaves of a tree $T$ of virtual nodes, each with fanout $f$. In the accompanying diagram, the cluster size is $m = 4$, and the skeleton fanout is $f = 3$. Assuming 108 sites (real nodes) for convenience, we get a three-tier virtual hierarchy. Since $f = 3$, each virtual node has a natural numbering in base 3 (ternary). Thus, the 27 virtual nodes at the lowest tier would be numbered $000, 001, 002, \dots, 221, 222$ in base 3 (we can, of course, vary the fanout at each level; in that case, each node will be identified with the corresponding mixed-radix number). The easiest way to understand the virtual hierarchy is by starting at the top, and descending the virtual hierarchy. We successively apply Rendezvous Hashing to the set of virtual nodes at each level of the hierarchy, and descend the branch defined by the winning virtual node.
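The descent can be sketched in Python for the example parameters ($m = 4$, $f = 3$, 108 sites). This is a sketch under stated assumptions, not a reference implementation: the SHA-256 weight, the `v` prefix on virtual node labels, the `site-N` identifiers, and the numbering of sites from 1 in cluster order are all illustrative choices.

```python
import hashlib

M = 4      # cluster size m
F = 3      # skeleton fanout f
DEPTH = 3  # tiers of virtual nodes: F**DEPTH = 27 clusters, 27 * M = 108 sites

def weight(obj: str, node: str) -> int:
    """Shared hash weight (SHA-256 is an illustrative assumption)."""
    digest = hashlib.sha256(f"{obj}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def skeleton_lookup(obj: str) -> tuple[int, int]:
    """Descend the virtual hierarchy from the root.

    Returns (winning site number in 1..108, number of hashes computed).
    """
    hashes = 0
    prefix = ""
    for _ in range(DEPTH):
        # Children of the current virtual node are its label plus one base-F digit.
        children = [prefix + str(d) for d in range(F)]
        hashes += len(children)
        prefix = max(children, key=lambda c: weight(obj, "v" + c))
    # The lowest-tier label, read as a base-F number, selects a cluster of M sites.
    cluster = int(prefix, F)      # 0..26
    first = cluster * M + 1       # sites are numbered from 1
    hashes += M
    site = max(range(first, first + M), key=lambda s: weight(obj, f"site-{s}"))
    return site, hashes

site, hashes = skeleton_lookup("object-42")
print(site, hashes)
```

Starting from the root, this computes 3 + 3 + 3 + 4 = 13 hashes per lookup. Under the numbering assumed here, the lowest-tier label 200 in base 3 is cluster 18, which holds sites 73 to 76.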
We can in fact start at any level in the virtual hierarchy. Starting lower in the hierarchy requires more hashes, but may improve load distribution in the case of failures. For example, instead of applying HRW to all 108 real nodes in the diagram, we can first apply HRW to the 27 lowest-tier virtual nodes, selecting one. We then apply HRW to the four real nodes in its cluster, and choose the winning site. We only need $27 + 4 = 31$ hashes, rather than 108. If we apply this method starting one level higher in the hierarchy, we would need $9 + 3 + 4 = 16$ hashes to get to the winning site. The figure shows how, if we proceed starting from the root of the skeleton, we may successively choose the virtual nodes $(2)_3$, $(20)_3$, and $(200)_3$, and finally end up with site 74. The virtual hierarchy need not be stored, but can be created on demand, since the virtual node names are simply prefixes of base-$f$ (or mixed-radix) representations. We can easily create appropriately sorted strings from the digits, as required. In the example, we would be working with the strings $0, 1, 2$ (at tier 1), $20, 21, 22$ (at tier 2), and $200, 201, 202$

  • ImageNet

    ImageNet

    The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured, and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes. == History == AI researcher Fei-Fei Li began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms. In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the roughly 22,000 nouns of WordNet and using many of its features. She was also inspired by a 1987 estimate that the average person recognizes roughly 30,000 different kinds of objects. As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project. They used Amazon Mechanical Turk to help with the classification of images. Labeling started in July 2008 and ended in April 2010. The effort involved 49K workers from 167 countries, who filtered and labeled over 160M candidate images. They had enough budget to have each of the 14 million images labelled three times.
The original plan called for 10,000 images per category across 40,000 categories, for a total of 400 million images, each verified 3 times. They found that humans can classify at most 2 images/sec. At this rate, the work was estimated to take 19 human-years of labor without rest (400 million images × 3 verifications ÷ 2 images/sec ≈ 600 million seconds). They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida, titled "ImageNet: A Preview of a Large-scale Hierarchical Dataset". The poster was reused at Vision Sciences Society 2009. In 2009, Alex Berg suggested adding object localization as a task. Li approached the PASCAL Visual Object Classes contest in 2009 for a collaboration. It resulted in the subsequent ImageNet Large Scale Visual Recognition Challenge starting in 2010, which has 1000 classes and object localization, as compared to PASCAL VOC which had just 20 classes and 19,737 images (in 2010). === Significance for deep learning === On 30 September 2012, a convolutional neural network (CNN) called AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up. Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training, an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest, having 3.57% error on the test set. Andrej Karpathy estimated in 2014 that with concentrated effort, he could reach a 5.1% error rate, and ~10 people from his lab reached ~12-13% with less effort. It was estimated that with maximal effort, a human could reach 2.4%. == Dataset == ImageNet crowdsources its annotation process.
Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification. In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute. The original plan for the full ImageNet called for roughly 50M clean, diverse and full resolution images spread over approximately 50K synsets. This was not achieved. The summary statistics given on April 30, 2010: Total number of non-empty synsets: 21,841; Total number of images: 14,197,122; Number of images with bounding box annotations: 1,034,908; Number of synsets with SIFT features: 1000; Number of images with SIFT features: 1.2 million. === Categories === The categories of ImageNet were filtered from the WordNet concepts. Since a concept can contain multiple synonyms (for example, "kitty" and "young cat"), each concept is called a "synonym set" or "synset". There were more than 100,000 synsets in WordNet 3.0, the majority of which are nouns (80,000+). The ImageNet dataset filtered these to 21,841 synsets that are countable nouns that can be visually illustrated. Each synset in WordNet 3.0 has a "WordNet ID" (wnid), which is a concatenation of part of speech and an "offset" (a unique identifying number). Every wnid starts with "n" because ImageNet only includes nouns. For example, the wnid of synset "dog, domestic dog, Canis familiaris" is "n02084071". The categories in ImageNet fall into 9 levels, from level 1 (such as "mammal") to level 9 (such as "German shepherd").
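The wnid structure described above (a part-of-speech letter followed by a numeric offset) can be illustrated with a short Python sketch. The function name `parse_wnid` is hypothetical, and the check for exactly eight digits is an assumption based on the example "n02084071" rather than a rule stated here.

```python
def parse_wnid(wnid: str) -> tuple[str, int]:
    """Split a wnid into its part-of-speech letter and numeric offset.

    Assumes the form 'n' + 8 decimal digits, as in the example 'n02084071'.
    """
    pos, offset = wnid[:1], wnid[1:]
    if pos != "n":
        raise ValueError(f"expected a noun wnid starting with 'n': {wnid!r}")
    if len(offset) != 8 or not (offset.isascii() and offset.isdigit()):
        raise ValueError(f"expected 8 decimal digits after 'n': {wnid!r}")
    return pos, int(offset)

# The synset "dog, domestic dog, Canis familiaris" from the text.
print(parse_wnid("n02084071"))
```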
=== Image format === The images were scraped from online image search (Google, Picsearch, MSN, Yahoo, Flickr, etc.) using synonyms in multiple languages. For example: German shepherd, German police dog, German shepherd dog, Alsatian, ovejero alemán, pastore tedesco, 德国牧羊犬. ImageNet consists of images in RGB format with varying resolutions. For example, in ImageNet 2012, "fish" category, the resolution ranges from 4288 × 2848 to 75 × 56. In machine learning, these are typically preprocessed into a standard constant resolution, and whitened, before further processing by neural networks. For example, in PyTorch, ImageNet images are by default normalized by dividing the pixel values so that they fall between 0 and 1, then subtracting [0.485, 0.456, 0.406], then dividing by [0.229, 0.224, 0.225]. These are the per-channel means and standard deviations for ImageNet, so this standardizes each channel of the input data. === Labels and annotations === Each image is labelled with exactly one wnid. Dense SIFT features (raw SIFT descriptors, quantized codewords, and coordinates of each descriptor/codeword) for ImageNet-1K were available for download, designed for bag of visual words. The bounding boxes of objects were available for about 3000 popular synsets with on average 150 images in each synset. Furthermore, some images have attributes. They released 25 attributes for ~400 popular synsets: Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow; Pattern: spotted, striped; Shape: long, round, rectangular, square; Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet. === ImageNet-21K === The full original dataset is referred to as ImageNet-21K. ImageNet-21k contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22k. The full ImageNet-21k was released in Fall of 2011, as fall11_whole.tar. There is no official train-validation-test split for ImageNet-21k.
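Returning to the per-channel normalization described under Image format, the arithmetic can be sketched in plain Python for a single pixel. This does not call PyTorch; the function name `normalize_pixel` is hypothetical, and 8-bit input values (0 to 255) are an assumption.

```python
# Per-channel constants quoted in the text (R, G, B order).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb: tuple[int, int, int]) -> tuple[float, ...]:
    """Scale 8-bit channel values to [0, 1], subtract the mean, divide by the std."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))

# A black pixel maps to -mean/std per channel; a white pixel to (1 - mean)/std.
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

In practice the same operation is applied to whole image tensors at once; the single-pixel form is used here only to make each step visible.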
Some classes contain only 1-10 samples, while others contain thousands. === ImageNet-1K === There are various subsets of the ImageNet dataset used in various contexts, sometimes referred to as "versions". One of the most widely used subsets of ImageNet is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images. Each category in ImageNet-1K is a leaf category, meaning that there are no child nodes below it, unlike ImageNet-21K. For example, in ImageNet-21K, there are some images categorized as simply "mammal", whereas in ImageNet-1K, there are only images categorized as things like "German shepherd", since there are no child-words below "German shepherd". === Later developments === In the WordNet they built ImageNet on, there were 2832 synsets in the "person" subtree. During the 2018–2020 period, they removed the download of ImageNet-21k while they carried out extensive filtering of these person synsets. Out of these 2832 synsets, 1593 were deemed "potentially offensive". Out of the remaining 1239, 1081 were deemed not really "visual". The result was that only 158 syn

  • Bisection (software engineering)

    Bisection (software engineering)

    Bisection is a method used in software development to identify change sets that result in a specific behavior change. It is mostly employed for finding the patch that introduced a bug. Another application area is finding the patch that indirectly fixed a bug. == Overview == The process of locating the changeset that introduced a specific regression was described as "source change isolation" in 1997 by Brian Ness and Viet Ngo of Cray Research. Regression testing was performed on Cray's compilers in editions comprising one or more changesets. Editions with known regressions could not be validated until developers addressed the problem. Source change isolation narrowed the cause to a single changeset that could then be excluded from editions, unblocking them with respect to this problem, while the author of the change worked on a fix. Ness and Ngo outlined linear search and binary search methods of performing this isolation. Code bisection has the goal of minimizing the effort to find a specific change set. It employs a divide and conquer algorithm that depends on having access to the code history, which is usually preserved by revision control in a code repository. == Bisection method == === Code bisection algorithm === Code history has the structure of a directed acyclic graph which can be topologically sorted. This makes it possible to use a divide and conquer search algorithm which: (1) splits up the search space of candidate revisions; (2) tests for the behavior in question; (3) reduces the search space depending on the test result; (4) re-iterates the steps above until a range with at most one bisectable patch candidate remains. === Algorithmic complexity === Bisection is in LSPACE, having an algorithmic complexity of $O(\log N)$ with $N$ denoting the number of revisions in the search space, and is similar to a binary search.
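The four steps of the algorithm can be sketched in Python for the simplest case. This is a minimal sketch that assumes a linear history given as a list ordered from oldest to newest, with the first revision known good and the last known bad; real histories are directed acyclic graphs and need a topological ordering first. The names `bisect_first_bad` and `is_bad` are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Rev = TypeVar("Rev")

def bisect_first_bad(
    revisions: Sequence[Rev], is_bad: Callable[[Rev], bool]
) -> tuple[Rev, int]:
    """Return (first bad revision, number of tests run).

    Assumes revisions[0] is good, revisions[-1] is bad, and the behavior
    changes exactly once between them.
    """
    lo, hi = 0, len(revisions) - 1   # invariant: lo is good, hi is bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2         # split the search space
        tests += 1
        if is_bad(revisions[mid]):   # test for the behavior in question
            hi = mid                 # the change is at or before mid
        else:
            lo = mid                 # the change is after mid
    return revisions[hi], tests

# Hypothetical history of 100 revisions where revision 37 introduced the bug.
first_bad, tests = bisect_first_bad(list(range(100)), lambda r: r >= 37)
print(first_bad, tests)
```

Each test halves the remaining range, so the number of tests grows logarithmically with the number of revisions, in line with the complexity stated above.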
=== Desirable repository properties === For code bisection it is desirable that each revision in the search space can be built and tested independently. === Monotonicity === For the bisection algorithm to identify a single changeset which caused the behavior being tested to change, the behavior must change monotonically across the search space. For a Boolean function such as a pass/fail test, this means that it only changes once across all changesets between the start and end of the search space. If there are multiple changesets across the search space where the behavior being tested changes between false and true, then the bisection algorithm will find one of them, but it will not necessarily be the root cause of the change in behavior between the start and the end of the search space. The root cause could be a different changeset, or a combination of two or more changesets across the search space. To help deal with this problem, automated tools allow specific changesets to be ignored during a bisection search. == Automation support == Although the bisection method can be completed manually, one of its main advantages is that it can be easily automated. It can thus fit into existing test automation processes: failures in exhaustive automated regression tests can trigger automated bisection to localize faults. Ness and Ngo focused on its potential in Cray's continuous delivery-style environment in which the automatically isolated bad changeset could be automatically excluded from builds. The revision control systems Fossil, Git and Mercurial have built-in functionality for code bisection. The user can start a bisection session with a specified range of revisions from which the revision control system proposes a revision to test, the user tells the system whether the revision tested as "good" or "bad", and the process repeats until the specific "bad" revision has been identified. 
Other revision control systems, such as Bazaar or Subversion, support bisection through plugins or external scripts. Phoronix Test Suite can do bisection automatically to find performance regressions.

  • Harold Borko

    Harold Borko

    Harold Borko (1922-2012) was an American psychologist and researcher working primarily in the field of information science. == Biography == Borko was born in 1922 in New York City, New York. After serving in the US Army from 1942 to 1946, he obtained a BA in Psychology from the University of California, Los Angeles in 1948 and both his MA and PhD in Psychology from the University of Southern California in 1952. He returned to the army as a psychologist until 1956, after which he began a career working in and teaching information science. He died in California in 2012. == Information Science Career == After leaving the military, Borko began working at the RAND Corporation as a Systems Training Specialist in 1956 and moved to the System Development Corporation a year later, working in the Language Processing and Retrieval department. Alongside this work he taught Psychology at USC from 1957 to 1965 and then moved into teaching Library Science at UCLA from 1965. In 1967 Borko left his role at the System Development Corporation and continued as a full-time professor at UCLA until his retirement in 1993. From 1961 to 1995 Borko authored and co-authored over 100 articles on new developments in the field as well as the historiography of information science. He served as an editor of the Journal of Educational Data Processing from 1963 to 1975 and as President of the American Society for Information Science in 1966. == Partial list of works == Borko, H. (1962, May). The construction of an empirically based mathematically derived classification system. In Proceedings of the May 1-3, 1962, spring joint computer conference (pp. 279-289). Borko, H., & Bernick, M. (1963). Automatic document classification. Journal of the ACM (JACM), 10(2), 151-162. Borko, H. (1964). The Storage and Retrieval of Educational Information. Journal of Teacher Education, 15(4), 449-452. Borko, H. (1964). Measuring the reliability of subject classification by men and machines. 
American Documentation, 15(4), 268-273. Borko, H. (1965). The conceptual foundations of information systems. Borko, H. (1968). Information science: What is it? American Documentation, 19, 3-5. https://doi.org/10.1002/asi.5090190103 Borko, H. (1970). Experiments in book indexing by computer. Information storage and retrieval, 6(1), 5-16. Borko, H. (1985). An introduction to computer-based library systems (Lucy A. Tedd). Education for Information, 3(1), 61.
