AI Headshot Examples

AI Headshot Examples — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Human Race Machine

    The Human Race Machine (HRM) is a computerized console composed of four different programs. The Human Race Machine program allows participants to see themselves with the facial characteristics of six different races: Asian, White, African, Middle Eastern, and Indian, mapped onto their own face. The Age Machine allows viewers to see an aged version of their own face. A version of this methodology has been used for over twenty years by the FBI and the National Center for Missing and Exploited Children to help locate kidnap victims and missing children. The Couples Machine combines photographs of two people in different percentages to show the appearance of their child. The Anomaly Machine lets viewers see themselves with facial anomalies. The HRM was created by artist Nancy Burson and David Kramlich, and it uses morphing technology. It was shown on Oprah on 2006-02-16.

  • List of information schools

    This list of information schools, sometimes abbreviated to iSchools, includes members of the iSchools organization. The iSchools organization is a consortium of over 130 information schools across the globe.

    == History ==

    The first iSchools Caucus was formed in 1988 by Syracuse, Pittsburgh, and Drexel and was called the Gang of Three (sometimes the Gang of Four, with Rutgers). Syracuse had renamed its School of Library Science as the School of Information Studies in 1974, and is considered the first "iSchool" in history. The group was formally named "the iSchools Caucus" or, more casually, the iCaucus. By 2003, the group had expanded to include the Universities of Michigan, Washington, Illinois, UNC, Florida State, Indiana, and Texas, and was called the Gang of Ten. The current iSchools Caucus organization was formalized by 2005, with the additions of UC Berkeley, UC Irvine, UCLA, Penn State, Georgia Tech, Maryland, Toronto, Carnegie Mellon and Singapore Management University.

    == iSchools organization ==

    The iSchools promote an interdisciplinary approach to understanding the opportunities and challenges of information management, with a core commitment to concepts like universal access and user-centered organization of information. The field is concerned broadly with questions of design and preservation across information spaces, from digital and virtual spaces such as online communities, social networking, the World Wide Web, and databases to physical spaces such as libraries, museums, collections, and other repositories. Participating organizations are often named "School of Information", "Department of Information Studies", or "Information Department".

    Degree programs at iSchools include course offerings in areas such as information architecture, design, policy, and economics; knowledge management, user experience design, and usability; preservation and conservation; librarianship and library administration; the sociology of information; and human-computer interaction and computer science.

    === Leadership ===

    The executive committee of the iSchools is made up of the current chair (Ina Fourie, University of Pretoria, South Africa), the past chair (Gillian Oliver, Monash University, Australia) and the chair elect (Javed Mostafa, University of Toronto, Canada), plus representatives from the three regions (North America, Europe, and Asia-Pacific). The current executive director is Slava Sterzer.

    == Member institutions ==

    Between 2010 and 2026, the organization expanded globally beyond North America, growing to 133 member schools as of March 2026. For an updated and complete list of member schools, see the member database of the iSchools.

    == iConferences ==

    Members of the iSchools organize a regular academic conference, known as the iConference, hosted by a different member institution each year.

      • September 2005: Pennsylvania State University
      • October 2006: University of Michigan
      • February 2008: University of California, Los Angeles
      • February 2009: University of North Carolina
      • February 2010: University of Illinois at Urbana-Champaign
      • February 2011: University of Washington, Seattle
      • February 2012: University of Toronto
      • February 2013: University of North Texas
      • March 2014: Humboldt-Universität zu Berlin
      • March 2015: University of California, Irvine
      • March 2016: Drexel University
      • March 2017: Wuhan University
      • March 2018: University of Sheffield and Northumbria University
      • March 2019: University of Maryland
      • March 2020: University of Borås (virtual only)
      • March 2021: Renmin University of China (virtual only)
      • February/March 2022: University of Texas at Austin, University College Dublin & Kyushu University (virtual only)
      • March 2023: Universitat Oberta de Catalunya
      • March 2024: Jilin University
      • March 2025: Indiana University
      • March/April 2026: Edinburgh Napier University
      • 2027: Victoria University of Wellington

    == Other schools of information ==

    Other information schools and programs include:

      • Documentation Research and Training Centre, Indian Statistical Institute, Bangalore
      • San Jose State University, School of Information
      • University of Southern California Library Science Degree
      • Ankara University, Department of Information and Records Management, Ankara/Turkey
      • Marmara University, Department of Information and Records Management, Istanbul/Turkey
      • University of Kelaniya, Department of Library and Information Science, Kelaniya/Sri Lanka
      • University of Colombo, National Institute of Library and Information Science (NILIS), Colombo/Sri Lanka
      • Chicago State University, Department of Information Studies

  • DPVweb

    DPVweb is a database for virologists working on plant viruses, combining taxonomic, bioinformatic and symptom data.

    == Description ==

    DPVweb is a central web-based source of information about viruses, viroids and satellites of plants, fungi and protozoa. It provides comprehensive taxonomic information, including brief descriptions of each family and genus, and classified lists of virus sequences. It makes use of a large database that also holds detailed, curated information for all sequences of viruses, viroids and satellites of plants, fungi and protozoa that are complete or that contain at least one complete gene. There are currently about 10,000 such sequences. For comparative purposes, DPVweb also contains a representative sequence of all other fully sequenced virus species with an RNA or single-stranded DNA genome. For each curated sequence the database contains the start and end positions of each feature (gene, non-translated region, etc.), and these have been checked for accuracy. As far as possible, the nomenclature for genes and proteins is standardized within genera and families. Sequences of features (either as DNA or amino acid sequences) can be downloaded directly from the website in FASTA format. The sequence information can also be accessed via client software for personal computers.

    == History ==

    The Descriptions of Plant Viruses (DPVs) were first published by the Association of Applied Biologists in 1970 as a series of leaflets, each one written by an expert describing a particular plant virus. In 1998 all of the 354 DPVs published on paper were scanned, converted into an electronic format in a database, and distributed on CD-ROM. In 2001 the descriptions were made available on the new DPVweb site, providing open access to the now 400+ DPVs (currently 415) as well as taxonomic and sequence data on all plant viruses.

    == Uses ==

    DPVweb is an aid to researchers in the field of plant virology as well as an educational resource for students of virology and molecular biology. The site provides a single point of access for all known plant virus genome sequences, making it easy to collect these sequences together for further analysis and comparison. Sequence data from the DPVweb database have proved valuable for a number of projects:

      • a survey of codon usage bias amongst all plant viruses
      • two-way comparisons between comprehensive sets of sequences from the families Flexiviridae and Potyviridae, which have helped inform taxonomy and clarify genus and species discrimination criteria
      • a survey and verification of the polyprotein cleavage sites within the family Potyviridae
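
To illustrate how FASTA-format sequences of the kind the site offers for download might feed a codon-usage survey, here is a minimal Python sketch. The two records and their names are invented for the example and are not DPVweb data. The parser assumes plain FASTA (a ">" header line followed by one or more sequence lines), and the reading frame is assumed to start at the first base.

```python
from collections import Counter

def parse_fasta(text):
    """Return {header: sequence} for plain FASTA text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}

def codon_counts(sequence):
    """Count non-overlapping codons from position 0; 1-2 trailing bases are dropped."""
    usable = len(sequence) - len(sequence) % 3
    return Counter(sequence[i:i + 3] for i in range(0, usable, 3))

# Hypothetical records, for illustration only (not taken from DPVweb).
EXAMPLE = """>example_gene_1
ATGGCTGCT
GCTTAA
>example_gene_2
ATGAAAAAATGA
"""

records = parse_fasta(EXAMPLE)
usage = {name: codon_counts(seq) for name, seq in records.items()}
```

A real survey of codon usage bias would normalize these counts per amino acid and restrict them to annotated coding regions, which is where the curated start and end positions of each feature would come in.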

  • Timeline of algorithms

    The following timeline of algorithms outlines the development of algorithms (mainly "mathematical recipes") since their inception.

    == Antiquity ==

      • Before – writing about "recipes" (on cooking, rituals, agriculture and other themes)
      • c. 1700–2000 BC – Egyptians develop earliest known algorithms for multiplying two numbers
      • c. 1600 BC – Babylonians develop earliest known algorithms for factorization and finding square roots
      • c. 300 BC – Euclid's algorithm
      • c. 200 BC – the Sieve of Eratosthenes
      • 263 AD – Gaussian elimination described by Liu Hui

    == Medieval Period ==

      • 628 – Chakravala method described by Brahmagupta
      • c. 820 – Al-Khawarizmi described algorithms for solving linear equations and quadratic equations in his Algebra; the word algorithm comes from his name
      • 825 – Al-Khawarizmi described the algorism, algorithms for using the Hindu–Arabic numeral system, in his treatise On the Calculation with Hindu Numerals, which was translated into Latin as Algoritmi de numero Indorum, where "Algoritmi", the translator's rendition of the author's name, gave rise to the word algorithm (Latin algorithmus) with the meaning "calculation method"
      • c. 850 – cryptanalysis and frequency analysis algorithms developed by Al-Kindi (Alkindus) in A Manuscript on Deciphering Cryptographic Messages, which contains algorithms on breaking encryptions and ciphers
      • c. 1025 – Ibn al-Haytham (Alhazen) was the first mathematician to derive the formula for the sum of the fourth powers, and in turn developed an algorithm for determining the general formula for the sum of any integral powers
      • c. 1400 – Ahmad al-Qalqashandi gives a list of ciphers in his Subh al-a'sha which include both substitution and transposition, and for the first time, a cipher with multiple substitutions for each plaintext letter; he also gives an exposition on and worked example of cryptanalysis, including the use of tables of letter frequencies and sets of letters which cannot occur together in one word

    == Before 1940 ==

      • 1540 – Lodovico Ferrari discovered a method to find the roots of a quartic polynomial
      • 1545 – Gerolamo Cardano published Cardano's method for finding the roots of a cubic polynomial
      • 1614 – John Napier develops method for performing calculations using logarithms
      • 1671 – Newton–Raphson method developed by Isaac Newton
      • 1690 – Newton–Raphson method independently developed by Joseph Raphson
      • 1706 – John Machin develops a quickly converging inverse-tangent series for π and computes π to 100 decimal places
      • 1768 – Leonhard Euler publishes his method for numerical integration of ordinary differential equations in problem 85 of Institutiones calculi integralis
      • 1789 – Jurij Vega improves Machin's formula and computes π to 140 decimal places
      • 1805 – FFT-like algorithm known by Carl Friedrich Gauss
      • 1842 – Ada Lovelace writes the first algorithm for a computing engine
      • 1903 – A fast Fourier transform algorithm presented by Carle David Tolmé Runge
      • 1918 – Soundex
      • 1926 – Borůvka's algorithm
      • 1926 – Primary decomposition algorithm presented by Grete Hermann
      • 1927 – Hartree–Fock method developed for simulating a quantum many-body system in a stationary state
      • 1934 – Delaunay triangulation developed by Boris Delaunay
      • 1936 – Turing machine, an abstract machine developed by Alan Turing, who with others developed the modern notion of an algorithm

    == 1940s ==

      • 1942 – A fast Fourier transform algorithm developed by G.C. Danielson and Cornelius Lanczos
      • 1945 – Merge sort developed by John von Neumann
      • 1947 – Simplex algorithm developed by George Dantzig

    == 1950s ==

      • 1950 – Hamming codes developed by Richard Hamming
      • 1952 – Huffman coding developed by David A. Huffman
      • 1953 – Simulated annealing introduced by Nicholas Metropolis
      • 1954 – Radix sort computer algorithm developed by Harold H. Seward
      • 1964 – Box–Muller transform for fast generation of normally distributed numbers published by George Edward Pelham Box and Mervin Edgar Muller. Independently pre-discovered by Raymond E. A. C. Paley and Norbert Wiener in 1934.
      • 1956 – Kruskal's algorithm developed by Joseph Kruskal
      • 1956 – Ford–Fulkerson algorithm developed and published by L. R. Ford Jr. and D. R. Fulkerson
      • 1957 – Prim's algorithm developed by Robert Prim
      • 1957 – Bellman–Ford algorithm developed by Richard E. Bellman and L. R. Ford Jr.
      • 1959 – Dijkstra's algorithm developed by Edsger Dijkstra
      • 1959 – Shell sort developed by Donald L. Shell
      • 1959 – De Casteljau's algorithm developed by Paul de Casteljau
      • 1959 – QR factorization algorithm developed independently by John G.F. Francis and Vera Kublanovskaya
      • 1959 – Rabin–Scott powerset construction for converting NFA into DFA published by Michael O. Rabin and Dana Scott

    == 1960s ==

      • 1960 – Karatsuba multiplication
      • 1961 – CRC (cyclic redundancy check) invented by W. Wesley Peterson
      • 1962 – AVL trees
      • 1962 – Quicksort developed by C. A. R. Hoare
      • 1962 – Bresenham's line algorithm developed by Jack E. Bresenham
      • 1962 – Gale–Shapley "stable-marriage" algorithm developed by David Gale and Lloyd Shapley
      • 1964 – Heapsort developed by J. W. J. Williams
      • 1964 – multigrid methods first proposed by R. P. Fedorenko
      • 1965 – Cooley–Tukey algorithm rediscovered by James Cooley and John Tukey
      • 1965 – Levenshtein distance developed by Vladimir Levenshtein
      • 1965 – Cocke–Younger–Kasami (CYK) algorithm independently developed by Tadao Kasami
      • 1965 – Buchberger's algorithm for computing Gröbner bases developed by Bruno Buchberger
      • 1965 – LR parsers invented by Donald Knuth
      • 1966 – Dantzig algorithm for shortest path in a graph with negative edges
      • 1967 – Viterbi algorithm proposed by Andrew Viterbi
      • 1967 – Cocke–Younger–Kasami (CYK) algorithm independently developed by Daniel H. Younger
      • 1968 – A* graph search algorithm described by Peter Hart, Nils Nilsson, and Bertram Raphael
      • 1968 – Risch algorithm for indefinite integration developed by Robert Henry Risch
      • 1969 – Strassen algorithm for matrix multiplication developed by Volker Strassen

    == 1970s ==

      • 1970 – Dinic's algorithm for computing maximum flow in a flow network by Yefim (Chaim) A. Dinitz
      • 1970 – Knuth–Bendix completion algorithm developed by Donald Knuth and Peter B. Bendix
      • 1970 – BFGS method of the quasi-Newton class
      • 1970 – Needleman–Wunsch algorithm published by Saul B. Needleman and Christian D. Wunsch
      • 1972 – Edmonds–Karp algorithm published by Jack Edmonds and Richard Karp, essentially identical to Dinic's algorithm from 1970
      • 1972 – Graham scan developed by Ronald Graham
      • 1972 – Red–black trees and B-trees discovered
      • 1973 – RSA encryption algorithm discovered by Clifford Cocks
      • 1973 – Jarvis march algorithm developed by R. A. Jarvis
      • 1973 – Hopcroft–Karp algorithm developed by John Hopcroft and Richard Karp
      • 1974 – Pollard's p − 1 algorithm developed by John Pollard
      • 1974 – Quadtree developed by Raphael Finkel and J.L. Bentley
      • 1975 – Genetic algorithms popularized by John Holland
      • 1975 – Pollard's rho algorithm developed by John Pollard
      • 1975 – Aho–Corasick string matching algorithm developed by Alfred V. Aho and Margaret J. Corasick
      • 1975 – Cylindrical algebraic decomposition developed by George E. Collins
      • 1976 – Salamin–Brent algorithm independently discovered by Eugene Salamin and Richard Brent
      • 1976 – Knuth–Morris–Pratt algorithm developed by Donald Knuth and Vaughan Pratt and independently by J. H. Morris
      • 1977 – Boyer–Moore string-search algorithm for finding occurrences of one string within another
      • 1977 – RSA encryption algorithm rediscovered by Ron Rivest, Adi Shamir, and Len Adleman
      • 1977 – LZ77 algorithm developed by Abraham Lempel and Jacob Ziv
      • 1977 – multigrid methods developed independently by Achi Brandt and Wolfgang Hackbusch
      • 1978 – LZ78 algorithm developed from LZ77 by Abraham Lempel and Jacob Ziv
      • 1978 – Bruun's algorithm proposed for powers of two by Georg Bruun
      • 1979 – Khachiyan's ellipsoid method developed by Leonid Khachiyan
      • 1979 – ID3 decision tree algorithm developed by Ross Quinlan

    == 1980s ==

      • 1980 – Brent's algorithm for cycle detection developed by Richard P. Brent
      • 1981 – Quadratic sieve developed by Carl Pomerance
      • 1981 – Smith–Waterman algorithm developed by Temple F. Smith and Michael S. Waterman
      • 1983 – Simulated annealing developed by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi
      • 1983 – Classification and regression tree (CART) algorithm developed by Leo Breiman, et al.
      • 1984 – LZW algorithm developed from LZ78 by Terry Welch
      • 1984 – Karmarkar's interior-point algorithm developed by Narendra Karmarkar
      • 1984 – ACORN PRNG discovered by Roy Wikramaratna and used privately
      • 1985 – Simulated annealing independently developed by V. Cerny
      • 1985 – Car–Parrinello molecular dynamics developed by Roberto Car and Michele Parrinello
      • 1985 – Splay trees discovered by Sleator and Tarjan
      • 1986 – Blum Blum Shub proposed by L. Blum, M. Blum, and M. Shub
      • 1986 – Push–relabel maximum flow algorithm by Andrew Goldberg and Robert Tarjan
      • 1986 – Barnes–Hut tree method developed by Josh Barnes and Piet Hut for fast approximate simulation of n-body problems
      • 1987 – Fast multipole method developed by Leslie Greengard and Vladimir

  • Elastix (image registration)

    Elastix is an image registration toolbox built upon the Insight Segmentation and Registration Toolkit (ITK). It is entirely open-source and provides a wide range of algorithms employed in image registration problems. Its components are designed to be modular, to ease the fast and reliable creation of registration pipelines tailored to case-specific applications. It was first developed by Stefan Klein and Marius Staring under the supervision of Josien P.W. Pluim at the Image Sciences Institute (ISI). Its first version was command-line based, allowing the end user to employ scripts to automatically process large data sets and deploy multiple registration pipelines with a few lines of code. To further widen its audience, a version called SimpleElastix, developed by Kasper Marstal, is also available; it allows the integration of elastix with high-level languages such as Python, Java, and R.

    == Image registration fundamentals ==

    Image registration is a well-known technique in digital image processing that searches for the geometric transformation that, applied to a moving image, yields a one-to-one mapping onto a target image. Generally, images acquired from different sensors (multimodal), at different time instants (multitemporal), or from different points of view (multiview) must be correctly aligned before further processing and feature extraction. Although there are many different approaches to image registration, most are composed of the same macro building blocks: the transformation, the interpolator, the metric, and the optimizer. Registering two or more images can be framed as an optimization problem that requires multiple iterations to converge to the best solution. Starting from an initial transformation computed from the image moments, the optimization process searches for the best transformation parameters based on the value of the selected similarity metric.
    The figure on the right shows a high-level representation of the registration of two images, where the reference image remains constant during the entire process while the moving image is transformed according to the transformation parameters. In other words, the registration ends when the similarity metric, a mathematical function with a certain number of parameters to be optimized, reaches its optimal value, which is highly dependent on the specific application.

    == Main building blocks ==

    Following the structure of the image registration workflow, the elastix toolbox offers a modular solution that implements, for each building block, several algorithms widely used in medical image registration, and helps end users build their own pipeline by selecting the most suitable algorithm for each block. Each block is easily configurable, either by selecting predefined initialization values or by trying multiple sets of parameters and then choosing the best-performing one. The registration is performed on images, and the elastix toolbox supports all the data formats supported by ITK, ranging from JPEG and PNG to medical standard formats such as DICOM and NIFTI. When provided in the metadata, it also stores the physical pixel spacing, the origin, and the position relative to an external world reference system, to facilitate the registration process, especially in medical applications.

    === Transformation ===

    The transformation is an essential building block, since it defines the allowable transformations. In image registration, the main distinction is between parallel-to-parallel and parallel-to-non-parallel (deformable) line mapping transformations. In the elastix toolbox, end users can select one transformation or combine several, either through addition or via composition.
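
The difference between combining transformations by addition and by composition can be made concrete with a small Python sketch. It is not elastix code. It assumes the usual definitions, namely composition T(x) = T1(T0(x)) and addition T(x) = T0(x) + T1(x) − x (the displacements of the two transformations are summed at the same point); the elastix manual should be consulted for the exact convention the toolbox uses. All function names here are illustrative.

```python
import math

def translation(tx, ty):
    """2D translation by (tx, ty)."""
    def transform(point):
        x, y = point
        return (x + tx, y + ty)
    return transform

def rotation(angle_rad):
    """2D rotation about the origin."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    def transform(point):
        x, y = point
        return (c * x - s * y, s * x + c * y)
    return transform

def compose(t0, t1):
    """Composition (assumed definition): apply t0 first, then t1."""
    return lambda p: t1(t0(p))

def add(t0, t1):
    """Addition (assumed definition): sum the displacements of t0 and t1 at p."""
    def transform(p):
        a, b = t0(p), t1(p)
        return (a[0] + b[0] - p[0], a[1] + b[1] - p[1])
    return transform

shift = translation(2.0, 0.0)
quarter_turn = rotation(math.pi / 2)
point = (1.0, 0.0)

composed = compose(shift, quarter_turn)(point)  # rotate the already shifted point
added = add(shift, quarter_turn)(point)         # sum of the two displacements
```

For the point (1, 0), composition gives approximately (0, 3) while addition gives approximately (2, 1), so the two ways of combining the same pair of transformations are not interchangeable.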
    The transformation models are listed below in order of increasing flexibility, with the corresponding elastix class names in brackets.

      • Translation (TranslationTransform) allows only translations
      • Rigid (EulerTransform) extends the translation by adding rotations; the object is treated as a rigid body
      • Similarity (SimilarityTransform) extends the rigid transformation by introducing isotropic scaling
      • Affine (AffineTransform) extends the rigid transformation by allowing both scaling and shear
      • B-splines (BSplineTransform) is a deformable transformation, usually preceded by a rigid or affine one
      • Thin-plate splines (SplineKernelTransform) is a deformable transformation belonging to the class of kernel-based transformations; it is a composition of an affine and a non-rigid part

    === Metric ===

    The similarity metric is the mathematical function whose parameters should be optimized to reach the desired registration; it is computed many times during the process. The available metrics, computed from the reference and the transformed images, are listed below with the corresponding elastix class names in brackets.

      • Mean squared difference (AdvancedMeanSquares), to be used for mono-modal applications
      • Normalized correlation coefficient (AdvancedNormalizedCorrelation), to be used for images whose intensities have a linear relationship
      • Mutual information (AdvancedMattesMutualInformation), to be used for both mono- and multi-modal applications and optimized to reach better performance than the normalized version
      • Normalized mutual information (NormalizedMutualInformation), for both mono- and multi-modal applications
      • Kappa statistic (AdvancedKappaStatistic), to be used only for binary images

    === Sampler ===

    Computing the similarity metric does not always require all the voxels; sometimes it is useful to use only a fraction of the voxels of the images, e.g. to reduce the execution time for large input images.
    The available criteria for selecting a fraction of the voxels for the similarity metric computation are listed below, with the corresponding elastix class names in brackets.

      • Full (Full) uses all the voxels
      • Grid (Grid) uses a regular grid defined by the user to downsample the image
      • Random (Random) randomly selects a user-defined percentage of voxels (all voxels have equal probability of being selected)
      • Random coordinate (RandomCoordinate) works like the random criterion, but off-grid positions can also be selected, which simplifies the optimization process

    === Interpolator ===

    After the transformation is applied, the points used for the similarity metric computation may fall at non-voxel positions, so intensity interpolation must be performed to ensure the correctness of the computed values. The implemented interpolators are listed below, with the corresponding elastix class names in brackets.

      • Nearest neighbor (NearestNeighborInterpolator) uses few resources but gives low-quality results
      • Linear (LinearInterpolator) is sufficient in general applications
      • N-th order B-spline (BSplineInterpolator) lets the user raise the order N, which increases both quality and computation time. N=0 and N=1 correspond to the nearest neighbor and linear cases respectively.

    === Optimizer ===

    The optimizer defines the strategy employed to search for the best transformation parameters to reach the correct registration; it is commonly an iterative strategy. Some of the implemented optimization strategies are listed below.

      • Gradient descent
      • Robbins–Monro, similar to gradient descent but employing an approximation of the cost function derivatives

    A wider range of optimizers is also available, such as quasi-Newton or evolutionary strategies.

    === Other features ===

    The elastix software also offers other features that can be employed to speed up the registration procedure and to provide more advanced algorithms to end users.
    Some examples are blurring and Gaussian pyramids, which reduce data complexity, and a multi-image and multi-metric framework for dealing with more complex applications.

    == Applications ==

    Elastix has applications mainly in the medical field, where image registration is fundamental to obtaining comprehensive information about the analysed anatomical region. It is widely employed in image-guided surgery, tumour monitoring, and treatment assessment. For example, in radiotherapy planning, image registration makes it possible to deliver the treatment correctly and to evaluate the results obtained. Thanks to the wide range of implemented algorithms, the elastix software allows physicians and researchers to test different registration pipelines, from the simplest to more complex ones, and to save the best one as a configuration file. This file, together with the fact that the software is completely open-source, makes the work easy to reproduce, which supports the open science paradigm and allows fast reuse on data from different patients. In image-guided surgery, registration time and accuracy are critical points, considering that, during the registration, the patient is on the operating table, and the imag
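
The building blocks named in this article (transformation, metric, sampler, interpolator, optimizer) can be seen working together in a toy one-dimensional registration. The following pure-Python sketch is an illustration under simplifying assumptions, not elastix code: the transformation is a single translation, the metric is the mean squared difference, the sampler is a regular grid, the interpolator is linear, and the optimizer is plain gradient descent with a finite-difference gradient. The signals, step size and iteration count are arbitrary choices made for the example.

```python
import math

def linear_interpolate(signal, x):
    """Interpolator: linear interpolation, zero outside the signal."""
    if x < 0 or x > len(signal) - 1:
        return 0.0
    lo = int(math.floor(x))
    hi = min(lo + 1, len(signal) - 1)
    frac = x - lo
    return (1.0 - frac) * signal[lo] + frac * signal[hi]

def grid_sampler(length, step):
    """Sampler: every `step`-th position of the fixed signal."""
    return list(range(0, length, step))

def mean_squared_difference(fixed, moving, shift, positions):
    """Metric: mean squared difference between fixed and translated moving signal."""
    total = 0.0
    for x in positions:
        diff = fixed[x] - linear_interpolate(moving, x + shift)
        total += diff * diff
    return total / len(positions)

def register_translation(fixed, moving, step_size=50.0, iterations=100, h=1e-3):
    """Optimizer: gradient descent on the shift, using a central-difference gradient."""
    positions = grid_sampler(len(fixed), 2)
    shift = 0.0  # transformation parameter: x -> x + shift
    for _ in range(iterations):
        up = mean_squared_difference(fixed, moving, shift + h, positions)
        down = mean_squared_difference(fixed, moving, shift - h, positions)
        gradient = (up - down) / (2.0 * h)
        shift -= step_size * gradient
    return shift

# Two Gaussian bumps; the moving signal is the fixed one displaced by 3 samples.
fixed = [math.exp(-((i - 20) / 5.0) ** 2) for i in range(50)]
moving = [math.exp(-((i - 23) / 5.0) ** 2) for i in range(50)]
estimated_shift = register_translation(fixed, moving)
```

The estimated shift should settle close to 3, the displacement built into the test signals. Swapping any one function (for instance a different metric or sampler) leaves the others untouched, which is the kind of modularity the building-block description refers to.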

  • Parchive

    Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data. Parchive was originally written to solve the problem of reliable file sharing on Usenet, but it can be used for protecting any kind of data from data corruption, disc rot, bit rot, and accidental or malicious damage. Despite the name, Parchive uses more advanced techniques (specifically error correction codes) than simplistic parity methods of error detection. As of 2015, PAR1 is obsolete, PAR2 is mature for widespread use, and PAR3 is a discontinued experimental version developed by MultiPar author Yutaka Sawada. The original SourceForge Parchive project has been inactive since April 30, 2015. A new PAR3 specification has been worked on since April 28, 2019 by PAR2 specification author Michael Nahas. An alpha version of the PAR3 specification was published on January 29, 2022, while the program itself is being developed.

    == History ==

    Parchive was intended to increase the reliability of transferring files via Usenet newsgroups. Usenet was originally designed for informal conversations, and the underlying protocol, NNTP, was not designed to transmit arbitrary binary data. Another limitation, which was acceptable for conversations but not for files, was that messages were normally fairly short and limited to 7-bit ASCII text. Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8-bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but the unreliable nature of Usenet remained. With the introduction of Parchive, parity files could be created and then uploaded along with the original data files.
    If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files. Parchive included the construction of small index files (.par in version 1 and .par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity. Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1, where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files (which allows them to be used on their own to verify the integrity of the data files if no small index file is available).

    In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001. Par1 used Reed–Solomon error correction to create new recovery files. Any of the recovery files can be used to rebuild a missing file from an incomplete download. Version 1 became widely used on Usenet, but it suffered from some limitations:

      • It was restricted to handling at most 255 files.
      • The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.)
      • The recovery algorithm had a bug, due to a flaw in the academic paper on which it was based.
      • It was strongly tied to Usenet, and it was felt that a more general tool might have a wider audience.
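
The idea of rebuilding one missing file from the surviving files plus a recovery file can be sketched with plain XOR parity. This is a deliberately simpler technique than the Reed–Solomon coding that Par1 used: XOR parity yields a single recovery block and can restore only one missing file, whereas Reed–Solomon allows several independent recovery files. The sketch also shows why a recovery block has to be as long as the largest input. The file contents are made up for the example.

```python
def xor_parity(files):
    """Build one parity block as long as the largest input; shorter inputs act as zero-padded."""
    size = max(len(data) for data in files)
    parity = bytearray(size)
    for data in files:
        for i, byte in enumerate(data):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(surviving_files, parity, missing_length):
    """Rebuild the single missing file from the survivors and the parity block."""
    rebuilt = bytearray(parity)
    for data in surviving_files:
        for i, byte in enumerate(data):
            rebuilt[i] ^= byte
    # The original length must be known (a real format stores it in its index data).
    return bytes(rebuilt[:missing_length])

# Hypothetical file contents, for illustration only.
files = [b"alpha", b"bravo-bravo", b"charlie"]
parity = xor_parity(files)

# Pretend the second file was lost in transit and rebuild it.
recovered = recover_missing([files[0], files[2]], parity, len(files[1]))
```

Here the parity block is 11 bytes long, the size of the largest input, even though the other two files are much shorter.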
    In January 2002, Howard Fukada proposed that a new Par2 specification should be devised with two significant changes: data verification and repair should work on blocks of data rather than whole files, and the algorithm should switch to 16-bit numbers rather than the 8-bit numbers that PAR1 used. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002. Peter Clements then went on to write the first two Par2 implementations, QuickPar and par2cmdline. Because par2cmdline had been abandoned since 2004, Paul Houle created phpar2 to supersede it. Yutaka Sawada created MultiPar to supersede QuickPar. MultiPar uses par2j.exe (which is partially based on par2cmdline's optimization techniques) as its backend engine.

    == Versions ==

    Versions 1 and 2 of the file format are incompatible. (However, many clients support both.)

    === Par1 ===

    For the files f1, f2, ..., fn, a Par1 Parchive consists of an index file (f.par), which is a CRC-type file with no recovery blocks, and a number of "parity volumes" (f.p01, f.p02, etc.). If one of the original files is missing (for example, f2), it can be recreated from all of the other original files and any one of the parity volumes. Similarly, two missing files can be recreated from any two of the parity volumes, and so forth. Par1 supports up to a total of 256 source and recovery files.

    === Par2 ===

    Par2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The number after the "+" in the filename indicates how many blocks the file contains, and the number after "vol" indicates the number of the first recovery block within the PAR2 file.
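
The naming scheme just described can be parsed mechanically. The Python sketch below extracts the first recovery block number and the block count from a volume name, then picks the single smallest volume that holds at least a given number of recovery blocks. The regular expression and function names are illustrative and are not part of any Par2 client. The selection rule is a simplification: a real client can also combine blocks from several volumes.

```python
import re

# Matches names such as "filename.vol003+04.PAR2" (case-insensitive extension).
VOLUME_NAME = re.compile(
    r"^(?P<base>.+)\.vol(?P<first>\d+)\+(?P<count>\d+)\.par2$", re.IGNORECASE
)

def parse_volume_name(name):
    """Return (base, first_block, block_count), or None for non-volume names."""
    match = VOLUME_NAME.match(name)
    if match is None:
        return None
    return match.group("base"), int(match.group("first")), int(match.group("count"))

def smallest_sufficient_volume(names, missing_blocks):
    """Pick the volume with the fewest blocks that still covers `missing_blocks`."""
    candidates = []
    for name in names:
        parsed = parse_volume_name(name)
        if parsed is not None and parsed[2] >= missing_blocks:
            candidates.append((parsed[2], name))
    return min(candidates)[1] if candidates else None

names = [
    "filename.PAR2",            # index file, carries no recovery blocks
    "filename.vol000+01.PAR2",
    "filename.vol001+02.PAR2",
    "filename.vol003+04.PAR2",
    "filename.vol007+06.PAR2",
]
choice = smallest_sufficient_volume(names, 4)
```

With four blocks missing, the sketch selects filename.vol003+04.PAR2, the smallest single volume that carries enough recovery blocks.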
If an index file of a download states that 4 blocks are missing, the easiest way to repair the files would be by downloading filename.vol003+04.PAR2. However, due to the redundancy, filename.vol007+06.PAR2 is also acceptable. There is also an index file, filename.PAR2, which is identical in function to the small index file used in Par1. The Par2 specification supports up to 32,768 source blocks and up to 65,535 recovery blocks. Input files are split into multiple equal-sized blocks so that recovery files do not need to be the size of the largest input file. Although Unicode is mentioned in the PAR2 specification as an option, most PAR2 implementations do not support Unicode. Directory support is included in the PAR2 specification, but most or all implementations do not support it. === Par3 === The Par3 specification was originally planned to be published as an enhancement over the Par2 specification. However, to date, it has remained closed source by specification owner Yutaka Sawada. A discussion on a new format started in the GitHub issue section of the maintained fork par2cmdline on January 29, 2019. The discussion led to a new format, which is also named Par3. The new Par3 format's specification is published on GitHub, but was still an alpha draft as of January 28, 2022. The specification is written by Michael Nahas, the author of the Par2 specification, with help from Yutaka Sawada, animetosho and malaire. The new format claims to have multiple advantages over the Par2 format, including support for: More than 2^16 files and more than 2^16 blocks. Packing small files into one block, as well as deduplication when a block appears in multiple files. UTF-8 file names. File permissions, hard links, symbolic/soft links, and empty directories. Embedding PAR data inside other formats, like ZIP archives or ISO disk images.
"Incremental backups", where a user creates recovery files for some file or folder, change some data, and create new recovery files reusing some of the older files. More error correction code algorithms (such as LDPC and sparse random matrix). BLAKE3 hashes, dropping support for the MD5 hashes used in PAR2. == Software == === Multi-platform === par2+tbb (GPLv2) — a concurrent (multithreaded) version of par2cmdline 0.4 using TBB. Only compatible with x86 based CPUs. It is available in the FreeBSD Ports system as par2cmdline-tbb. Original par2cmdline — (obsolete). Available in the FreeBSD Ports system as par2cmdline. par2cmdline maintained fork by BlackIkeEagle. par2cmdline-mt is another multithreaded version of par2cmdline using OpenMP, GPLv2, or later. Currently merged into BlackIkeEagle's fork and maintained there. ParPar (CC0) is a high performance, multithreaded PAR2 client and Node.js library. Does not support verifying or repair, it can currently only create PAR2 archives. par2deep (LGPL-3.0) — Produce, verify and repair par2 files recursively, both on the command line as well as with the aid of a graphical user interface. It is available in the Python Package Index system as par2deep. par2cron (MIT License) is an o

    Read more →
  • AVT Statistical filtering algorithm

    AVT Statistical filtering algorithm

AVT Statistical filtering algorithm is an approach to improving the quality of raw data collected from various sources. It is most effective in cases when there is inband noise present. In those cases AVT is better at filtering data than a band-pass filter or any digital filtering based on variation of. Conventional filtering is useful when the signal/data has a different frequency than the noise, so that the signal/data can be separated from the noise by frequency discrimination. Frequency discrimination filtering is done using low-pass, high-pass and band-pass filtering, names that refer to the relative frequency criteria each configuration targets. Those filters are created using passive and active components and are sometimes implemented using software algorithms based on the fast Fourier transform (FFT). AVT filtering is implemented in software and its inner working is based on statistical analysis of raw data. When the signal frequency (the useful data distribution frequency) coincides with the noise frequency (the noisy data distribution frequency), we have inband noise. In these situations frequency discrimination filtering does not work, since the noise and the useful signal are indistinguishable, and this is where AVT excels. To achieve filtering in such conditions there are several methods/algorithms available, which are briefly described below. == Averaging algorithm == Collect n samples of data. Calculate the average value of the collected data. Present/record the result as actual data. == Median algorithm == Collect n samples of data. Sort the data in ascending or descending order (the direction does not matter). Select the value that falls in the n/2 position and present/record it as the final result representing the data sample. == AVT algorithm == AVT stands for Antonyan Vardan Transform; its implementation is explained below.
Collect n samples of data. Calculate the standard deviation and average value. Drop any data that falls outside the band of average ± one standard deviation. Calculate the average value of the remaining data. Present/record the result as the actual value representing the data sample. This algorithm is based on amplitude discrimination and can easily reject any noise that is not like the actual signal, that is, noise that differs statistically from the signal by more than one standard deviation. Note that this type of filtering can be used in situations where the actual environmental noise is not known in advance. Note also that it is preferable to use the median rather than the average in the steps above. Originally the AVT algorithm used the average value to compare it with the results of the median on the data window. == Filtering algorithms comparison == Using a system that has a signal value of 1 and has noise added at 0.1% and 1% levels simplifies quantification of algorithm performance. An R script is used to create pseudo-random noise added to the signal and to analyze the results of filtering using several algorithms. Please refer to the "Reduce Inband Noise with the AVT Algorithm" article for details. The graphs show that the AVT algorithm provides the best results compared with the Median and Averaging algorithms while using data sample sizes of 32, 64 and 128 values. Note that the graph was created by analyzing a random data array of 10000 values. A sample of this data is graphically represented below. From this graph it is apparent that AVT outperforms the other filtering algorithms by providing 5% to 10% more accurate data when analyzing the same datasets. Considering the random nature of the noise used in this numerical experiment, which borders on the worst-case situation where the actual signal level is below ambient noise, the precision improvements from processing data with the AVT algorithm are significant. == AVT algorithm variations == === Cascaded AVT === In some situations better results can be obtained by cascading several stages of AVT filtering.
This produces a single constant value, which can be used for equipment that has known stable characteristics, such as thermometers, thermistors and other slow-acting sensors. === Reverse AVT === Collect n samples of data. Calculate the standard deviation and average value. Drop any data that is within the band of average ± one standard deviation. Calculate the average value of the remaining data. Present/record the result as actual data. This is useful for detecting minute signals that are close to the background noise level. == Possible applications and uses == Filtering data that is near or below noise level. Planet detection, to filter raw data from the Kepler space telescope. Filtering noise from sound sources where all other filtering methods (low-pass filter, high-pass filter, band-pass filter, digital filter) fail. Pre-processing scientific data for data analysis (smoothness) before plotting (see Plot (graphics)). SETI (Search for extraterrestrial intelligence), for detecting/distinguishing extraterrestrial signals from the cosmic background. Image filtering to detect altered images: an image of Jupiter generated with AVT reveals alterations in an original picture that was modified to be visually appealing by applying filters. Another version of this comparison is the Reverse AVT filter applied to the same original Jupiter image, where only the altered portion is visible, as the noise that was eliminated by the AVT algorithm. Image filtering to estimate data density from images: a picture of the Pillars of Creation nebula shows data density in filtered images from Hubble and Webb. Note that the image on the left has big patches of missing data marked with simpler color patterns.
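The filtering steps described in this article can be sketched in a few lines of Python. This is an illustrative sketch, not the author's reference implementation. It assumes the population standard deviation (the text does not say which estimator is used), falls back to the plain average when no samples survive the cut, and treats "cascading" as filtering fixed-size windows and then filtering the window results, which is only one possible reading. All function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def averaging_filter(samples: Sequence[float]) -> float:
    """Averaging algorithm: the mean of the collected samples."""
    return statistics.fmean(samples)


def median_filter(samples: Sequence[float]) -> float:
    """Median algorithm: the middle value of the sorted samples."""
    return statistics.median(samples)


def avt_filter(samples: Sequence[float]) -> float:
    """AVT: drop samples outside mean ± one standard deviation, then average."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)  # assumption: population standard deviation
    kept = [x for x in samples if abs(x - mean) <= sd]
    # Fallback (assumption): return the plain mean if rounding leaves nothing.
    return statistics.fmean(kept) if kept else mean


def reverse_avt_filter(samples: Sequence[float]) -> float:
    """Reverse AVT: keep only samples outside the one-sigma band, then average."""
    mean = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    kept = [x for x in samples if abs(x - mean) > sd]
    # Fallback (assumption): return the plain mean if every sample is in-band.
    return statistics.fmean(kept) if kept else mean


def cascaded_avt(samples: Sequence[float], window: int) -> float:
    """Two-stage AVT: filter each window, then filter the per-window results."""
    stage_one: List[float] = [
        avt_filter(samples[i:i + window])
        for i in range(0, len(samples), window)
    ]
    return avt_filter(stage_one)
```

With the samples `[1.0, 1.0, 1.0, 1.0, 10.0]`, the mean is 2.8 and the standard deviation is 3.6, so the AVT filter discards the outlier 10.0 and returns 1.0, while the reverse AVT filter keeps only the outlier and returns 10.0.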

    Read more →
  • Information access

    Information access

Information access is the freedom or ability to identify, obtain and make use of databases or information effectively. There are various research efforts in information access whose objective is to make it simpler and more effective for human users to access and further process large and unwieldy amounts of data and information. == Technology == Several technologies applicable to the general area are Information Retrieval, Text Mining, Machine Translation, and Text Categorisation. During discussions on free access to information as well as on information policy, information access is understood as concerning the assurance of free and closed access to information. Information access covers many issues including copyright, open source, privacy, and security. == Groups == Groups such as the American Library Association, the American Association of Law Libraries, and Ralph Nader's Taxpayers Assets Project have advocated for free access to legal information. The vendor neutral citation movement in the legal field is working to ensure that courts will accept citations from cases on the web which do not have the traditional (copyrighted) page numbers from the West Publishing company. There is a worldwide Free Access to Law Movement which advocates free access to legal information. The Wired article "Who Owns The Law" is an introduction to the access to legal information issue. Postsecondary organizations such as K-12 work to share information. They feel it is a legal and moral obligation to provide access (including to people with disabilities or impairments) to information through the services and programs they offer. Some effects of charging for information access, such as literature searches for physicians, are studied in the article "Fee or Free: The Effect of Charging on Information Demand". In this study, a $5 charge resulted in a 77% decrease in searches.

    Read more →
  • Network Abstraction Layer

    Network Abstraction Layer

    The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non conversational" (storage, broadcast, or streaming) applications. NAL has achieved a significant improvement in application flexibility relative to prior video coding standards. == Introduction == An increasing number of services and growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements. The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.) interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question about how to handle this variety of applications and networks. 
To address this need for flexibility and customizability, the design covers a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. The NAL is designed to provide "network friendliness", enabling simple and effective customization of the use of the VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as: RTP/IP for any kind of real-time wire-line and wireless Internet services; file formats, e.g., ISO MP4 for storage and MMS; H.32X for wireline and wireless conversational services; MPEG-2 systems for broadcasting services, etc. The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte-stream and packet format uses of NAL units, parameter sets, and access units. A short description of these concepts is given below. == NAL units == The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream.
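As an illustration of the one-byte H.264/AVC header and the two-byte HEVC header mentioned above, the following Python sketch splits each header into its bit fields. The field names and widths are not given in this text; they follow the H.264 and HEVC specifications as commonly documented, and the function and class names are hypothetical.

```python
from typing import NamedTuple


class H264NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, expected to be 0
    nal_ref_idc: int         # 2 bits
    nal_unit_type: int       # 5 bits, indicates the type of payload


class HevcNalHeader(NamedTuple):
    forbidden_zero_bit: int     # 1 bit, expected to be 0
    nal_unit_type: int          # 6 bits
    nuh_layer_id: int           # 6 bits
    nuh_temporal_id_plus1: int  # 3 bits


def parse_h264_nal_header(nal: bytes) -> H264NalHeader:
    """Decode the single header byte at the start of an H.264/AVC NAL unit."""
    if len(nal) < 1:
        raise ValueError("an H.264 NAL unit needs at least 1 byte")
    b = nal[0]
    return H264NalHeader(b >> 7, (b >> 5) & 0x03, b & 0x1F)


def parse_hevc_nal_header(nal: bytes) -> HevcNalHeader:
    """Decode the two header bytes at the start of an HEVC NAL unit."""
    if len(nal) < 2:
        raise ValueError("an HEVC NAL unit needs at least 2 bytes")
    v = (nal[0] << 8) | nal[1]
    return HevcNalHeader(v >> 15, (v >> 9) & 0x3F, (v >> 3) & 0x3F, v & 0x07)
```

For instance, the H.264 header byte `0x67` decodes to `nal_ref_idc` 3 and `nal_unit_type` 7, and the HEVC header bytes `0x40 0x01` decode to `nal_unit_type` 32 with `nuh_temporal_id_plus1` equal to 1.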
== NAL Units in Byte-Stream Format Use == Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added to allow decoders that operate in systems that provide streams of bits without alignment to byte boundaries to recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired. == NAL Units in Packet-Transport System Use == In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes. == VCL and Non-VCL NAL Units == NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the values of the samples in the video pictures. 
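The byte-stream format described earlier in this section can be sketched as follows: a stream is split at each three-byte start code prefix, and emulation prevention bytes are then removed from each NAL unit. This is a simplified Python sketch with hypothetical function names. The byte values (`00 00 01` for the prefix, `03` for the emulation prevention byte) come from the H.264/HEVC specifications rather than from this text, and the sketch assumes that zero bytes immediately before a start code are framing padding rather than payload.

```python
from typing import List

START_CODE_PREFIX = b"\x00\x00\x01"  # three-byte start code prefix


def split_byte_stream(data: bytes) -> List[bytes]:
    """Split a byte stream into NAL units by searching for start code prefixes."""
    units: List[bytes] = []
    start = data.find(START_CODE_PREFIX)
    while start != -1:
        payload_start = start + len(START_CODE_PREFIX)
        nxt = data.find(START_CODE_PREFIX, payload_start)
        end = len(data) if nxt == -1 else nxt
        # Assumption: zero bytes right before the next start code are padding.
        unit = data[payload_start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        start = nxt
    return units


def remove_emulation_prevention(nal: bytes) -> bytes:
    """Drop each 0x03 byte that follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for b in nal:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation prevention byte itself
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

In a packet-transport system the splitting step is unnecessary, because the transport protocol already frames each NAL unit.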
Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures). == Parameter Sets == A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets: Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.) Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.) The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss. 
In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself. == Access Units == A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture. The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present. Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to

    Read more →
  • Single customer view

    Single customer view

A single customer view is an aggregated, consistent and holistic representation of the data held by an organisation about its customers that can be viewed in one place, such as a single page. The advantage to an organisation of attaining this unified view comes from the ability it gives to analyse past behaviour in order to better target and personalise future customer interactions. A single customer view is also considered especially relevant where organisations engage with customers through multichannel marketing, since customers expect those interactions to reflect a consistent understanding of their history and preferences. However, some commentators have challenged the idea that a single view of customers across an entire organisation is either natural or meaningful, proposing that the priority should instead be consistency between the multiple views that arise in different contexts. Where representations of a customer are held in more than one data set, achieving a single customer view can be difficult: firstly because customer identity must be traceable between the records held in those systems, and secondly because anomalies or discrepancies in the customer data must be cleansed to ensure data quality. As such, the acquisition by an organisation of a single customer view is one potential outcome of successful master data management. Since 31 December 2010, maintaining a single customer view, and submitting it within 72 hours, has been mandatory for financial institutions in the United Kingdom under rules introduced by the Financial Services Compensation Scheme.

    Read more →
  • Information quality

    Information quality

Information quality (IQ) is a contextual property of, or a perspective on, the content within information systems. There exist two complementary yet partially conflicting definitions of high quality: firstly, information is considered high quality if it is fit for its intended purpose (Juran's perspective); secondly, it is deemed high quality if it conforms to specified requirements (Crosby's perspective). The primary distinction between these definitions is that Juran's perspective focuses on the suitability of information for its intended purpose, which can be measured by the success of its application even without direct access to or exact knowledge of the data. For example, a black-box AI with access to English Wikipedia can work well for users' purposes, while the same AI using Estonian Wikipedia fails for the same purposes. Given that the AI remains the same, it can be concluded that the English version data is of higher quality than the Estonian version, even without an exact comparison of the data contents and their properties in each version. In contrast, Crosby emphasizes adherence to predefined specifications, assuming specific criteria rather than measuring the success of use; for instance, information in Wikipedia could be shown to be good based on criteria such as existing peer validation and academic references, even if the AI results are poor. This approach runs into problems when data is not completely accessible or when not all quality properties can be known and measured, leading to a false impression of quality due to missing or misleading metrics. Numerous IQ frameworks and methodologies provide tangible approaches to assess and measure DQ/IQ in a robust and rigorous manner. == Conceptual problems == Although the foundational definitions are usable for most everyday purposes, specialists often use more complex models for information quality. It has been suggested, however, that the higher the quality, the greater the confidence in meeting more general, less specific contexts.
== Dimensions and metrics of information quality == "Information quality" is a measure of the fitness of information for use or its conformance to requirements. In this way, "quality" is considered contextual, and it can vary across users and uses of the information. The exact degree of quality is often described with dimensions such as accuracy, timeliness, completeness, and similar scales. Although a huge amount of academic research has been directed at these dimensions, there is no consensus on their definitions or practical usefulness. Historically, Richard Wang and Diane Strong proposed the following list of dimensions or elements used in assessing information quality: Intrinsic IQ: accuracy, objectivity, believability, reputation. Contextual IQ: relevance, value-added, timeliness, completeness, amount of information. Representational IQ: interpretability, format, coherence, compatibility. Accessibility IQ: accessibility, access security. Other authors propose similar but different lists of dimensions for analysis, and emphasize measurement and reporting as information quality metrics. Larry English prefers the term "characteristics" to dimensions. However, a considerable amount of information quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Research has recently shown the huge diversity of terms and classification structures used. === Quality metrics === Source: Authority/verifiability: Authority refers to the expertise or recognized official status of a source. Consider the reputation of the author and publisher. When working with legal or government information, consider whether the source is the official provider of the information. Verifiability refers to the ability of a reader to verify the validity of the information irrespective of how authoritative the source is.
Verifying the facts is part of the duty of care of journalistic deontology, as is, where possible, providing the sources of information so that they can be verified. Scope of coverage: Scope of coverage refers to the extent to which a source explores a topic. Consider time periods, geography or jurisdiction, and coverage of related or narrower topics. Composition and organization: Composition and organization has to do with the ability of the information source to present its particular message in a coherent, logically sequential manner. Objectivity: Objectivity concerns the bias or opinion expressed when a writer interprets or analyzes facts. Consider the use of persuasive language, the source's presentation of other viewpoints, its reason for providing the information, and advertising. Integrity: Adherence to moral and ethical principles; soundness of moral character; the state of being whole, entire, or undiminished. Comprehensiveness: Of large scope; covering or involving much; inclusive (as in "a comprehensive study"); comprehending mentally; having an extensive mental grasp; in insurance, covering or providing broad protection against loss. Validity: Validity of some information has to do with the degree of obvious truthfulness which the information carries. Uniqueness: As much as the "uniqueness" of a given piece of information is intuitive in meaning, it also significantly implies not only the originating point of the information but also the manner in which it is presented and thus the perception which it conjures. The essence of any piece of information we process consists to a large extent of those two elements. Timeliness: Timeliness refers to information that is current at the time of publication. Consider publication, creation and revision dates. Beware of Web site scripting that automatically reflects the current day's date on a page.
Reproducibility (utilized primarily when referring to instructive information): Means that documented methods are capable of being used on the same data set to achieve a consistent result. == Professional associations == IQ International (the International Association for Information and Data Quality): IQ International is a not-for-profit, vendor-neutral professional association formed in 2004, dedicated to building the information and data quality profession. CDOIQ Society: The Chief Data Officers and Information Quality Society is a global professional society supporting data leaders with networking, meetings, best practices, experience, certification, and training. == Information quality conferences == A number of major conferences relevant to information quality are held annually: Annual MIT Chief Data Officer & Information Quality (CDOIQ) Symposium: annual conferences held at the Massachusetts Institute of Technology, Cambridge, MA, USA. Data Governance and Information Quality Conference: commercial conferences held each year in the USA. Data Quality Asia Pacific: commercial conference held annually in Sydney or Melbourne, Australia. Enterprise Data and Business Intelligence Conference Europe: commercial conferences held annually in London, England. Information and Data Quality Conference: not-for-profit conference run annually by IQ International (the International Association for Information and Data Quality) in the USA. International Conference on Information Quality: academic conference launched through MITIQ, held annually at a university. Master Data Management & Data Governance Conferences: six major conferences are run annually by the MDM Institute in venues such as London, San Francisco, Sydney, Toronto, Madrid, Frankfurt, Shanghai and New York City.

    Read more →
  • Ontology alignment

    Ontology alignment

Ontology alignment, or ontology matching, is the process of determining correspondences between concepts in ontologies. A set of correspondences is also called an alignment. The phrase takes on a slightly different meaning in computer science, cognitive science or philosophy. == Computer science == For computer scientists, concepts are expressed as labels for data. Historically, the need for ontology alignment arose out of the need to integrate heterogeneous databases, ones developed independently and thus each having their own data vocabulary. In the Semantic Web context, which involves many actors providing their own ontologies, ontology matching has taken a critical place in helping heterogeneous resources to interoperate. Ontology alignment tools find classes of data that are semantically equivalent, for example, "truck" and "lorry". The classes are not necessarily logically identical. According to Euzenat and Shvaiko (2007), there are three major dimensions for similarity: syntactic, external, and semantic. Coincidentally, they roughly correspond to the dimensions identified by cognitive scientists below. A number of tools and frameworks have been developed for aligning ontologies, some with inspiration from cognitive science and some independently. Ontology alignment tools have generally been developed to operate on database schemas, XML schemas, taxonomies, formal languages, entity-relationship models, dictionaries, and other label frameworks. These are usually converted to a graph representation before being matched. Since the emergence of the Semantic Web, such graphs can be represented in the Resource Description Framework line of languages by triples of the form (subject, predicate, object), as illustrated in the Notation 3 syntax. In this context, aligning ontologies is sometimes referred to as "ontology matching". The problem of ontology alignment has been tackled recently by trying to compute matching first and mapping (based on the matching) in an automatic fashion.
Systems like DSSim, X-SOM or COMA++ obtained at the moment very high precision and recall. The Ontology Alignment Evaluation Initiative aims to evaluate, compare and improve the different approaches. === Formal definition === Given two ontologies i = ⟨ C i , R i , I i , T i , V i ⟩ {\displaystyle i=\langle C_{i},R_{i},I_{i},T_{i},V_{i}\rangle } and j = ⟨ C j , R j , I j , T j , V j ⟩ {\displaystyle j=\langle C_{j},R_{j},I_{j},T_{j},V_{j}\rangle } where C {\displaystyle C} is the set of classes, R {\displaystyle R} is the set of relations, I {\displaystyle I} is the set of individuals, T {\displaystyle T} is the set of data types, and V {\displaystyle V} is the set of values, we can define different types of (inter-ontology) relationships. Such relationships will be called, all together, alignments and can be categorized among different dimensions: similarity vs logic: this is the difference between matchings (predicating about the similarity of ontology terms), and mappings (logical axioms, typically expressing logical equivalence or inclusion among ontology terms) atomic vs complex: whether the alignments we considered are one-to-one, or can involve more terms in a query-like formulation (e.g., LAV/GAV mapping) homogeneous vs heterogeneous: do the alignments predicate on terms of the same type (e.g., classes are related only to classes, individuals to individuals, etc.) or we allow heterogeneity in the relationship? type of alignment: the semantics associated to an alignment. It can be subsumption, equivalence, disjointness, part-of or any user-specified relationship. Subsumption, atomic, homogeneous alignments are the building blocks to obtain richer alignments, and have a well defined semantics in every Description Logic. Let's now introduce more formally ontology matching and mapping. 
An atomic homogeneous matching is an alignment that carries a similarity degree $s \in [0,1]$ describing the similarity of two terms of the input ontologies $i$ and $j$. A matching can be either computed by means of heuristic algorithms or inferred from other matchings. Formally, a matching is a quadruple $m = \langle id, t_i, t_j, s \rangle$, where $t_i$ and $t_j$ are homogeneous ontology terms and $s$ is the similarity degree of $m$. A (subsumption, homogeneous, atomic) mapping is defined as a pair $\mu = \langle t_i, t_j \rangle$, where $t_i$ and $t_j$ are homogeneous ontology terms. == Cognitive science == For cognitive scientists interested in ontology alignment, the "concepts" are nodes in a semantic network that reside in brains as "conceptual systems." The focal question is: if everyone has unique experiences and thus different semantic networks, then how can we ever understand each other? This question has been addressed by a model called ABSURDIST (Aligning Between Systems Using Relations Derived Inside Systems for Translation). The model identifies three major dimensions of similarity, expressed as equations for "internal similarity, external similarity, and mutual inhibition." == Ontology alignment methods == Two research subfields have emerged in ontology mapping, namely monolingual ontology mapping and cross-lingual ontology mapping. The former refers to the mapping of ontologies in the same natural language, whereas the latter refers to "the process of establishing relationships among ontological resources from two or more independent ontologies where each ontology is labelled in a different natural language". 
Existing matching methods in monolingual ontology mapping are discussed in Euzenat and Shvaiko (2007). Approaches to cross-lingual ontology mapping are presented in Fu et al. (2011).
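
The formal definition above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the class and function names are hypothetical, the similarity heuristic is plain string similarity from the standard library (a stand-in for the syntactic dimension only), and deriving mappings by cutting matchings at a threshold of 0.8 is one simple choice, not the method of any system named above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import product


@dataclass(frozen=True)
class Matching:
    """Atomic homogeneous matching m = <id, t_i, t_j, s>."""
    id: int
    t_i: str   # term from ontology i
    t_j: str   # term from ontology j
    s: float   # similarity degree in [0, 1]


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity (assumed stand-in heuristic)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compute_matchings(classes_i, classes_j):
    """Score every pair of class labels drawn from the two ontologies."""
    return [
        Matching(n, t_i, t_j, label_similarity(t_i, t_j))
        for n, (t_i, t_j) in enumerate(product(classes_i, classes_j))
    ]


def derive_mappings(matchings, threshold=0.8):
    """Keep pairs <t_i, t_j> whose similarity reaches the hypothetical threshold."""
    return [(m.t_i, m.t_j) for m in matchings if m.s >= threshold]


# Toy class sets C_i and C_j (hypothetical example data).
C_i = ["Truck", "Driver", "Road"]
C_j = ["lorry", "driver", "roadway"]

matchings = compute_matchings(C_i, C_j)
mappings = derive_mappings(matchings)
print(mappings)
```

With these toy labels only the pair ("Driver", "driver") clears the threshold. The "Truck"/"lorry" pair scores low on string similarity, which shows why a purely syntactic heuristic is not enough to find semantically equivalent classes.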

  • Resisting AI


    Resisting AI: An Anti-fascist Approach to Artificial Intelligence is a book on artificial intelligence (AI) by Dan McQuillan, published in 2022 by Bristol University Press. == Content == Resisting AI takes the form of an extended essay that counters optimistic visions of AI's potential by arguing that AI may best be seen as a continuation and reinforcement of bureaucratic forms of discrimination and violence, ultimately fostering authoritarian outcomes. For McQuillan, AI's promise of objective calculability is antithetical to an egalitarian and just society. McQuillan uses the expression "AI violence" to describe how, based on opaque algorithms, various actors can discriminate against categories of people in accessing jobs, loans, medical care, and other benefits. The book suggests that AI has a political resonance with soft eugenic approaches to the valuation of life by modern welfare states, and that AI exhibits eugenic features in its underlying logic as well as in its technical operations. The parallel is with historical eugenicists who sought savings for the state by sterilizing people they deemed "defective", so that the state would not have to care for their offspring. McQuillan's analysis goes beyond the familiar critique of AI systems fostering precarious labour markets, addressing "necropolitics", the politics of who is entitled to live and who to die. Although McQuillan offers a brief history of machine learning at the beginning of the book, including its need for "hidden and undercompensated labour", he is concerned more with the social impacts of AI than with its technical aspects. McQuillan sees AI as the continuation of existing bureaucratic systems that already marginalize vulnerable groups, aggravated by the fact that AI systems trained on existing data are likely to reinforce existing discrimination, for example when attempting to optimize welfare distribution based on existing data patterns, ultimately creating a system of "self-reinforcing social profiling". 
In elaborating on the continuation between existing bureaucratic violence and AI, McQuillan connects to Hannah Arendt's concept of the thoughtless bureaucrat in Eichmann in Jerusalem: A Report on the Banality of Evil, which now becomes the algorithm that, lacking intent, cannot be accountable, and is thus endowed with an "algorithmic thoughtlessness". McQuillan defends the "fascist" in the title of the work by arguing that while not all AI is fascist, this emerging technology of control may end up being deployed by fascist or authoritarian regimes. For McQuillan, AI can support the diffusion of states of exception, as a technology impossible to properly regulate and a mechanism for multiplying exceptions more widely. An example of a scenario where AI systems of surveillance could bring discrimination to a new high is the initiative to create LGBT-free zones in Poland. Skeptical of ethical regulations to control the technology, McQuillan suggests people's councils and workers' councils, and other forms of citizens' agency to resist AI. A chapter titled "Post-Machine Learning" makes an appeal for resistance via currents of thought from feminist science (standpoint theory), post-normal science (extended peer communities), and new materialism; McQuillan encourages the reader to question the meaning of "objectivity" and calls for the necessity of alternative ways of knowing. Among the virtuous examples of resistance – possibly to be adopted by the AI workers themselves – McQuillan notes the Lucas Plan of the workers of Lucas Aerospace Corporation, in which a workforce declared redundant took control, reorienting the enterprise toward useful products. McQuillan advocates for what he calls decomputing, an opposition to the sweeping application and expansion of artificial intelligence. Similar to degrowth, the approach criticizes AI as an outgrowth of the systemic issues within capitalist systems. 
McQuillan argues that a different future is possible, in which distance between people is reduced rather than increased through AI intermediaries. McQuillan's work warns against "watered-down forms of engagement" with AI, such as citizen juries, which superficially look like democratic deliberation but may actually obscure important decisions about AI that are outside the purview of the engagement situation (McQuillan 2022, 128). In an interview about the book, McQuillan describes himself as an "AI abolitionist". == Reception == The book has been praised for how it "masterfully disassembles AI as an epistemological, social, and political paradigm". On the critical side, a review in the academic journal Justice, Power and Resistance took exception to the "nightmarish visions of Big Brother" offered by McQuillan, and argued that while many elements of AI may pose concern, a critique should not be based on a caricature of what AI is, concluding that McQuillan's work is "less of a theory and more of a Manifesto". Another review notes "a disconnect between the technical aspects of AI and the socio-political analysis McQuillan provides." Although the book was published before the debate over ChatGPT and large language models heated up, it has not lost relevance to the AI discussion. It is noted for suggesting a link between beliefs in artificial intelligence and beliefs in racialised and gendered visions of intelligence overall, whereby a certain type of rational, measurable intelligence is privileged, leading to "historical notions of hierarchies of being". The blog Reboot praised McQuillan for offering a theory of harm of AI (why AI could end up hurting people and society) that does not just encourage tackling in isolation specific predicted problems with AI-centric systems: bias, non-inclusiveness, exploitativeness, environmental destructiveness, opacity, and non-contestability. 
Educational policy could also approach AI following McQuillan's reading: In his book Resisting AI, Dan McQuillan argues that "When we're thinking about the actuality of AI, we can't separate the calculations in the code from the social context of its application" .... McQuillan's particular concern is how many contemporary applications of AI are amplifying existing inequalities and injustices as well as deepening social divisions and instabilities. His book makes a powerful case for anticipating these effects and actively resisting them for the good of societies. Videos and podcasts with an interest in AI and emerging technology have discussed the book.

  • Sedona Canada Principles


    The Sedona Canada Principles are a set of authoritative guidelines published by The Sedona Conference to aid members of the Canadian legal community involved in the identification, collection, preservation, review and production of electronically stored information (ESI). The principles were drafted by a small group of lawyers, judges and technologists called the Sedona Working Group 7 or Sedona Canada. Sedona Canada is an offshoot of The Sedona Conference, an American "non-profit ... research and educational institute dedicated to the advanced study of law and policy in the areas of antitrust law, complex litigation, and intellectual property rights". == Background == Civil procedure in Canada varies by jurisdiction, with each province following its own rules of civil procedure. However, each province must address the fact that, as technology advances, the discovery process enshrined in those rules can be derailed by the sheer volume of ESI. In litigation involving ESI, the discovery process is commonly called e-discovery. The problems associated with e-discovery in Canada led to the creation of the Sedona Canada Principles. Rule 29.1.03(4) of the Ontario Rules of Civil Procedure specifically refers to the Sedona Canada Principles in referencing "Principles re Electronic Discovery", although it has been reported that this rule has been largely ignored in practice. == Summary == The Sedona Canada Principles largely refer to the processes found in the Electronic Discovery Reference Model. The principles urge proportionality because of the potentially enormous volumes of documents that may be discoverable when dealing with ESI. They also encourage good faith in the document preservation stage and regular meetings between parties to discuss the scope of the litigation. 
Parties are urged to be aware of the potential costs involved in producing relevant ESI but are advised that only reasonably accessible ESI need be produced. The principles stipulate that parties should not be required to search for or collect deleted material unless there is an agreement or court order to that effect. The use of electronic tools and processes such as data sampling and web harvesting is an acceptable practice. Parties are encouraged to agree early in the litigation process on the production format required for the exchange of relevant documents as part of the discovery process (native files, PDF, TIFF, metadata requirements, etc.). Agreements or direction should be sought, if necessary, with respect to privilege or other confidential information related to the production of electronic documents and data. Parties should be aware that legal precedents can be formed as a result of e-discovery practices and that sanctions can be considered for a party's failure to meet its discovery obligations unless it can be demonstrated that the failure was not intentional. All parties must bear the “reasonable” costs associated with e-discovery, but other arrangements can be agreed upon by the parties or set by court order. == Caselaw == In Warman v. National Post Company, proportionality was at issue in a case where the plaintiff was suing the defendant for libel. The defendant brought a motion to have the plaintiff provide a mirror image of his hard drive in an effort to prove that an internet article was indeed authored by the plaintiff. Issues of proportionality and the work of the Sedona Conference and the Sedona Canada Principles were factored into the decision to grant the defendant only limited access to the hard drive. In Innovative Health Group Inc. v. Calgary Health Region, the plaintiff's legal obligation to produce imaged hard drives was in question. 
Justice Conrad refers to the advice of Sedona Canada on proportionality and on the time and expense associated with electronically stored information. In York University v. Michael Markicevic, Justice Brown specifically refers to the need for the parties to agree upon a formal e-discovery plan to be drafted in consultation with the Sedona Canada Principles. In Friends of Lansdowne v. Ottawa, Master MacLeod refers to the need for the Sedona Canada Principles and states: “This is particularly true in the current information age when e-mail is ubiquitous and multiple copies or variants of messages may be held on various kinds of data storage devices including individual hard drives, e-mail and Blackberry servers. Even documents that ultimately exist in paper form normally begin their life on computers and negotiations frequently involve exchanges of electronic drafts. To find every scrap of paper and every electronic trace of relevant information has become a nightmarish task that threatens to render any kind of litigation extravagantly expensive.” == Criticism == Critics of the Sedona Canada Principles believe they should address system integrity, arguing that the true history of any preserved file cannot be identified without proof of the integrity of the electronic records management system it comes from. Other criticism is directed at the Sedona Canada working group itself, with complaints that it is insular and irrelevant.

  • Archival bond


    The archival bond is a concept in archival theory referring to the relationship that each archival record has with the other records produced as part of the same transaction or activity and located within the same grouping. These bonds are a core component of each individual record and are necessary for transforming a document into a record, as a document only acquires meaning (and becomes a record) through its interrelationships with other records. == Description == The concept of the archival bond is primarily associated with the work of Luciana Duranti, along with Heather MacNeil, as part of research into the integrity of electronic records. Duranti revived and extended the concept of vincolo archivistico (archival bond), first expressed in 1937 by archivist Giorgio Cencetti of the Italian archival school. Its renewed importance emerges from the fact that electronic records are not physically arranged like traditional records. For traditional, analog records, the bond is implicit in their arrangement. For electronic records, the bond must be made explicit because a digital environment lacks a single sequential order of records. The archival bond was one of the core concepts of the subsequent International Research on Permanent Authentic Records in Electronic Systems (InterPARES) project and can be found in the InterPARES glossary. As Duranti notes, the archival bond is not to be confused with the broader term "context", as context exists independently of a record, while "the archival bond is an essential part of the record, which would not exist without it."
