Navigational database

Navigational database

A navigational database is a type of database in which records or objects are found primarily by following references from other objects. The term was popularized by the title of Charles Bachman's 1973 Turing Award paper, The Programmer as Navigator. This paper emphasized the fact that the new disk-based database systems allowed the programmer to choose arbitrary navigational routes following relationships from record to record, contrasting this with the constraints of earlier magnetic-tape and punched card systems where data access was strictly sequential. One of the earliest navigational databases was Integrated Data Store (IDS), which was developed by Bachman for General Electric in the 1960s. IDS became the basis for the CODASYL database model in 1969. Although Bachman described the concept of navigation in abstract terms, the idea of navigational access came to be associated strongly with the procedural design of the CODASYL Data Manipulation Language. Writing in 1982, for example, Tsichritzis and Lochovsky state that "The notion of currency is central to the concept of navigation." By the notion of currency, they refer to the idea that a program maintains (explicitly or implicitly) a current position in any sequence of records that it is processing, and that operations such as GET NEXT and GET PRIOR retrieve records relative to this current position, while also changing the current position to the record that is retrieved. Navigational database programming thus came to be seen as intrinsically procedural; and moreover to depend on the maintenance of an implicit set of global variables (currency indicators) holding the current state. As such, the approach was seen as diametrically opposed to the declarative programming style used by the relational model. The declarative nature of relational languages such as SQL offered better programmer productivity and a higher level of data independence (that is, the ability of programs to continue working as the database structure evolves.) Navigational interfaces, as a result, were gradually eclipsed during the 1980s by declarative query languages. During the 1990s it started becoming clear that for certain applications handling complex data (for example, spatial databases and engineering databases), the relational calculus had limitations. At that time, a reappraisal of the entire database market began, with several companies describing the new systems using the marketing term NoSQL. Many of these systems introduced data manipulation languages which, while far removed from the CODASYL DML with its currency indicators, could be understood as implementing Bachman's "navigational" vision. Some of these languages are procedural; others (such as XPath) are entirely declarative. Offshoots of the navigational concept, such as the graph database, found new uses in modern transaction processing workloads. == Description == Navigational access is traditionally associated with the network model and hierarchical model of database, and conventionally describes data manipulation APIs in which records (or objects) are processed one at a time, iteratively. The essential characteristic as described by Bachman, however, is finding records by virtue of their relationship to other records: so an interface can still be navigational if it has set-oriented features. From this viewpoint, the key difference between navigational data manipulation languages and relational languages is the use of explicit named relationships rather than value-based joins: for department with name="Sales", find all employees in set department-employees versus find employees, departments where employee.department-code = department.code and department.name="Sales". In practice, however, most navigational APIs have been procedural: the above query would be executed using procedural logic along the lines of the following pseudo-code: On this viewpoint, the key difference between navigational APIs and the relational model (implemented in relational databases) is that relational APIs use "declarative" or logic programming techniques that ask the system what to fetch, while navigational APIs instruct the system in a sequence of steps how to reach the required records. Most criticisms of navigational APIs fall into one of two categories: Usability: application code quickly becomes unreadable and difficult to debug Data independence: application code needs to change whenever the data structure changes For many years the primary defence of navigational APIs was performance. Database systems that support navigational APIs often use internal storage structures that contain physical links or pointers from one record to another. While such structures may allow very efficient navigation, they have disadvantages because it becomes difficult to reorganize the physical placement of data. It is quite possible to implement navigational APIs without low-level pointer chasing (Bachman's paper envisaged logical relationships being implemented just as in relational systems, using primary keys and foreign keys), so the two ideas should not be conflated. But without the performance benefits of low-level pointers, navigational APIs become harder to justify. Hierarchical models often construct primary keys for records by concatenating the keys that appear at each level in the hierarchy. Such composite identifiers are found in computer file names (/usr/david/docs/index.txt), in URIs, in the Dewey decimal system, and for that matter in postal addresses. Such a composite key can be considered as representing a navigational path to a record; but equally, it can be considered as a simple primary key allowing associative access. As relational systems came to prominence in the 1980s, navigational APIs (and in particular, procedural APIs) were criticized and fell out of favour. The 1990s, however, brought a new wave of object-oriented databases that often provided both declarative and procedural interfaces. One explanation for this is that they were often used to represent graph-structured information (for example spatial data and engineering data) where access is inherently recursive: the mathematics originally underpinning SQL (specifically, first-order predicate calculus) does not have sufficient power to support recursive queries, even those as simple as a transitive closure. More recent SQL implementations do support hierarchical and recursive queries. A current example of a popular navigational API can be found in the Document Object Model (DOM) often used in web browsers and closely associated with JavaScript. The DOM is essentially an in-memory hierarchical database with an API that is both procedural and navigational. By contrast, the same data (XML or HTML) can be accessed using XPath, which can be categorized as declarative and navigational: data is accessed by following relationships, but the calling program does not issue a sequence of instructions to be followed in order. Languages such as SPARQL used to retrieve Linked Data from the Semantic Web are also simultaneously declarative and navigational. == Examples == IBM Information Management System IDMS

Morphobank

MorphoBank is a web application for collaborative evolutionary research, specifically phylogenetic systematics or cladistics, on the phenotype. Historically, scientists conducting research on phylogenetic systematics have worked individually or in small groups employing traditional single-user software applications such as MacClade, Mesquite and Nexus Data Editor. As the hypotheses under study have grown more complex, large research teams have assembled to tackle the problem of discovering the Tree of Life for the estimated 4-100 million living species(Wilson 2003, pp. 77–80) and the many thousands more extinct species known from fossils. Because the phenotype is fundamentally visual, and as phenotype-based phylogenetic studies have continued to increase in size, it becomes important that observations be backed up by labeled images. Traditional desktop software applications currently in wide use do not provide robust support for team-based research or for image manipulation and storage. MorphoBank is a particularly important tool for the growing scientific field of phenomics. The development of MorphoBank, which began in 2001, has been funded by the National Science Foundation's Directorates for Geosciences, Biological Sciences and Computer and Information Science and Engineering. The significance of the scientific work on MorphoBank has been featured in the New York Times(here and here), among other publications. == Advantages == Teams of scientists studying phylogenetics to build the Tree of Life assemble large spreadsheets of observations about species (referred to as "matrices"). These teams require simultaneous access by each team member to a single and secure copy of the team's data during a scientific research project. This single copy of the data also changes with great frequency during the data collection phase. Images that can be very helpful for documenting homology statements must be displayed, labeled and shared as homology statements develop. This cannot be accomplished elegantly with a desktop software package alone because in a desktop environment each collaborator is working on his own private copy of project data. Changes made by one participant cannot automatically propagate to others, preventing collaborators from seeing each other's data edits until they are manually (and due to the effort involved, often only periodically) merged into a single "true" dataset. In all but the smallest and most disciplined of teams, file version control and the reconciliation of changes made on multiple copies of the data emerge quickly as significant drags on productivity. MorphoBank is an attempt to address these issues by leveraging the ubiquity of the web and modern web-based application techniques, including Ajax, web service layers, and rich web applications to provide a full-featured, net-accessible collaborative workspace for phylogenetic research. In particular, MorphoBank makes it easy to: Share all kinds of data with geographically separated team members, including taxonomy, character and specimen data, media (including images, video and audio), phylogenetic matrices (including data in the widely used NEXUS and TNT format) and other data such as documents and genetic sequences. Label high-resolution images using a web-based image annotation application. Collaboratively edit project data such as phylogenetic matrices using a built-in web-based matrix editor. The editor allows the linking of labeled images to individual cells of a matrix. Manage access to project data. Access ranges from full-access for team members to anonymous read-only access for potential reviewers. Publish completed project data on the web in support of a published paper with a persistent URL. Search The Encyclopedia of Life for taxon exemplar images. Store high resolution CT data Create ontologies for updating and populating matrix cells. These tasks are difficult or impossible in most existing software applications. == History == In 2001 the National Science Foundation (NSF) sponsored a workshop, at the American Museum of Natural History in New York to develop the outlines of a web-based system for a collaborative, media-rich research tool for morphological phylogenetics. An application prototype presented at the workshop was later refined with feedback from the workshop and became MorphoBank version 1.0. A grant from the US National Oceanic and Atmospheric Administration funded further revisions resulting in version 2.0, released in 2005. Current support from the NSF is funding current feature enhancements to MorphoBank. MorphoBank was hosted by Stony Brook University until late October 2021 and received back up support from the American Museum of Natural History. The current version is 3.0. Rationale for the software was described in the journal Cladistics. MorphoBank has also received support from NESCENT and the San Diego Supercomputer Center. Since 2018, MorphoBank has been supported in part by Phoenix Bioinformatics, a non-profit company founded to sustain databases for the basic sciences. A permanent move of MorphoBank from Stony Brook University to Phoenix Bioinformatics was complete in late October 2021. The San Diego Supercomputer Center has previously provided technical and hosting resources to the MorphoBank project. == Usage == MorphoBank hosts the products of peer-reviewed scientific research on phenotypes. An increasing volume of systematics data is "born digital" and MorphoBank is well suited to handle this type of material. On August 24, 2007, 62 active research projects were hosted by MorphoBank, as well as 6 completed (and published) projects. By 2017 over 2000 scientists and their students were registered content builders (users are not required to register and are even more numerous) and has more than 500 publicly available projects with approximately 80,000 images that are the products of scientific research. Over 1,500 active research projects are hosted by MorphoBank. The software has been used to assemble phylogenetic research on such groups as mammals, from bats to whales, bivalve molluscs, arachnids, fossil plants and living and extinct amniotes. It has also been used more broadly in evolutionary and paleontological research to host curated images associated with published research on lacewing insects geckos, raptor birds, dinosaurs, frogs and nematodes. MorphoBank is increasingly used in conjunction with the Paleobiology Database. Example published projects: Project 1097: Blank CE, 2013 Origin and early evolution of photosynthetic eukaryotes in freshwater environments – reinterpreting proterozoic paleobiology and biogeochemical processes in light of trait evolution Project 2520: Carvalho, T. P., R. E. Reis, and J. P. Friel, 2017 A new species of Hoplomyzon (Siluriformes: Aspredinidae) from Maracaibo Basin, Venezuela: osteological description using high-resolution Project 2651: Baron, M. G., Norman, D. B., Barrett, P. M., 2017 A new hypothesis of dinosaur relationships and early dinosaur evolution MorphoBank has been particularly important to the Assembling the Tree of Life initiative sponsored by the National Science Foundation. MorphoBank is well-suited to such projects because of its tools for merging taxonomic, character and matrix-based data, as well as its collaborative features. Highlights of this research include a collaborative matrix on mammal evolution published in Science that included over 4,000 phenomic characters scored for over 80 species, a matrix on extant baleen whales featuring nearly 600 images, and more.

Margaret Mitchell (scientist)

Margaret Mitchell is a computer scientist who works on algorithmic bias and fairness in machine learning. She is most well known for her work on automatically removing undesired biases concerning demographic groups from machine learning models, as well as more transparent reporting of their intended use. == Education == Mitchell obtained a bachelor's degree in linguistics from Reed College, Portland, Oregon, in 2005. After having worked as a research assistant at the OGI School of Science and Engineering for two years, she subsequently obtained a Master's in Computational Linguistics from the University of Washington in 2009. She enrolled in a PhD program at the University of Aberdeen, where she wrote a doctoral thesis on the topic of Generating Reference to Visible Objects, graduating in 2013. == Career and research == Mitchell is best known for her work on fairness in machine learning and methods for mitigating algorithmic bias. This includes her work on introducing the concept of 'Model Cards' for more transparent model reporting, and methods for debiasing machine learning models using adversarial learning. Margaret Mitchell created the framework for recognizing and avoiding biases by testing with a variable for the group of interest, predictor and an adversary. In 2012, Mitchell joined the Human Language Technology Center of Excellence at Johns Hopkins University as a postdoctoral researcher, before taking up a position at Microsoft Research in 2013. At Microsoft, Mitchell was the research lead of the Seeing AI project, an app that offers support for the visually impaired by narrating texts and images. In November 2016, she became a senior research scientist at Google Research and Machine intelligence. While at Google, she founded and co-led the Ethical Artificial Intelligence team together with Timnit Gebru. In May 2018, she represented Google in the Partnership on AI. In February 2018, she gave a TED talk on "How we can build AI to help humans, not hurt us". In January 2021, after Timnit Gebru's termination from Google, Mitchell reportedly used a script to search through her corporate account and download emails that allegedly documented discriminatory incidents involving Gebru. An automated system locked Mitchell's account in response. In response to media attention Google claimed that she "exfiltrated thousands of files and shared them with multiple external accounts". After a five-week investigation, Mitchell was fired. Prior to her dismissal, Mitchell had been a vocal advocate for diversity at Google, and had voiced concerns about research censorship at the company. In late 2021, she joined AI start-up Hugging Face. Mitchell is a co-founder of Widening NLP, a special interest group within the Association for Computational Linguistics (ACL) seeking to increase the proportion of women and minorities working in natural language processing; and Computational Linguistics and Clinical Psychology, an annual workshop within the ACL that brings together clinicians and computational linguists to advance the state of the art in clinical psychology.

Seppo Linnainmaa

Seppo Ilmari Linnainmaa (born 28 September 1945) is a Finnish mathematician and computer scientist known for creating the modern version of backpropagation. == Biography == He was born in Pori. He received his MSc in 1970 and introduced a reverse mode of automatic differentiation in his MSc thesis. In 1974 he obtained the first doctorate ever awarded in computer science at the University of Helsinki. In 1976, he became Assistant Professor. From 1984 to 1985 he was Visiting Professor at the University of Maryland, USA. From 1986 to 1989 he was Chairman of the Finnish Artificial Intelligence Society. From 1989 to 2007, he was Research Professor at the VTT Technical Research Centre of Finland. He retired in 2007. == Backpropagation == Explicit, efficient error backpropagation in arbitrary, discrete, possibly sparsely connected, neural networks-like networks was first described in Linnainmaa's 1970 master's thesis, albeit without reference to NNs, when he introduced the reverse mode of automatic differentiation (AD), in order to efficiently compute the derivative of a differentiable composite function that can be represented as a graph, by recursively applying the chain rule to the building blocks of the function. Linnainmaa published it first, following Gerardi Ostrowski who had used it in the context of certain process models in chemical engineering some five years earlier, but didn't publish.

Vera Demberg

Vera Demberg (born 1981) is a German computational linguist and professor of computer science and computational linguistics at Saarland University. Her research interests include cognitive models of human language comprehension, natural language generation, experimental psycholinguistics, multimodal language processing in a dual-task setting, and experimental and computational discourse research and pragmatics. == Career and research == Vera Demberg studied computational linguistics at the Institute for Machine Language Processing at the University of Stuttgart from 2001 to 2006. She then completed a Master's degree in Artificial Intelligence at the University of Edinburgh from 2004 to 2005. She received her Ph.D. from the Department of Computer Science there from 2006 to 2010. Her dissertation paper, titled “Broad-Coverage Model of Prediction in Human Sentence Processing”, was awarded the Cognitive Science Society's “Glushko Dissertation Prize in Cognitive Science” in 2011. In her work, she designed a model of human sentence processing that can be used to predict difficulties in processing at the syntactic level. From 2010 to 2016, Vera Demberg led an independent research group on cognitive models of human language processing and their application to speech dialog systems in the Cluster of Excellence “Multimodal Computing and Interaction” at the University of Saarland. In 2016, she was appointed there to a professorship in computer science and computational linguistics. Demberg's professorship is in the Department of Computer Science (Faculty of Mathematics and Computer Science). She is also a co-opted professor in the Department of Linguistics and Language Technology (Faculty of Philosophy). Since 2020, she has led the ERC Starting Grant “Individualized Interaction in Discourse”. The project conducts research on how to make linguistic interaction with computer systems more natural. She has authored and co-authored numerous papers on the study of computational linguistics and natural language processing. According to Google Scholar, Vera Demberg has an H-index of 30. == Publications == Vera Demberg has authored more than 200 papers; please refer to her scholar page at https://scholar.google.com/citations?user=l2CFSAMAAAAJ == Awards == 2011: Cognitive Science Society Glushko Dissertation Prize in Cognitive Science 2020: ERC Starting Grant “Individualized Interaction in Discourse” 2024: Member of the Academy of Sciences and Literature

Hierarchical Risk Parity

Hierarchical Risk Parity (HRP) is an advanced investment portfolio optimization framework developed in 2016 by Marcos López de Prado at Guggenheim Partners and Cornell University. HRP is a probabilistic graph-based alternative to the prevailing mean-variance optimization (MVO) framework developed by Harry Markowitz in 1952, and for which he received the Nobel Prize in economic sciences. HRP algorithms apply discrete mathematics and machine learning techniques to create diversified and robust investment portfolios that outperform MVO methods out-of-sample. HRP aims to address the limitations of traditional portfolio construction methods, particularly when dealing with highly correlated assets. Following its publication, HRP has been implemented in numerous open-source libraries, and received multiple extensions. == Key features == HRP portfolios have been proposed as a robust alternative to traditional quadratic optimization methods, including the Critical Line Algorithm (CLA) of Markowitz. HRP addresses three central issues commonly associated with quadratic optimizers: numerical instability, excessive concentration in a small number of assets, and poor out-of-sample performance. HRP leverages techniques from graph theory and machine learning to construct diversified portfolios using only the information embedded in the covariance matrix. Unlike quadratic programming methods, HRP does not require the covariance matrix to be invertible. Consequently, HRP remains applicable even in cases where the covariance matrix is ill-conditioned or singular—conditions under which standard optimizers fail. Monte Carlo simulations indicate that HRP achieves lower out-of-sample variance than CLA, despite the fact that minimizing variance is the explicit optimization objective of CLA. Furthermore, HRP portfolios exhibit lower realized risk compared to those generated by traditional risk parity methodologies. Empirical backtests have demonstrated that HRP would have historically outperformed conventional portfolio construction techniques. Algorithms within the HRP framework are characterized by the following features: Machine Learning Approach: HRP employs hierarchical clustering, a machine learning technique, to group similar assets based on their correlations. This allows the algorithm to identify the underlying hierarchical structure of the portfolio, and avoid that errors spread through the entire network. Risk-Based Allocation: The algorithm allocates capital based on risk, ensuring that assets only compete with similar assets for representation in the portfolio. This approach leads to better diversification across different risk sources, while avoiding the instability associated with noisy returns estimates. Covariance Matrix Handling: Unlike traditional methods like Mean-Variance Optimization, HRP does not require inverting the covariance matrix. This makes it more stable and applicable to portfolios with a large number of assets, particularly when the covariance matrix's condition number is high. == The problem: Markowitz's Curse == Portfolio construction is perhaps the most recurrent financial problem. On a daily basis, investment managers must build portfolios that incorporate their views and forecasts on risks and returns. Despite the theoretical elegance of Markowitz's mean-variance framework, its practical implementation is hindered by several limitations that undermine the reliability of solutions derived from the Critical Line Algorithm (CLA). A principal concern is the high sensitivity of optimal portfolios to small perturbations in expected returns: even minor forecasting errors can result in significantly different allocations (Michaud, 1998). Given the inherent difficulty of producing accurate return forecasts, numerous researchers have advocated for approaches that forgo expected returns entirely and instead rely solely on the covariance structure of asset returns. This has given rise to risk-based allocation methods, among which risk parity is a widely cited example (Jurczenko, 2015). While eliminating return forecasts mitigates some instability, it does not eliminate it. Quadratic programming techniques employed in portfolio optimization require the inversion of a positive-definite covariance matrix, meaning all eigenvalues must be strictly positive. When the matrix is numerically ill-conditioned—that is, when the ratio of its largest to smallest eigenvalue (its condition number) is large—matrix inversion becomes unreliable and prone to significant numerical errors (Bailey and López de Prado, 2012). The condition number of a covariance, correlation, or any symmetric (and thus diagonalizable) matrix is defined as the absolute value of the ratio between its largest and smallest eigenvalues in modulus. The figure on the right presents the sorted eigenvalues of several correlation matrices; the condition number is represented by the ratio of the first to last eigenvalues in each sequence. A diagonal correlation matrix, which is equal to its own inverse, exhibits the minimum possible condition number. As the number of correlated (or multicollinear) assets in a portfolio increases, the condition number rises. At high levels, this leads to severe numerical instability, whereby slight modifications in any matrix entry may result in drastically different inverses. This phenomenon, often referred to as Markowitz’s curse, encapsulates the paradox wherein increased correlation among assets heightens the theoretical need for diversification, yet simultaneously increases the likelihood of unstable optimization outcomes. Consequently, the potential benefits of diversification are frequently overshadowed by estimation errors. These problems are exacerbated as the dimensionality of the covariance matrix increases. The estimation of each covariance term consumes degrees of freedom, and in general, a minimum of 1 2 N ( N + 1 ) {\displaystyle {\frac {1}{2}}N(N+1)} independent and identically distributed (IID) observations is required to estimate a non-singular covariance matrix of dimension N {\displaystyle N} . For example, constructing an invertible covariance matrix of dimension 50 necessitates at least five years of daily IID observations. However, empirical evidence suggests that the correlation structure of financial assets is highly unstable over such extended periods. These difficulties are highlighted by the observation that even naïve allocation strategies—such as equally weighted portfolios—have frequently outperformed both mean-variance and risk-based optimizations in out-of-sample tests (De Miguel et al., 2009). == The solution: Hierarchical Risk Parity == The HRP algorithm addresses Markowitz's curse in three steps: Hierarchical Clustering: Assets are grouped into clusters based on their correlations, forming a hierarchical tree structure. Quasi-Diagonalization: The correlation matrix is reordered based on the clustering results, revealing a block diagonal structure. Recursive Bisection: Weights are assigned to assets through a top-down approach, splitting the portfolio into smaller sub-portfolios and allocating capital based on inverse variance. === Step 1: Hierarchical clustering === Given a T × N {\displaystyle T\times N} matrix of asset returns X {\displaystyle X} , where each column represents a time series of returns for one of N {\displaystyle N} assets over T {\displaystyle T} time periods, a hierarchical clustering process can be used to construct a tree-based representation of asset relationships. First, we compute the N × N {\displaystyle N\times N} correlation matrix ρ = ρ i , j i , j = 1 . . . N {\displaystyle \rho ={\rho _{i,j}}\;{i,j=1\;...\;N}} , where ρ i , j = c o r r ( X i , X j ) {\displaystyle \rho _{i,j}=\mathrm {corr} (X_{i},X_{j})} . From this, a pairwise distance matrix D = d i , j {\displaystyle D={d_{i,j}}} is defined using the transformation: d i , j = 1 2 ( 1 − ρ i , j ) {\displaystyle d_{i,j}={\sqrt {{\frac {1}{2}}(1-\rho _{i,j})}}} This distance function defines a proper metric space, satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. Next, a secondary distance matrix D ~ = d ~ i , j {\displaystyle {\tilde {D}}={{\tilde {d}}_{i,j}}} is computed, where each entry measures the Euclidean distance between the distance profiles of two assets: d ~ i , j = ∑ n = 1 N ( d n , i − d n , j ) 2 {\displaystyle {\tilde {d}}_{i,j}={\sqrt {\sum _{n=1}^{N}(d_{n,i}-d_{n,j})^{2}}}} While d i , j {\displaystyle d_{i,j}} reflects correlation-based proximity between two assets, d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} quantifies dissimilarity across the entire system, as it depends on all pairwise distances. Hierarchical clustering proceeds by identifying the pair ( i , j ) {\displaystyle (i,j)} with the smallest value of d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} (for i ≠ j {\displaystyle i\neq j} ), and forming a new cluster u [ 1 ] = ( i , j ) {\displaystyle u[1]=(i,j)} .

How to Choose an AI Avatar Generator

Trying to pick the best AI avatar generator? An AI avatar generator is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI avatar generator slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.