AI Chatbot Social Network

AI Chatbot Social Network — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • VACUUM

    VACUUM

    VACUUM is a set of normative guidance principles for achieving training and test dataset quality for structured datasets in data science and machine learning. The garbage-in, garbage out principle motivates a solution to the problem of data quality but does not offer a specific solution. Unlike the majority of the ad-hoc data quality assessment metrics often used by practitioners VACUUM specifies qualitative principles for data quality management and serves as a basis for defining more detailed quantitative metrics of data quality. VACUUM is an acronym that stands for: valid accurate consistent uniform unified model

    Read more →
  • BioCreative

    BioCreative

    BioCreAtIvE (A critical assessment of text mining methods in molecular biology) consists in a community-wide effort for evaluating information extraction and text mining developments in the biological domain. It was preceded by the Knowledge Discovery and Data Mining (KDD) Challenge Cup for detection of gene mentions. == Community Challenges == === First edition (2004-2005) === Three main tasks were posed at the first BioCreAtIvE challenge: the entity extraction task, the gene name normalization task, and the functional annotation of gene products task. The data sets produced by this contest serve as a Gold Standard training and test set to evaluate and train Bio-NER tools and annotation extraction tools. === Second edition (2006-2007) === The second BioCreAtIvE challenge (2006-2007) had also 3 tasks: detection of gene mentions, extraction of unique idenfiers for genes and extraction information related to physical protein-protein interactions. It counted with participation of 44 teams from 13 countries. === Third edition (2011-2012) === The third edition of BioCreative included for the first time the InterActive Task (IAT), designed to evaluate the practical usability of text mining tools in real-world biocuration tasks. === Fifth edition (2016) === BioCreative V had 5 different tracks, including an interactive task (IAT) for usability of text mining systems and a track using the BioC format for curating information for BioGRID.

    Read more →
  • Energy informatics

    Energy informatics

    Energy informatics is a research field covering the use of information and communication technology to address energy utilization and management challenges. Methods used for "smart" implementations often combine IoT sensors with artificial intelligence and machine learning. Energy Informatics is founded on flow networks that are the major suppliers and consumers of energy. Their efficiency can be improved by collecting and analyzing information. == Application areas == The field among other consider application areas within: Smart Buildings by developing ICT-centred solutions for improving the energy-efficiency of buildings. Smart Cities by investigating the synergies between demand patterns and supply availability of energy flows in cities and communities to improve energy efficiency, increase integration of renewable sources, and provide resilience towards system faults caused by extreme situations, like hurricanes and flooding. Smart Industries including the development of ICT-centred solutions for improving the energy efficiency and predictability of energy intensive industrial processes, without compromising process and product quality. Smart Energy Networks by developing ICT-centred solutions for coordinating the supply and demand in environmentally sustainable energy networks.

    Read more →
  • Universal Data Element Framework

    Universal Data Element Framework

    The Universal Data Element Framework (UDEF) was a controlled vocabulary developed by The Open Group. It provided a framework for categorizing, naming, and indexing data. It assigned to every item of data a structured alphanumeric tag plus a controlled vocabulary name that describes the meaning of the data. This allowed relating data elements to similar elements defined by other organizations. UDEF defined a Dewey-decimal like code for each concept. For example, an "employee number" is often used in human resource management. It has a UDEF tag a.5_12.35.8 and a controlled vocabulary description "Employee.PERSON_Employer.Assigned.IDENTIFIER". UDEF has been superseded by the Open Data Element Framework (ODEF). == Examples == In an application used by a hospital, the last name and first name of several people could include the following example concepts: Patient Person Family Name – find the word “Patient” under the UDEF object “Person” and find the word “Family” under the UDEF property “Name” Patient Person Given Name – find the word “Patient” under the UDEF object “Person” and find the word “Given” under the UDEF property “Name” Doctor Person Family Name – find the word “Doctor” under the UDEF object “Person” and find the word “Family” under the UDEF property “Name” Doctor Person Given Name – find the word “Doctor” under the UDEF object “Person” and find the word “Given” under the UDEF property “Name” For the examples above, the following UDEF IDs are available: “Patient Person Family Name” the UDEF ID is “au.5_11.10” “Patient Person Given Name” the UDEF ID is “au.5_12.10” “Doctor Person Family Name” the UDEF ID is “aq.5_11.10” “Doctor Person Given Name” the UDEF ID is “aq.5_12.10”

    Read more →
  • Robot learning

    Robot learning

    Robot learning is a research field at the intersection of machine learning and robotics. It studies techniques allowing a robot to acquire novel skills or adapt to its environment through learning algorithms. The embodiment of the robot, situated in a physical embedding, provides at the same time specific difficulties (e.g. high-dimensionality, real time constraints for collecting data and learning) and opportunities for guiding the learning process (e.g. sensorimotor synergies, motor primitives). Example of skills that are targeted by learning algorithms include sensorimotor skills such as locomotion, grasping, active object categorization, as well as interactive skills such as joint manipulation of an object with a human peer, and linguistic skills such as the grounded and situated meaning of human language. Learning can happen either through autonomous self-exploration or through guidance from a human teacher, like for example in robot learning by imitation. Robot learning can be closely related to adaptive control, reinforcement learning as well as developmental robotics which considers the problem of autonomous lifelong acquisition of repertoires of skills. While machine learning is frequently used by computer vision algorithms employed in the context of robotics, these applications are usually not referred to as "robot learning". == Imitation learning == Many research groups are developing techniques where robots learn by imitating. This includes various techniques for learning from demonstration (sometimes also referred to as "programming by demonstration") and observational learning. == Sharing learned skills and knowledge == In Tellex's "Million Object Challenge", the goal is robots that learn how to spot and handle simple items and upload their data to the cloud to allow other robots to analyze and use the information. RoboBrain is a knowledge engine for robots which can be freely accessed by any device wishing to carry out a task. The database gathers new information about tasks as robots perform them, by searching the Internet, interpreting natural language text, images, and videos, object recognition as well as interaction. The project is led by Ashutosh Saxena at Stanford University. RoboEarth is a project that has been described as a "World Wide Web for robots" − it is a network and database repository where robots can share information and learn from each other and a cloud for outsourcing heavy computation tasks. The project brings together researchers from five major universities in Germany, the Netherlands and Spain and is backed by the European Union. Google Research, DeepMind, and Google X have decided to allow their robots share their experiences. == Vision-language-action model == Research groups and companies are developing vision-language-action models, foundation models that allow robotic control through the combination of vision and language. Google DeepMind, Figure AI and Hugging Face are actively working on that.

    Read more →
  • AVT Statistical filtering algorithm

    AVT Statistical filtering algorithm

    AVT Statistical filtering algorithm is an approach to improving quality of raw data collected from various sources. It is most effective in cases when there is inband noise present. In those cases AVT is better at filtering data then, band-pass filter or any digital filtering based on variation of. Conventional filtering is useful when signal/data has different frequency than noise and signal/data is separated/filtered by frequency discrimination of noise. Frequency discrimination filtering is done using Low Pass, High Pass and Band Pass filtering which refers to relative frequency filtering criteria target for such configuration. Those filters are created using passive and active components and sometimes are implemented using software algorithms based on Fast Fourier transform (FFT). AVT filtering is implemented in software and its inner working is based on statistical analysis of raw data. When signal frequency/(useful data distribution frequency) coincides with noise frequency/(noisy data distribution frequency) we have inband noise. In this situations frequency discrimination filtering does not work since the noise and useful signal are indistinguishable and where AVT excels. To achieve filtering in such conditions there are several methods/algorithms available which are briefly described below. == Averaging algorithm == Collect n samples of data Calculate average value of collected data Present/record result as actual data == Median algorithm == Collect n samples of data Sort the data in ascending or descending order. Note that order does not matter Select the data that happen to be in n/2 position and present/record it as final result representing data sample == AVT algorithm == AVT algorithm stands for Antonyan Vardan Transform and its implementation explained below. Collect n samples of data Calculate the standard deviation and average value Drop any data that is greater or less than average ± one standard deviation Calculate average value of remaining data Present/record result as actual value representing data sample This algorithm is based on amplitude discrimination and can easily reject any noise that is not like actual signal, otherwise statistically different than 1 standard deviation of the signal. Note that this type of filtering can be used in situations where the actual environmental noise is not known in advance. Notice that it is preferable to use the median in above steps than average. Originally the AVT algorithm used average value to compare it with results of median on the data window. == Filtering algorithms comparison == Using a system that has signal value of 1 and has noise added at 0.1% and 1% levels will simplify quantification of algorithm performance. The R script is used to create pseudo random noise added to signal and analyze the results of filtering using several algorithms. Please refer to "Reduce Inband Noise with the AVT Algorithm" article for details. This graphs show that AVT algorithm provides best results compared with Median and Averaging algorithms while using data sample size of 32, 64 and 128 values. Note that this graph was created by analyzing random data array of 10000 values. Sample of this data is graphically represented below. From this graph it is apparent that AVT outperforms other filtering algorithms by providing 5% to 10% more accurate data when analyzing same datasets. Considering random nature of noise used in this numerical experiment that borderlines worst case situation where actual signal level is below ambient noise the precision improvements of processing data with AVT algorithm are significant. == AVT algorithm variations == === Cascaded AVT === In some situations better results can be obtained by cascading several stages of AVT filtering. This will produce singular constant value which can be used for equipment that has known stable characteristics like thermometers, thermistors and other slow acting sensors. === Reverse AVT === Collect n samples of data Calculate the standard deviation and average value Drop any data that is within one standard deviation ± average band Calculate average value of remaining data Present/record result as actual data This is useful for detecting minute signals that are close to background noise level. == Possible applications and uses == Use to filter data that is near or below noise level Used in planet detection to filter out raw data from the Kepler space telescope Filter out noise from sound sources where all other filtering methods (Low-pass filter, High-pass filter, Band-pass filter, Digital filter) fail. Pre-process scientific data for data analysis (Smoothness) before plotting see (Plot (graphics)) Used in SETI (Search for extraterrestrial intelligence) for detecting/distinguishing extraterrestrial signals from cosmic background Use AVT as image filtering algorithm to detect altered images. This image of Jupiter generated from this program, detecting alterations in original picture that was modified to be visually appealing by applying filters. Another version of this comparison is the Reverse AVT filter applied to the same original Jupiter Image, where we only see that altered portion as Noise that was eliminated by AVT algorithm. Use AVT as image filtering algorithm to estimate data density from images. Picture of Pillars of Creation Nebula shows data density in filtered images from Hubble and Webb. Note that image on the left has big patches of missing data marked with simpler color patterns.

    Read more →
  • E-Science librarianship

    E-Science librarianship

    E-Science librarianship refers to a role for librarians in e-Science. == Early scholars == Early references to e-Science and librarianship involve information studies scholars researching cyberinfrastructure and emerging networked information and knowledge communities. Notably Christine Borgman, Professor and Presidential Chair in Information Studies at the University of California, Los Angeles (UCLA) was a key player in bringing e-Science, and the idea of networked knowledge communities, to the attention of the library profession. In 2004, as a visiting fellow at the Oxford Internet Institute, she conducted research and lectured publicly on e-Science, Digital Libraries, and Knowledge Communities. In 2007 Anna K. Gold, formerly of MIT and Cal Poly, San Luis Obispo, authored a series of articles in D-Lib Magazine that opened the door for academic libraries to begin exploring roles, skills, and strategies for engaging in e-Science: Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians and Cyberinfrastructure, Data, and Libraries, Part 2: Libraries and the Data Challenge: Roles and Actions for Libraries. == Academic research and health sciences libraries == In 2007, the Association of Research Libraries (ARL) e-Science task force issued its report on e-Science and librarianship. The ARL's report encouraged its member libraries to position themselves to engage with researchers involved in e-Science (eScience) by cultivating new research support strategies and developing their digital scholarship infrastructure. E-Science has multiple attributes; Tony and Jessie Hey framed e-Science for the library community by characterizing it as a research methodology: "e-Science is not a new scientific discipline in its own right: e-Science is shorthand for the set of tools and technologies required to support collaborative, networked science". In addition to academic libraries' interests in providing support for their researchers engaging in e-Science, the health sciences library community also emerged as a major proponent for creating librarian positions for supporting the information needs of large-scale, networked, research collaborations on their campuses. Neil Rambo, current director of NYU's Health Sciences Library and former director of University of Washington Health Sciences Library, was the first to use the term in the Journal of the Medical Library Association, in his 2009 editorial e-Science and the Biomedical Library. Rambo's definition of e-Science highlighted the potential e-Science held for creating data as a research product: "E-science is a new research methodology, fueled by networked capabilities and the practical possibility of gathering and storing vast amounts of data." In response to this article the University of Massachusetts Medical School Lamar Soutter Library and National Network of Libraries of Medicine, New England Region encouraged health sciences libraries to cooperate to identify skills and develop a program for training e-Science Librarians. Then, in 2013, Shannon Bohle, an archivist who was employed in the library at Cold Spring Harbor Laboratory, an NCI-designated basic cancer research facility, used experience gained there and previous papers and presentations about preserving scientific archival materials to expand the traditional definition of e-Science by including the terms, principles, and practices used in archival science. These included in the definition the "long-term storage and accessibility of all materials generated through the scientific process," as well as examples of material types traditionally preserved in archives, like "electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints," as well as library materials ("print and/or electronic publications"). == Roles == Many areas of science are about to be transformed by the availability of vast amounts of new scientific data that can potentially provide insights at a level of detail never before envisaged. However, this new data dominant era brings new challenges for the scientists and they will need the skills and technologies both of computer scientists and of the library community to manage, search and curate these new data resources. Libraries will not be immune from change in this new world of research. Karen Williams identifies roles in the following areas for librarians in the developing world of e-Science. Campus Engagement Content/Collection Development and Management Teaching and Learning Scholarly Communication E-Scholarship and Digital Tools Reference/Help Services Outreach Fund Raising Exhibit and Event Planning Leadership == Challenges for research libraries == E-science tends toward inter- and multidisciplinary approaches that depend on computation and computer science. Research libraries have traditionally been discipline focused and, although increasingly technologically sophisticated, do not have systems of the scale or complexity of the e-science environment. E-science is data intensive, but research libraries have not typically been responsible for scientific data. E-science is frequently conducted in a team context, often distributed across multiple institutions and on a global scale. The primary constituency of libraries generally comprises those affiliated with the local institution. Licenses for electronic content are typically restricted to a particular institutional community, and the infrastructure to move institutional licenses into a multi-institutional environment is not well developed. E-science challenges all these traditional paradigms of research library organization and services. == Skills == Garritano & Carlson were among the first to outline a skill set for librarians seeking to support the data needs of e-Science; they identified five skill categories librarians new to this area should expect to adapt or develop when participating on such projects: Library and information science expertise Subject expertise Partnerships and outreach (both internal and external) Participating in sponsored research Balancing workload An example of librarians reconfiguring traditional librarian skills to meet the needs of researchers engaging in e-Science is Witt & Carlson's adaptation of the traditional reference interview into a "data interview" in order to provide effective data management and e-Science services. This interview consists of ten practical queries necessary for understanding the provenance and expectations for the preservation of datasets typical of e-Science that also help illustrate some of the educational tools and skills needed by a librarian new to e-Science. "What is the story of the data? What form and format are the data in? What is the expected lifespan of the dataset? How could the data be used, reused, and repurposed? How large is the dataset, and what is its rate of growth? Who are the potential audiences for the data? Who owns the data? Does the dataset include any sensitive information? What publications or discoveries have resulted from the data? How should the data be made accessible?" == Resources == In 2009 the Lamar Soutter Library at the University of Massachusetts Medical School (UMMS) and the National Network of Libraries of Medicine, New England Region (NN/LM NER) funded an e-Science program for building the skills highlighted above for librarians. Elaine Russo Martin, Director of Library Services at the Lamar Soutter Library and Director of the NN/LM NER developed this comprehensive e-Science program to build librarians' subject expertise in the sciences, developing their data management skills, and their familiarity with cyberinfrastructure and e-Science. Three major products of this program are the e-Science web portal for librarians, the E-Science Symposium, and the New England Collaborative Data Management Curriculum (NECDMC). This portal includes educational resources for specific tools and subject/discipline tutorials and modules to assist librarians new to e-Science. UMMS and NN/LM NER also publish an open access journal called the Journal of eScience Librarianship.

    Read more →
  • Controlled vocabulary

    Controlled vocabulary

    A controlled vocabulary provides a way to organize knowledge for subsequent retrieval. Controlled vocabularies are used in subject indexing schemes, subject headings, thesauri, taxonomies and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, preferred terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction. == In library and information science == In library and information science, controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information (document or work) so that they may be more easily retrieved by a search. Controlled vocabularies solve the problems of homographs, synonyms and polysemes by a bijection between concepts and preferred terms. In short, controlled vocabularies reduce unwanted ambiguity inherent in normal human languages where the same concept can be given different names and ensure consistency. For example, in the Library of Congress Subject Headings (a subject heading system that uses a controlled vocabulary), preferred terms—subject headings in this case—have to be chosen to handle choices between variant spellings of the same word (American versus British), choice among scientific and popular terms (cockroach versus Periplaneta americana), and choices between synonyms (automobile versus car), among other difficult issues. Choices of preferred terms are based on the principles of user warrant (what terms users are likely to use), literary warrant (what terms are generally used in the literature and documents), and structural warrant (terms chosen by considering the structure, scope of the controlled vocabulary). Controlled vocabularies also typically handle the problem of homographs with qualifiers. For example, the term pool has to be qualified to refer to either swimming pool or the game pool to ensure that each preferred term or heading refers to only one concept. === Types used in libraries === There are two main kinds of controlled vocabulary tools used in libraries: subject headings and thesauri. While the differences between the two are diminishing, there are still some minor differences: Historically, subject headings were designed to describe books in library catalogs by catalogers while thesauri were used by indexers to apply index terms to documents and articles. Subject headings tend to be broader in scope describing whole books, while thesauri tend to be more specialized covering very specific disciplines. Because of the card catalog system, subject headings tend to have terms that are in indirect order (though with the rise of automated systems this is being removed), while thesaurus terms are always in direct order. Subject headings tend to use more pre-coordination of terms such that the designer of the controlled vocabulary will combine various concepts together to form one preferred subject heading. (e.g., children and terrorism) while thesauri tend to use singular direct terms. Thesauri list not only equivalent terms but also narrower, broader terms and related terms among various preferred and non-preferred (but potentially synonymous) terms, while historically most subject headings did not. For example, the Library of Congress Subject Heading itself did not have much syndetic structure until 1943, and it was not until 1985 when it began to adopt the thesauri type term "Broader term" and "Narrow term". The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Well known subject heading systems include the Library of Congress system, Medical Subject Headings (MeSH) created by the United States National Library of Medicine, and Sears. Well known thesauri include the Art and Architecture Thesaurus and the ERIC Thesaurus. When selecting terms for a controlled vocabulary, the designer has to consider the specificity of the term chosen, whether to use direct entry, inter consistency and stability of the language. Lastly the amount of pre-coordination (in which case the degree of enumeration versus synthesis becomes an issue) and post-coordination in the system is another important issue. Controlled vocabulary elements (terms/phrases) employed as tags, to aid in the content identification process of documents, or other information system entities (e.g. DBMS, Web Services) qualifies as metadata. == Indexing languages == There are three main types of indexing languages. Controlled indexing language – only approved terms can be used by the indexer to describe the document Natural language indexing language – any term from the document in question can be used to describe the document Free indexing language – any term (not only from the document) can be used to describe the document When indexing a document, the indexer also has to choose the level of indexing exhaustivity, the level of detail in which the document is described. For example, using low indexing exhaustivity, minor aspects of the work will not be described with index terms. In general the higher the indexing exhaustivity, the more terms indexed for each document. In recent years free text search as a means of access to documents has become popular. This involves using natural language indexing with an indexing exhaustively set to maximum (every word in the text is indexed). These methods have been compared in some studies, such as the 2007 article, "A Comparative Evaluation of Full-text, Concept-based, and Context-sensitive Search". === Advantages === Controlled vocabularies are often claimed to improve the accuracy of free text searching, such as to reduce irrelevant items in the retrieval list. These irrelevant items (false positives) are often caused by the inherent ambiguity of natural language. Take the English word football for example. Football is the name given to a number of different team sports. Worldwide the most popular of these team sports is association football, which also happens to be called soccer in several countries. The word football is also applied to rugby football (rugby union and rugby league), American football, Australian rules football, Gaelic football, and Canadian football. A search for football therefore will retrieve documents that are about several completely different sports. Controlled vocabulary solves this problem by tagging the documents in such a way that the ambiguities are eliminated. Compared to free text searching, the use of a controlled vocabulary can dramatically increase the performance of an information retrieval system, if performance is measured by precision (the percentage of documents in the retrieval list that are actually relevant to the search topic). In some cases controlled vocabulary can enhance recall as well, because unlike natural language schemes, once the correct preferred term is searched, there is no need to search for other terms that might be synonyms of that term. === Disadvantages === A controlled vocabulary search may lead to unsatisfactory recall, in that it will fail to retrieve some documents that are actually relevant to the search question. This is particularly problematic when the search question involves terms that are sufficiently tangential to the subject area such that the indexer might have decided to tag it using a different term (but the searcher might consider the same). Essentially, this can be avoided only by an experienced user of controlled vocabulary whose understanding of the vocabulary coincides with that of the indexer. Another possibility is that the article is just not tagged by the indexer because indexing exhaustivity is low. For example, an article might mention football as a secondary focus, and the indexer might decide not to tag it with "football" because it is not important enough compared to the main focus. But it turns out that for the searcher that article is relevant and hence recall fails. A free text search would automatically pick up that article regardless. On the other hand, free text searches have high exhaustivity (every word is searched) so although it has much lower precision, it has potential for high recall as long as the searcher overcome the problem of synonyms by entering every combination. Controlled vocabularies may become outdated rapidly in fast developing fields of knowledge, unless the preferred terms are updated regularly. Even in an ideal scenario, a controlled vocabulary is often less specific than the words of the text itself. Indexers trying to choose the appropriate index terms might misinterpret the author, while this precise problem is not a factor in a free text, as it uses the author's own words. The use of controlled vocabularies can be costly compared to free

    Read more →
  • Automated machine learning

    Automated machine learning

    Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML. AutoML potentially includes every stage from beginning with a raw dataset to building a machine learning model ready for deployment. AutoML was proposed as an artificial intelligence-based solution to the growing challenge of applying machine learning. The high degree of automation in AutoML aims to allow non-experts to make use of machine learning models and techniques without requiring them to become experts in machine learning. Automating the process of applying machine learning end-to-end additionally offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models. Common techniques used in AutoML include hyperparameter optimization, meta-learning and neural architecture search. == Comparison to the standard approach == In a typical machine learning application, practitioners have a set of input data points to be used for training. The raw data may not be in a form that all algorithms can be applied to. To make the data amenable for machine learning, an expert may have to apply appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods. After these steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their model. If deep learning is used, the architecture of the neural network must also be chosen manually by the machine learning expert. Each of these steps may be challenging, resulting in significant hurdles to using machine learning. AutoML aims to simplify these steps for non-experts, and to make it easier for them to use machine learning techniques correctly and effectively. AutoML plays an important role within the broader approach of automating data science, which also includes challenging tasks such as data engineering, data exploration and model interpretation and prediction. == Targets of automation == Automated machine learning can target various stages of the machine learning process. Steps to automate are: Data preparation and ingestion (from raw data and miscellaneous formats) Column type detection; e.g., Boolean, discrete numerical, continuous numerical, or text Column intent detection; e.g., target/label, stratification field, numerical feature, categorical text feature, or free text feature Task detection; e.g., binary classification, regression, clustering, or ranking Feature engineering Feature selection Feature extraction Meta-learning and transfer learning Detection and handling of skewed data and/or missing values Model selection - choosing which machine learning algorithm to use, often including multiple competing software implementations Ensembling - a form of consensus where using multiple models often gives better results than any single model Hyperparameter optimization of the learning algorithm and featurization Neural architecture search Pipeline selection under time, memory, and complexity constraints Selection of evaluation metrics and validation procedures Problem checking Leakage detection Misconfiguration detection Analysis of obtained results Creating user interfaces and visualizations == Challenges and Limitations == There are a number of key challenges being tackled around automated machine learning. A big issue surrounding the field is referred to as "development as a cottage industry". This phrase refers to the issue in machine learning where development relies on manual decisions and biases of experts. This is contrasted to the goal of machine learning which is to create systems that can learn and improve from their own usage and analysis of the data. Basically, it's the struggle between how much experts should get involved in the learning of the systems versus how much freedom they should be giving the machines. However, experts and developers must help create and guide these machines to prepare them for their own learning. To create this system, it requires labor intensive work with knowledge of machine learning algorithms and system design. Additionally, other challenges include meta-learning and computational resource allocation.

    Read more →
  • Information architecture

    Information architecture

    Information architecture is the structural design of shared information environments, in particular the organisation of websites and software to support usability and findability. The term information architecture was coined by Richard Saul Wurman. Since its inception, information architecture has become an emerging community of practice focused on applying principles of design, architecture and information science in digital spaces. Typically, a model or concept of information is used and applied to activities which require explicit details of complex information systems. These activities include library systems and database development. == Definition == The term information architecture has different meanings in different branches of information systems or information technology. === User experience === In user experience design, information architecture has been described as the structural design of shared information environments, comprising the study and practice of organising and labelling web sites, intranets, online communities, and software to support user experience, in particular, the findability and usability of information. It has also been described as an emerging community of practice focused on bringing principles of design and architecture to the digital landscape. === Information systems === Technically speaking, information architecture comprises the combination of organization, labeling, search and navigation systems within websites and intranets, serving as a navigational aid to the content of information-rich systems. === Data architecture === Information architecture can be described as a subset of data architecture where usable data is constructed, designed, and arranged in a fashion most useful to the users of data. === Systems design === In the field of systems design, for example, information architecture is a component of enterprise architecture that deals with the information component when describing the structure of an enterprise. Some system design practitioners regard information architecture as strictly the application of information science to web design, which considers such issues as classification and information retrieval, and not factors like user experience and information design. == Principles == Principles of information architecture include the following: The principle of objects The principle of choices The principle of disclosure The principle of exemplars The principle of front doors The principle of multiple classification The principle of focused navigation The principle of growth == History == Richard Saul Wurman is credited with coining the term information architecture in relation to the design of information. From 1998 to 2015, Peter Morville and Louis Rosenfeld were co-authors of Information Architecture for the World Wide Web. Other authors include Jesse James Garrett and Christina Wodtke.

    Read more →
  • Enterprise bus matrix

    Enterprise bus matrix

    The enterprise bus matrix is a data warehouse planning tool and model created by Ralph Kimball, and is part of the data warehouse bus architecture. The matrix is the logical definition of one of the core concepts of Kimball's approach to dimensional modeling conformed dimension. The bus matrix defines part of the data warehouse bus architecture and is an output of the business requirements phase in the Kimball lifecycle. It is applied in the following phases of dimensional modeling and development of the data warehouse. The matrix can be categorized as a hybrid model, being part technical design tool, part project management tool and part communication tool == Background == The need for an enterprise bus matrix stems from the way one goes about creating the overall data warehouse environment. Historically there have been two approaches: a structured, centralized and planned approach and a more loosely defined, department specific approach, in which solutions are developed in a more independent matter. Autonomous projects can result in a range of isolated stove pipe data marts. Naturally each approach has its issues; the visionary approach often struggles with long delivery cycles and lack of reaction time as needs emerge and scope issues arise. On the other hand, the development of isolated data marts leads to stovepipe systems that lack synergy in development. Over time this approach will lead to a so-called data-mart-in-a-box architecture where interoperability and lack of cohesion is apparent, and can hinder the realization of an overall enterprise data warehouse. As an attempt to handle this issue, Ralph Kimball introduced the enterprise bus. == Description == The bus matrix purpose is one of high abstraction and visionary planning on the data warehouse architectural level. By dictating coherency in the development and implementation of an overall data warehouse the bus architecture approach enables an overall vision of the broader enterprise integration and consistency while at the same time dividing the problem into more manageable parts – all in a technology and software independent manner. The bus matrix and architecture builds upon the concept of conformed dimensions, creating a structure of common dimensions that ideally can be used across the enterprise by all business processes related to the data warehouse and the corresponding fact tables from which they derive their context. According to Kimball and Margy Ross's article “Differences of Opinion” "The Enterprise Data warehouse built on the bus architecture ”identifies and enforces the relationship between business process metrics (facts) and descriptive attributes (dimensions)”. The concept of a bus is well known in the language of information technology, and is what reflects the conformed dimension concept in the data warehouse, creating the skeletal structure where all parts of a system connect, ensuring interoperability and consistency of data, and at the same time considers future expansion. This makes the conformed dimensions act as the integration ‘glue’, creating a robust backbone of the enterprise Data Warehouse.

    Read more →
  • Microsoft Office PerformancePoint Server

    Microsoft Office PerformancePoint Server

    Microsoft Office PerformancePoint Server is a business intelligence software product released in 2007 by Microsoft. The product was generally an integration of the acquisitions from ProClarity - the Planning Server and Monitoring Server - into Microsoft's SharePoint server product line. Although discontinued in 2009, the dashboard, scorecard, and analytics capabilities of PerformancePoint Server were incorporated into SharePoint 2010 and later versions. PerformancePoint Server also provided a planning and budgeting component directly integrated with Excel. == History == Microsoft offered preview releases of PerformancePoint Server starting in mid-2006. Previews of the product were formed from Business Scorecard Manager 2005 and the Planning Server component. Acquisitions ProClarity and Great Plains brought additional analytics and planning/reporting capabilities, as well as companion products ProClarity 6.3 and FRx. PerformancePoint Server was officially released in November 2007. Microsoft discontinued PerformancePoint Server as an independent product in 2009 and folded its dashboard, scorecard and analytics capabilities into PerformancePoint Services in SharePoint Server 2010. == Monitoring Server Component == Business monitoring capabilities, including dashboards, scorecards & key performance indicators, navigable reports for deeper analysis, strategy maps, and linked filtering, are provided by PerformancePoint's Monitoring Server component. A Dashboard Designer application that is distributed from Monitoring Server enables business analysts or IT Administrators to: create & test data source connections create views that use those data connections assemble the views into a dashboard deploy the dashboard as a SharePoint page Dashboard Designer saved content and security information back to the Monitoring Server. Data source connections, such as OLAP cubes or relational tables, were also made through Monitoring Server. After a dashboard has been published to the Monitoring Server database, it would be deployed as a SharePoint page and shared with other users as such. When the pages were opened in a web browser, Monitoring Server updated the data in the views by connecting back to the original data sources. == Planning Server Component == PerformancePoint's Planning Server component supported maintenance of logical business models, budget & approval workflows, enterprise data sources, and it followed Generally Accepted Accounting Principles. Planning Server made use of Excel for input and line-of-business reporting, as well as SQL Server for storing and processing business models. == Management Reporter Component == The Management Reporter component was designed to perform financial reporting and can read PerformancePoint Planning models directly. A development kit was also available to allow this component to read other models.

    Read more →
  • The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis is a report authored by James van Geelen and Alap Shah and published by Citrini Research in February 2026, on the impact of artificial intelligence on humanity's future. Written in the form of a scenario analysis, it was viewed millions of times online and reportedly caused a fall in the stock market prices of major tech and financial firms. It also received criticism among others, for its allegedly flawed economic logic. The 'thought exercise', as the authors called it, painted a gloomy picture for the near future, where outputs keep growing while consumer's ability to spend collapses. "...driven by ai agents that don’t sleep, take sick days or require health insurance”, "outputs that are shown in national accounts increases, "but never circulates through the real economy"(which the report calls 'Ghost GDP'), the authors argued. In other words, the authors predict a scenario where the owners of the AI firms will accumulate a vast fortune but there will be scant demand from consumers as AI would cause massive unemployment. The authors caution the reader that what they make is a scenario and not a prediction. In the scenario they visualise, any service whose value proposition is “I will navigate complexity that you find tedious” is getting disrupted. The reports argues that the unique ability of human beings to analyse, decide, create, persuade, and coordinate was “the thing that could not be replicated at scale,” and call the historical scarcity of this precious entity 'friction'. When this friction becomes zero, a gamut of changes occur which then triggers a cascading of changes across the economy. ”Travel booking platforms are an early casualty; Financial advice. tax prep., and routine legal work follow suit. National unemployment rate go as high 10.2% and the S&P 500 goes for a massive 38% peak-to-trough crash. In contrast to the previous technological revolutions the high-earning professionals suffers more and get forced to take up roles in the gig economy. Labour supply becomes abundant and this cuts wages all across the economy. The dent in income for the employees then affects other sectors of the economy such as the residential mortgage market. The losses for the software companies triggers loan defaults and heralds peril for the private credit sector.

    Read more →
  • Unrestricted algorithm

    Unrestricted algorithm

    An unrestricted algorithm is an algorithm for the computation of a mathematical function that puts no restrictions on the range of the argument or on the precision that may be demanded in the result. The idea of such an algorithm was put forward by C. W. Clenshaw and F. W. J. Olver in a paper published in 1980. In the problem of developing algorithms for computing, as regards the values of a real-valued function of a real variable (e.g., g[x] in "restricted" algorithms), the error that can be tolerated in the result is specified in advance. An interval on the real line would also be specified for values when the values of a function are to be evaluated. Different algorithms may have to be applied for evaluating functions outside the interval. An unrestricted algorithm envisages a situation in which a user may stipulate the value of x and also the precision required in g(x) quite arbitrarily. The algorithm should then produce an acceptable result without failure.

    Read more →
  • Information behavior

    Information behavior

    Information behavior is a field of information science research that seeks to understand the way people search for and use information in various contexts. It can include information seeking and information retrieval, but it also aims to understand why people seek information and how they use it. The term 'information behavior' was coined by Thomas D. Wilson in 1982 and sparked controversy upon its introduction. The term has now been adopted and Wilson's model of information behavior is widely cited in information behavior literature. In 2000, Wilson defined information behavior as "the totality of human behavior in relation to sources and channels of information". A variety of theories of information behavior seek to understand the processes that surround information seeking. An analysis of the most cited publications on information behavior during the early 21st century shows its theoretical nature. Information behavior research can employ various research methodologies grounded in broader research paradigms from psychology, sociology and education. In 2003, a framework for information-seeking studies was introduced that aims to guide the production of clear, structured descriptions of research objects and positions information-seeking as a concept within information behavior. == Concepts of information behavior == === Information need === Information need is a concept introduced by Wilson. Understanding the information need of an individual involved three elements: Why the individual decides to look for information, What purpose the information they find will serve, and How the information is used once it is retrieved === Information-seeking behavior === Information-seeking behavior is a more specific concept of information behavior. It specifically focuses on searching, finding, and retrieving information. Information-seeking behavior research can focus on improving information systems or, if it includes information need, can also focus on why the user behaves the way they do. A review study on information search behavior of users highlighted that behavioral factors, personal factors, product/service factors and situational factors affect information search behavior. Information-seeking behavior can be more or less explicit on the part of users: users might seek to solve some task or to establish some piece of knowledge which can be found in the data in question, or alternatively the search process itself is part of the objective of the user, in use cases for exploring visual content or for familiarising oneself with the content of an information service. In the general case, information-seeking needs to be understood and analysed as a session rather than as a one-off transaction with a search engine, and in a broader context which includes user high-level intentions in addition to the immediate information need. === Information use === An information need is the recognition that a gap exists in one’s knowledge, prompting a desire to seek information to fill that gap. It often arises when a person encounters a problem or question they cannot resolve with their current understanding. === Information poverty and barriers === Introduced by Elfreda Chatman in 1987, information poverty is informed by the understanding that information is not equally accessible to all people. Information poverty does not describe a lack of information, but rather a worldview in which one's own experiences inside their own small world may create a distrust in the information provided by those outside their own lived experiences. == Metatheories == In Library and Information Science (LIS), a metatheory is described "a set of assumptions that orient and direct theorizing about a given phenomenon". Library and information science researchers have adopted a number of different metatheories in their research. A common concern among LIS researchers, and a prominent discussion in the field, is the broad spectrum of theories that inform the study of information behavior, information users, or information use. This variation has been noted as a cause of concern because it makes individual studies difficult to compare or synthesize if they are not guided by the same theory. This sentiment has been expressed in studies of information behavior literature from the early 1980s and more recent literature reviews have declared it necessary to refine their reviews to specific contexts or situations due to the sheer breadth of information behavior research available. Below are descriptions of some, but not all, metatheories that have guided LIS research. === Cognitivist approach === A cognitive approach to understanding information behavior is grounded in psychology. It holds the assumption that a person's thinking influences how they seek, retrieve, and use information. Researchers that approach information behavior with the assumption that it is influenced by cognition, seek to understand what someone is thinking while they engage in information behavior and how those thoughts influence their behavior. Wilson's attempt to understand information-seeking behavior by defining information need includes a cognitive approach. Wilson theorizes that information behavior is influenced by the cognitive need of an individual. By understanding the cognitive information need of an individual, we may gain insight into their information behavior. Nigel Ford takes a cognitive approach to information-seeking, focusing on the intellectual processes of information-seeking. In 2004, Ford proposed an information-seeking model using a cognitive approach that focuses on how to improve information retrieval systems and serves to establish information-seeking and information behavior as concepts in and of themselves, rather than synonymous terms. === Constructionist approach === The constructionist approach to information behavior has roots in the humanities and social sciences. It relies on social constructionism, which assumes that a person's information behavior is influenced by their experiences in society. In order to understand information behavior, constructionist researchers must first understand the social discourse that surrounds the behavior. The most popular thinker referenced in constructionist information behavior research is Michel Foucault, who famously rejected the concept of a universal human nature. The constructionist approach to information behavior research creates space for contextualizing the behavior based on the social experiences of the individual. One study that approaches information behavior research through the social constructionist approach is a study of the information behavior of a public library knitting group. The authors use a collectivist theory to frame their research, which denies the universality of information behavior and focuses on "understanding the ways that discourse communities collectively construct information needs, seeking, sources, and uses". === Constructivist approach === The constructivist approach is born out of education and sociology in which, "individuals are seen as actively constructing an understanding of their worlds, heavily influenced by the social world(s) in which they are operating". Constructivist approaches to information behavior research generally treat the individual's reality as constructed within their own mind rather than built by the society in which they live. The constructivist metatheory makes space for the influence of society and culture with social constructivism, "which argues that, while the mind constructs reality in its relationship to the world, this mental process is significantly informed by influences received from societal conventions, history and interaction with significant others". == Theories == A common concern among LIS researchers, and a prominent discussion in the field, is the broad spectrum of theories that inform LIS research. This variation has been noted as a cause of concern because it makes individual studies difficult to compare if they are not guided by the same theory. Recent studies have shown that the impact of these theories and theoretical models is very limited. LIS researchers have applied concepts and theories from many disciplines, including sociology, psychology, communication, organizational behavior, and computer science. === Wilson's theory of information behavior (1981) === The term was coined by Thomas D. Wilson in his 1981 paper, on the grounds that the current term, 'information needs' was unhelpful since 'need' could not be directly observed, while how people behaved in seeking information could be observed and investigated. However, there is increasing work in the information-searching field that is relating behaviors to underlying needs. In 2000, Wilson described information behavior as the totality of human behavior in relation to sources and channels of information, including both active and passive information-seeking, and information use. He described info

    Read more →