AI Content Generation Tools

AI Content Generation Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Data administration

    Data administration

    Data administration or data resource management is an organizational function working in the areas of information systems and computer science that plans, organizes, describes and controls data resources. Data resources are usually stored in databases under a database management system or other software such as electronic spreadsheets. In many smaller organizations, data administration is performed occasionally, or is a small component of the database administrator’s work. In the context of information systems development, data administration ideally begins at system conception, ensuring there is a data dictionary to help maintain consistency, avoid redundancy, and model the database so as to make it logical and usable, by means of data modeling, including database normalization techniques. == Data resource management == According to the Data Management Association (DAMA), data resource management is "the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise". Data Resource management may be thought of as a managerial activity that applies information system and other data management tools to the task of managing an organization’s data resource to meet a company’s business needs, and the information they provide to their shareholders. From the perspective of database design, it refers to the development and maintenance of data models to facilitate data sharing between different systems, particularly in a corporate context. Data Resource Management is also concerned with both data quality and compatibility between data models. Since the beginning of the information age, businesses need all types of data on their business activity. With each data created, when a business transaction is made, need data is created. With these data, new direction is needed that focuses on managing data as a critical resource of the organization to directly support its business activities. The data resource must be managed with the same intensity and formality that other critical resources are managed. Organizations must emphasize the information aspect of information technology, determine the data needed to support the business, and then use appropriate technology to build and maintain a high-quality data resource that provides that support. Data resource quality is a measure of how well the organization's data resource supports the current and the future business information demand of the organization. The data resource cannot support just the current business information demand while sacrificing the future business information demand. It must support both the current and the future business information demand. The ultimate data resource quality is stability across changing business needs and changing technology. A corporate data resource must be developed within single, organization-wide common data architecture. A data architecture is the science and method of designing and constructing a data resource that is business driven, based on real-world objects and events as perceived by the organization, and implemented into appropriate operating environments. It is the overall structure of a data resource that provides a consistent foundation across organizational boundaries to provide easily identifiable, readily available, high-quality data to support the business information demand. The common data architecture is a formal, comprehensive data architecture that provides a common context within which all data at an organization's disposal are understood and integrated. It is subject oriented, meaning that it is built from data subjects that represent business objects and business events in the real world that are of interest to the organization and about which data are captured and maintained.

    Read more →
  • Model compression

    Model compression

    Model compression is a machine learning technique for reducing the size of trained models. Large models can achieve high accuracy, but often at the cost of significant resource requirements. Compression techniques aim to compress models without significant performance reduction. Smaller models require less storage space, and consume less memory and compute during inference. Compressed models enable deployment on resource-constrained devices such as smartphones, embedded systems, edge computing devices, and consumer electronics computers. Efficient inference is also valuable for large corporations that serve large model inference over an API, allowing them to reduce computational costs and improve response times for users. Model compression is not to be confused with knowledge distillation, in which a smaller "student" model is trained to imitate the input-output behavior of a larger "teacher" model (as opposed to using the "teacher"'s trained parameters or the "teacher"'s training targets). == Techniques == Several techniques are employed for model compression. === Pruning === Pruning sparsifies a large model by setting some parameters to exactly zero. This effectively reduces the number of parameters. This allows the use of sparse matrix operations, which are faster than dense matrix operations. Pruning criteria can be based on magnitudes of parameters, the statistical pattern of neural activations, Hessian values, etc. === Quantization === Quantization reduces the numerical precision of weights and activations. For example, instead of storing weights as 32-bit floating-point numbers, they can be represented using 8-bit integers. Low-precision parameters take up less space, and takes less compute to perform arithmetic with. It is also possible to quantize some parameters more aggressively than others, so for example, a less important parameter can have 8-bit precision while another, more important parameter, can have 16-bit precision. Inference with such models requires mixed-precision arithmetic. Quantized models can also be used during training (rather than after training). PyTorch implements automatic mixed-precision (AMP), which performs autocasting, gradient scaling, and loss scaling. === Low-rank factorization === Weight matrices can be approximated by low-rank matrices. Let W {\displaystyle W} be a weight matrix of shape m × n {\displaystyle m\times n} . A low-rank approximation is W ≈ U V T {\displaystyle W\approx UV^{T}} , where U {\displaystyle U} and V {\displaystyle V} are matrices of shapes m × k , n × k {\displaystyle m\times k,n\times k} . When k {\displaystyle k} is small, this both reduces the number of parameters needed to represent W {\displaystyle W} approximately, and accelerates matrix multiplication by W {\displaystyle W} . Low-rank approximations can be found by singular value decomposition (SVD). The choice of rank for each weight matrix is a hyperparameter, and jointly optimized as a mixed discrete-continuous optimization problem. The rank of weight matrices may also be pruned after training, taking into account the effect of activation functions like ReLU on the implicit rank of the weight matrices. == Training == Model compression may be decoupled from training, that is, a model is first trained without regard for how it might be compressed, then it is compressed. However, it may also be combined with training. The "train big, then compress" method trains a large model for a small number of training steps (less than it would be if it were trained to convergence), then heavily compress the model. It is found that at the same compute budget, this method results in a better model than lightly compressed, small models. In Deep Compression, the compression has three steps. First loop (pruning): prune all weights lower than a threshold, then finetune the network, then prune again, etc. Second loop (quantization): cluster weights, then enforce weight sharing among all weights in each cluster, then finetune the network, then cluster again, etc. Third step: Use Huffman coding to losslessly compress the model. The SqueezeNet paper reported that Deep Compression achieved a compression ratio of 35 on AlexNet, and a ratio of ~10 on SqueezeNets.

    Read more →
  • Gibberlink

    Gibberlink

    GibberLink is an acoustic data transmission project, with an open-source client available on GitHub, in which two conversational AI agents switch from speaking to one another in a Human-listenable language (such as English) to their own unique language that consists of a sound-level protocol after confirming they are both AI agents. The project was created by Anton Pidkuiko and Boris Starkov. == Reception == The project won the global top prize at the ElevenLabs Worldwide Hackathon. It has also been cited as raising questions around AI ethics and oversight. On February 23, 2025, a YouTube video of two independent conversational ElevenLabs AI agents being prompted to chat about booking a hotel (one as a caller, one as a receptionist) received coverage for going viral. In this video, both agents are prompted to switch to ggwave data-over-sound protocol when they identify the other side as AI, and keep speaking in English otherwise.

    Read more →
  • Three-factor learning

    Three-factor learning

    In neuroscience and machine learning, three-factor learning is the combination of Hebbian plasticity with a third modulatory factor to stabilise and enhance synaptic learning. This third factor can represent various signals such as reward, punishment, error, surprise, or novelty, often implemented through neuromodulators. == Description == Three-factor learning introduces the concept of eligibility traces, which flag synapses for potential modification pending the arrival of the third factor, and helps temporal credit assignement by bridging the gap between rapid neuronal firing and slower behavioral timescales, from which learning can be done. Biological basis for Three-factor learning rules have been supported by experimental evidence. This approach addresses the instability of classical Hebbian learning by minimizing autocorrelation and maximizing cross-correlation between inputs.

    Read more →
  • Corel Designer

    Corel Designer

    Corel DESIGNER is a vector-based graphics program. It was originally developed by Micrografx, which was bought by Corel in 2001. The last version developed by Micrografx was 9.0 in 2001. This program was later sold as Corel DESIGNER 9. There are still a number of users who continue working with version 9.0, because newer versions of the product are based on a modified CorelDRAW rather than the original product. Corel DESIGNER is effective for the creation of engineering drawings, but also offers many functions for graphic design. Starting with version X5, Corel DESIGNER Technical Suite includes Corel Designer, CorelDRAW and Corel Photo-Paint. X6 was the last release for Windows XP. == Release history and file formats ==

    Read more →
  • Autognostics

    Autognostics

    Autognostics is a new paradigm that describes the capacity for computer networks to be self-aware. It is considered one of the major components of Autonomic Networking. == Introduction == One of the most important characteristics of today's Internet that has contributed to its success is its basic design principle: a simple and transparent core with intelligence at the edges (the so-called "end-to-end principle"). Based on this principle, the network carries data without knowing the characteristics of that data (e.g., voice, video, etc.) - only the end-points have application-specific knowledge. If something goes wrong with the data, only the edge may be able to recognize that since it knows about the application and what the expected behavior is. The core has no information about what should happen with that data - it only forwards packets. Although an effective and beneficial attribute, this design principle has also led to many of today's problems, limitations, and frustrations. Currently, it is almost impossible for most end-users to know why certain network-based applications do not work well and what they need to do to make it better. Also, network operators who interact with the core in low-level terms such as router configuration have problems expressing their high-level goals into low-level actions. In high-level terms, this may be summarized as a weak coupling between the network and application layers of the overall system. As a consequence of the Internet end-to-end principle, the network performance experienced by a particular application is difficult to attribute based on the behavior of the individual elements. At any given moment, the measure of performance between any two points is typically unknown and applications must operate blindly. As a further consequence, changes to the configuration of given element, or changes in the end-to-end path, cannot easily be validated. Optimization and provisioning cannot then be automated except against only the simplest design specifications. There is an increasing interest in Autonomic Networking research, and a strong conviction that an evolution from the current networking status quo is necessary. Although to date there have not been any practical implementations demonstrating the benefits of an effective autonomic networking paradigm, there seems to be a consensus as to the characteristics which such implementations would need to demonstrate. These specifically include continuous monitoring, identifying, diagnosing and fixing problems based on high-level policies and objectives. Autognostics, as a major part of the autonomic networking concept, intends to bring networks to a new level of awareness and eliminate the lack of visibility which currently exists in today's networks. == Definition == Autognostics is a new paradigm that describes the capacity for computer networks to be self-aware, in part and as a whole, and dynamically adapt to the applications running on them by autonomously monitoring, identifying, diagnosing, resolving issues, subsequently verifying that any remediation was successful, and reporting the impact with respect to the application's use (i.e., providing visibility into the changes to networks and their effects). Although similar to the concept of network awareness, i.e., the capability of network devices and applications to be aware of network characteristics (see References section below), it is noteworthy that autognostics takes that concept one step further. The main difference is the auto part of autognostics, which entails that network devices are self-aware of network characteristics, and have the capability to adapt themselves as a result of continuous monitoring and diagnostics. == Path to autognostics == Autognostics, or in other words deep self-knowledge, can be best described as the ability of a network to know itself and the applications that run on it. This knowledge is used to autonomously adapt to dynamic network and application conditions such as utilization, capacity, quality of service/application/user experience, etc. In order to achieve autognosis, networks need a means to: Continuously monitor/test the network for application-specific performance Analyze the monitoring/test data to detect problems (e.g., performance degradation) Diagnose, identify and localize sources of degradation Automatically take actions to resolve problems via remediation/provisioning Verify the problems have been resolved (potentially rolling back changes if ineffective) Subsequently, continue to monitor/test for performance

    Read more →
  • Double descent

    Double descent

    Double descent in statistics and machine learning is the phenomenon where a model's error rate on the test set initially decreases with the number of parameters, then peaks, then decreases again. This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning. The increase usually occurs near the interpolation threshold, where the number of parameters is the same as the number of training data points (the model is just large enough to fit the training data). Or, more precisely, it is the maximum number of samples on which the model/training procedure achieves approximately on average 0 training error. == History == Early observations of what would later be called double descent in specific models date back to 1989. The term "double descent" was coined by Belkin et. al. in 2019, when the phenomenon gained popularity as a broader concept exhibited by many models. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of the bias–variance tradeoff), and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models. == Theoretical models == Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise. A model of double descent at the thermodynamic limit has been analyzed using the replica trick, and the result has been confirmed numerically. A number of works have suggested that double descent can be explained using the concept of effective dimension: While a network may have a large number of parameters, in practice only a subset of those parameters are relevant for generalization performance, as measured by the local Hessian curvature. This explanation is formalized through PAC-Bayes compression-based generalization bounds, which show that less complex models are expected to generalize better under a Solomonoff prior.

    Read more →
  • Surrogate model

    Surrogate model

    A surrogate model is an engineering method used when an outcome of interest cannot be easily measured or computed, so an approximate mathematical model of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as a function of design variables. For example, in order to find the optimal airfoil shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables (e.g., length, curvature, material, etc.). For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and "what-if" analysis become impossible since they require thousands or even millions of simulation evaluations. One way of alleviating this burden is by constructing approximation models, known as surrogate models, metamodels or emulators, that mimic the behavior of the simulation model as closely as possible while being computationally cheaper to evaluate. Surrogate models are constructed using a data-driven, bottom-up approach. The exact, inner working of the simulation code is not assumed to be known (or even understood), relying solely on the input-output behavior. A model is constructed based on modeling the response of the simulator to a limited number of intelligently chosen data points. This approach is also known as behavioral modeling or black-box modeling, though the terminology is not always consistent. When only a single design variable is involved, the process is known as curve fitting. Though using surrogate models in lieu of experiments and simulations in engineering design is more common, surrogate modeling may be used in many other areas of science where there are expensive experiments and/or function evaluations. == Goals == The scientific challenge of surrogate modeling is the generation of a surrogate that is as accurate as possible, using as few simulation evaluations as possible. The process comprises three major steps which may be interleaved iteratively: Sample selection (also known as sequential design, optimal experimental design (OED) or active learning) Construction of the surrogate model and optimizing the model parameters (i.e., bias-variance tradeoff) Appraisal of the accuracy of the surrogate. The accuracy of the surrogate depends on the number and location of samples (expensive experiments or simulations) in the design space. A systematic data representation during training can improve model scalability, thereby reducing the need for expensive simulations. Various design of experiments (DOE) techniques cater to different sources of errors, in particular, errors due to noise in the data or errors due to an improper surrogate model. == Types of surrogate models == Popular surrogate modeling approaches are: polynomial response surfaces; kriging; more generalized Bayesian approaches; gradient-enhanced kriging (GEK); radial basis function; support vector machines; space mapping; artificial neural networks and Bayesian networks. Other methods recently explored include Fourier surrogate modeling , random forests, convolutional neural networks, and generative adversarial networks. For some problems, the nature of the true function is not known a priori, and therefore it is not clear which surrogate model will be the most accurate one. In addition, there is no consensus on how to obtain the most reliable estimates of the accuracy of a given surrogate. Many other problems have known physics properties. In these cases, physics-based surrogates such as space-mapping based models are commonly used. == Invariance properties == Recently proposed comparison-based surrogate models (e.g., ranking support vector machines) for evolutionary algorithms, such as CMA-ES, allow preservation of some invariance properties of surrogate-assisted optimizers: Invariance with respect to monotonic transformations of the function (scaling) Invariance with respect to orthogonal transformations of the search space (rotation) == Applications == An important distinction can be made between two different applications of surrogate models: design optimization and design space approximation (also known as emulation). In surrogate model-based optimization, an initial surrogate is constructed using some of the available budgets of expensive experiments and/or simulations. The remaining experiments/simulations are run for designs which the surrogate model predicts may have promising performance. The process usually takes the form of the following search/update procedure. Initial sample selection (the experiments and/or simulations to be run) Construct surrogate model Search surrogate model (the model can be searched extensively, e.g., using a genetic algorithm, as it is cheap to evaluate) Run and update experiment/simulation at new location(s) found by search and add to sample Iterate steps 2 to 4 until out of time or design is "good enough" Depending on the type of surrogate used and the complexity of the problem, the process may converge on a local or global optimum, or perhaps none at all. In design space approximation, one is not interested in finding the optimal parameter vector, but rather in the global behavior of the system. Here the surrogate is tuned to mimic the underlying model as closely as needed over the complete design space. Such surrogates are a useful, cheap way to gain insight into the global behavior of the system. Optimization can still occur as a post-processing step, although with no update procedure (see above), the optimum found cannot be validated. == Surrogate modeling software == Surrogate Modeling Toolbox (SMT: https://github.com/SMTorg/smt) is a Python package that contains a collection of surrogate modeling methods, sampling techniques, and benchmarking functions. This package provides a library of surrogate models that is simple to use and facilitates the implementation of additional methods. SMT is different from existing surrogate modeling libraries because of its emphasis on derivatives, including training derivatives used for gradient-enhanced modeling, prediction derivatives, and derivatives with respect to the training data. It also includes new surrogate models that are not available elsewhere: kriging by partial-least squares reduction and energy-minimizing spline interpolation. Python library SAMBO Optimization supports sequential optimization with arbitrary models, with tree-based models and Gaussian process models built in. Surrogates.jl is a Julia packages which offers tools like random forests, radial basis methods and kriging. == Surrogate-Assisted Evolutionary Algorithms (SAEAs) == SAEAs are an advanced class of optimization techniques that integrate evolutionary algorithms (EAs) with surrogate models. In traditional EAs, evaluating the fitness of candidate solutions often requires computationally expensive simulations or experiments. SAEAs address this challenge by building a surrogate model, which is a computationally inexpensive approximation of the objective function or constraint functions. The surrogate model serves as a substitute for the actual evaluation process during the evolutionary search. It allows the algorithm to quickly estimate the fitness of new candidate solutions, thereby reducing the number of expensive evaluations needed. This significantly speeds up the optimization process, especially in cases where the objective function evaluations are time-consuming or resource-intensive. SAEAs typically involve three main steps: (1) building the surrogate model using a set of initial sampled data points, (2) performing the evolutionary search using the surrogate model to guide the selection, crossover, and mutation operations, and (3) periodically updating the surrogate model with new data points generated during the evolutionary process to improve its accuracy. By balancing exploration (searching new areas in the solution space) and exploitation (refining known promising areas), SAEAs can efficiently find high-quality solutions to complex optimization problems. They have been successfully applied in various fields, including engineering design, machine learning, and computational finance, where traditional optimization methods may struggle due to the high computational cost of fitness evaluations.

    Read more →
  • Managed private cloud

    Managed private cloud

    Managed private cloud (also known as "hosted private cloud" or "single-tenant SaaS") refers to a principle in software architecture where a single instance of the software runs on a server, serves a single client organization (tenant), and is managed by a third party. The third-party provider is responsible for providing the hardware for the server and also for preliminary maintenance. This is in contrast to multitenancy, where multiple client organizations share a single server, or an on-premises deployment, where the client organization hosts its software instance. Managed private clouds also fall under the larger umbrella of cloud computing. == Adoption == The need for private clouds arose due to enterprises requiring a dedicated service and infrastructure for their cloud computing needs, such as for business-critical operations, improved security, and better control over their resources. Managed private cloud adoption is a popular choice among organizations. It has been on the rise due to enterprises requiring a dedicated cloud environment and preferring to avoid having to deal with management, maintenance, or future upgrade costs for the associated infrastructure and services. Such operational costs are unavoidable in on-premises private cloud data centers. == Advantages and challenges of managed private cloud == A managed private cloud cuts down on upkeep costs by outsourcing infrastructure management and maintenance to the managed cloud provider. It is easier to integrate an organization's existing software, services, and applications into a dedicated cloud hosting infrastructure which can be customized to the client's needs instead of a public cloud platform, whose hardware or infrastructure/software platform cannot be individualized to each client. Customers who choose a managed private cloud deployment usually choose them because of their desire for efficient cloud deployment, but also have the need for service customization or integration only available in a single-tenant environment. This chart shows the key benefits of the different types of deployments, and shows the overlap between these cloud solutions. This chart shows key drawbacks. Since deployments are done in a single-tenant environment, it is usually cost-prohibitive for small and medium-sized businesses. While server upkeep and maintenance are handled by the service provider, including network management and security, the client is charged for all such services. It is up to the potential client to determine if a managed private cloud solution aligns with their business objectives and budget. While the service provider maintains the upkeep of servers, network, and platform infrastructure, sensitive data is typically not stored on managed private clouds as it may leave business-critical information prone to breaches via third-party attacks on the cloud service provider. Common customizations and integrations include: Active Directory Single Sign-on Learning Management Systems Video Teleconferencing == Deployment strategies and service providers == Software companies have taken a variety of strategies in the Managed Private Cloud realm. Some software organizations have provided managed private cloud options internally, such as Microsoft. Companies that offer an on-premises deployment option, by definition, enable third-party companies to market Managed Private Cloud solutions. A few managed private cloud service providers are: Adobe Connect: Adobe Connect may be purchased for on-premises deployment, multi-tenant hosted deployment, managed private cloud as ACMS, or managed by third-party managed private cloud provider ConnectSolutions. Rackspace CenturyLink Microsoft licenses for Lync, SharePoint and Exchange may be purchased for on-premises deployment, a multi-tenant hosted deployment via Office 365, or managed by third-party cloud hosting from Azaleos, ConnectSolutions and others.

    Read more →
  • AI washing

    AI washing

    AI washing is a deceptive marketing tactic that consists of promoting a product or a service by overstating the role of artificial intelligence (AI) and the integration of it. Companies often involve in the practice to mislead customers to boost their offerings, and to secure funding from investors. The practice raises concerns regarding transparency, and legal issues. == Definition == AI washing is a deceptive marketing practice. It involves promoting a product or a service by overstating the role of artificial intelligence (AI) and its integration in the design and manufacture of the same. The practice raises concerns regarding transparency, compliance with security regulations, and consumer trust in the AI industry potentially hampering legitimate advancements in AI. The term was first defined by the AI Now Institute, a research institute based at New York University in 2019. The term is derived from greenwashing, another deceptive marketing technique that misrepresents a product's environmental impact in a similar manner. AI washing might involve a company claiming to have used AI in the development or enhancement of its products or services without its actual involvement, or using buzzwords such as "smart" or "AI-powered" without the product actually offering it or making use of it. A company may overstate the usage of AI or misuse the term, which is also construed as AI washing. In 2026, The Washington Post defined AI washing as "a trend for bosses to blame layoffs on the productive capabilities of AI and its ability to replace workers, even when job cuts may have little to do with the technology". == Usage and effects == AI washing can lead to deception of customers and misleading of investors. It is also an illegal and unethical practice that lacks transparency regarding disclosing the details of a product or a service. Companies get involved in such a practice often in response to competition who might have used AI in their offerings. It might also be used as a ploy to secure funding and investment, assuming that it will attract them towards it. AI washing has been compared to dot-com bubble, when businesses appended "dot-com" to the end of the business name to boost their valuation. In September 2023, Coca-Cola released a new product called Coca-Cola Y3000, and the company stated that the Y3000 flavor had been "co-created with human and artificial intelligence". The company was accused of AI washing due to no proof of AI involvement in the creation of the product, and critics believed that AI was used as a way to grab consumer attention more than it was used in the actual product creation. In 2026, mass tech layoffs were attributed to AI washing from AI innovation instead of balance sheet restructuring. == Mitigation == Companies are expected to be transparent and clearer in communicating the usage of AI in their products or services. Consumers can mitigate the same by requesting for hard evidence from the companies regarding the usage of AI tools. Customers should evaluate the product or service as a whole rather than being swayed by the usage of AI. Informed decision making and purchasing can keep them from falling for such marketing gimmicks. The United States Securities and Exchange Commission (SEC) imposes penalties for companies indulging in such practices. In March 2024, the SEC imposed the first civil penalties on two companies for misleading statements about their use of AI, and in July 2024, it charged a corporate executive from a supposed AI hiring startup with fraud for the usage of buzzwords related to AI.

    Read more →
  • Journal of Machine Learning Research

    Journal of Machine Learning Research

    The Journal of Machine Learning Research is a peer-reviewed open access scientific journal covering machine learning. It was established in 2000 and the first editor-in-chief was Leslie Kaelbling. The current editors-in-chief are Francis Bach (Inria) and David Blei (Columbia University). == History == The journal was established as an open-access alternative to the journal Machine Learning. In 2001, forty editorial board members of Machine Learning resigned, saying that in the era of the Internet, it was detrimental for researchers to continue publishing their papers in expensive journals with pay-access archives. The open access model employed by the Journal of Machine Learning Research allows authors to publish articles for free and retain copyright, while archives are freely available online. Print editions of the journal were published by MIT Press until 2004 and by Microtome Publishing thereafter. From its inception, the journal received no revenue from the print edition and paid no subvention to MIT Press or Microtome Publishing. In response to the prohibitive costs of arranging workshop and conference proceedings publication with traditional academic publishing companies, the journal launched a proceedings publication arm in 2007 and now publishes proceedings for several leading machine learning conferences, including the International Conference on Machine Learning, COLT, AISTATS, and workshops held at the Conference on Neural Information Processing Systems.

    Read more →
  • Bag-of-words model

    Bag-of-words model

    The bag-of-words (BoW) model is a model of text which uses an unordered collection (a "bag") of words. It is used in natural language processing and information retrieval (IR). It disregards word order (and thus most of syntax or grammar) but captures multiplicity. The bag-of-words model is commonly used in methods of document classification where, for example, the (frequency of) occurrence of each word is used as a feature for training a classifier. It has also been used for computer vision. An early reference to "bag of words" in a linguistic context can be found in Zellig Harris's 1954 article on Distributional Structure. == Definition == The following models a text document using bag-of-words. Here are two simple text documents: Based on these two text documents, a list is constructed as follows for each document: Representing each bag-of-words as a JSON object, and attributing to the respective JavaScript variable: Each key is the word, and each value is the number of occurrences of that word in the given text document. The order of elements is free, so, for example {"too":1,"Mary":1,"movies":2,"John":1,"watch":1,"likes":2,"to":1} is also equivalent to BoW1. It is also what we expect from a strict JSON object representation. Note: if another document is like a union of these two, its JavaScript representation will be: So, as we see in the bag algebra, the "union" of two documents in the bags-of-words representation is, formally, the disjoint union, summing the multiplicities of each element. === Word order === The BoW representation of a text removes all word ordering. For example, the BoW representation of "man bites dog" and "dog bites man" are the same, so any algorithm that operates with a BoW representation of text must treat them in the same way. Despite this lack of syntax or grammar, BoW representation is fast and may be sufficient for simple tasks that do not require word order. For instance, for document classification, if the words "stocks" "trade" "investors" appears multiple times, then the text is likely a financial report, even though it would be insufficient to distinguish between Yesterday, investors were rallying, but today, they are retreating.andYesterday, investors were retreating, but today, they are rallying.and so the BoW representation would be insufficient to determine the detailed meaning of the document. == Implementations == Implementations of the bag-of-words model might involve using frequencies of words in a document to represent its contents. The frequencies can be "normalized" by the inverse of document frequency, or tf–idf. Additionally, for the specific purpose of classification, supervised alternatives have been developed to account for the class label of a document. Lastly, binary (presence/absence or 1/0) weighting is used in place of frequencies for some problems (e.g., this option is implemented in the WEKA machine learning software system). == Hashing trick == A common alternative to using dictionaries is the hashing trick, where words are mapped directly to indices with a hash function. When using a hash function, no memory is required to store a dictionary. In practice, hashing simplifies the implementation of bag-of-words models and improves scalability. Collisions can occur when two words are hashed to the same index, but this happens infrequently and may function as a form of regularization.

    Read more →
  • Morphobank

    Morphobank

    MorphoBank is a web application for collaborative evolutionary research, specifically phylogenetic systematics or cladistics, on the phenotype. Historically, scientists conducting research on phylogenetic systematics have worked individually or in small groups employing traditional single-user software applications such as MacClade, Mesquite and Nexus Data Editor. As the hypotheses under study have grown more complex, large research teams have assembled to tackle the problem of discovering the Tree of Life for the estimated 4-100 million living species(Wilson 2003, pp. 77–80) and the many thousands more extinct species known from fossils. Because the phenotype is fundamentally visual, and as phenotype-based phylogenetic studies have continued to increase in size, it becomes important that observations be backed up by labeled images. Traditional desktop software applications currently in wide use do not provide robust support for team-based research or for image manipulation and storage. MorphoBank is a particularly important tool for the growing scientific field of phenomics. The development of MorphoBank, which began in 2001, has been funded by the National Science Foundation's Directorates for Geosciences, Biological Sciences and Computer and Information Science and Engineering. The significance of the scientific work on MorphoBank has been featured in the New York Times(here and here), among other publications. == Advantages == Teams of scientists studying phylogenetics to build the Tree of Life assemble large spreadsheets of observations about species (referred to as "matrices"). These teams require simultaneous access by each team member to a single and secure copy of the team's data during a scientific research project. This single copy of the data also changes with great frequency during the data collection phase. Images that can be very helpful for documenting homology statements must be displayed, labeled and shared as homology statements develop. This cannot be accomplished elegantly with a desktop software package alone because in a desktop environment each collaborator is working on his own private copy of project data. Changes made by one participant cannot automatically propagate to others, preventing collaborators from seeing each other's data edits until they are manually (and due to the effort involved, often only periodically) merged into a single "true" dataset. In all but the smallest and most disciplined of teams, file version control and the reconciliation of changes made on multiple copies of the data emerge quickly as significant drags on productivity. MorphoBank is an attempt to address these issues by leveraging the ubiquity of the web and modern web-based application techniques, including Ajax, web service layers, and rich web applications to provide a full-featured, net-accessible collaborative workspace for phylogenetic research. In particular, MorphoBank makes it easy to: Share all kinds of data with geographically separated team members, including taxonomy, character and specimen data, media (including images, video and audio), phylogenetic matrices (including data in the widely used NEXUS and TNT format) and other data such as documents and genetic sequences. Label high-resolution images using a web-based image annotation application. Collaboratively edit project data such as phylogenetic matrices using a built-in web-based matrix editor. The editor allows the linking of labeled images to individual cells of a matrix. Manage access to project data. Access ranges from full-access for team members to anonymous read-only access for potential reviewers. Publish completed project data on the web in support of a published paper with a persistent URL. Search The Encyclopedia of Life for taxon exemplar images. Store high resolution CT data Create ontologies for updating and populating matrix cells. These tasks are difficult or impossible in most existing software applications. == History == In 2001 the National Science Foundation (NSF) sponsored a workshop, at the American Museum of Natural History in New York to develop the outlines of a web-based system for a collaborative, media-rich research tool for morphological phylogenetics. An application prototype presented at the workshop was later refined with feedback from the workshop and became MorphoBank version 1.0. A grant from the US National Oceanic and Atmospheric Administration funded further revisions resulting in version 2.0, released in 2005. Current support from the NSF is funding current feature enhancements to MorphoBank. MorphoBank was hosted by Stony Brook University until late October 2021 and received back up support from the American Museum of Natural History. The current version is 3.0. Rationale for the software was described in the journal Cladistics. MorphoBank has also received support from NESCENT and the San Diego Supercomputer Center. Since 2018, MorphoBank has been supported in part by Phoenix Bioinformatics, a non-profit company founded to sustain databases for the basic sciences. A permanent move of MorphoBank from Stony Brook University to Phoenix Bioinformatics was complete in late October 2021. The San Diego Supercomputer Center has previously provided technical and hosting resources to the MorphoBank project. == Usage == MorphoBank hosts the products of peer-reviewed scientific research on phenotypes. An increasing volume of systematics data is "born digital" and MorphoBank is well suited to handle this type of material. On August 24, 2007, 62 active research projects were hosted by MorphoBank, as well as 6 completed (and published) projects. By 2017 over 2000 scientists and their students were registered content builders (users are not required to register and are even more numerous) and has more than 500 publicly available projects with approximately 80,000 images that are the products of scientific research. Over 1,500 active research projects are hosted by MorphoBank. The software has been used to assemble phylogenetic research on such groups as mammals, from bats to whales, bivalve molluscs, arachnids, fossil plants and living and extinct amniotes. It has also been used more broadly in evolutionary and paleontological research to host curated images associated with published research on lacewing insects geckos, raptor birds, dinosaurs, frogs and nematodes. MorphoBank is increasingly used in conjunction with the Paleobiology Database. Example published projects: Project 1097: Blank CE, 2013 Origin and early evolution of photosynthetic eukaryotes in freshwater environments – reinterpreting proterozoic paleobiology and biogeochemical processes in light of trait evolution Project 2520: Carvalho, T. P., R. E. Reis, and J. P. Friel, 2017 A new species of Hoplomyzon (Siluriformes: Aspredinidae) from Maracaibo Basin, Venezuela: osteological description using high-resolution Project 2651: Baron, M. G., Norman, D. B., Barrett, P. M., 2017 A new hypothesis of dinosaur relationships and early dinosaur evolution MorphoBank has been particularly important to the Assembling the Tree of Life initiative sponsored by the National Science Foundation. MorphoBank is well-suited to such projects because of its tools for merging taxonomic, character and matrix-based data, as well as its collaborative features. Highlights of this research include a collaborative matrix on mammal evolution published in Science that included over 4,000 phenomic characters scored for over 80 species, a matrix on extant baleen whales featuring nearly 600 images, and more.

    Read more →
  • Attention (machine learning)

    Attention (machine learning)

    In machine learning, attention is a method that determines the importance of each component in a sequence relative to the other components in that sequence. In natural language processing, importance is represented by "soft" weights assigned to each word in a sentence. More generally, attention encodes vectors called token embeddings across a fixed-width sequence that can range from tens to millions of tokens in size. Unlike "hard" weights, which are computed during the backwards training pass, "soft" weights exist only in the forward pass and therefore change with every step of the input. Earlier designs implemented the attention mechanism in a serial recurrent neural network (RNN) language translation system, but a more recent design, namely the transformer, removed the slower sequential RNN and relied more heavily on the faster parallel attention scheme. Inspired by ideas about attention in humans, the attention mechanism was developed to address the weaknesses of using information from the hidden layers of recurrent neural networks. Recurrent neural networks favor information contained in words at the end of a sentence and thus deemed more recent, thereby tending to attenuate the significance and associated predictive weight assigned to information earlier in the sentence. Attention allows a token equal access to any part of a sentence directly, rather than only through the previous state. == History == Additional surveys of the attention mechanism in deep learning are provided by Niu et al. and Soydaner. The major breakthrough came with self-attention, where each element in the input sequence attends to all others, enabling the model to capture global dependencies. This idea was central to the Transformer architecture, which replaced recurrence with attention mechanisms. As a result, Transformers became the foundation for models like BERT, T5 and generative pre-trained transformers (GPT). == Overview == The modern era of machine attention was revitalized by grafting an attention mechanism (Fig 1. orange) to an Encoder-Decoder. Figure 2 shows the internal step-by-step operation of the attention block (A) in Fig 1. === Interpreting attention weights === In translating between languages, alignment is the process of matching words from the source sentence to words of the translated sentence. Networks that perform verbatim translation without regard to word order would show the highest scores along the (dominant) diagonal of the matrix. The off-diagonal dominance shows that the attention mechanism is more nuanced. Consider an example of translating I love you to French. On the first pass through the decoder, 94% of the attention weight is on the first English word I, so the network offers the word je. On the second pass of the decoder, 88% of the attention weight is on the third English word you, so it offers t'. On the last pass, 95% of the attention weight is on the second English word love, so it offers aime. In the I love you example, the second word love is aligned with the third word aime. Stacking soft row vectors together for je, t', and aime yields an alignment matrix: Sometimes, alignment can be multiple-to-multiple. For example, the English phrase look it up corresponds to cherchez-le. Thus, "soft" attention weights work better than "hard" attention weights (setting one attention weight to 1, and the others to 0), as we would like the model to make a context vector consisting of a weighted sum of the hidden vectors, rather than "the best one", as there may not be a best hidden vector. == Variants == Many variants of attention implement soft weights, such as fast weight programmers, or fast weight controllers (1992). A "slow" neural network outputs the "fast" weights of another neural network through outer products. The slow network learns by gradient descent. It was later renamed as "linearized self-attention". Bahdanau-style attention, also referred to as additive attention, Luong-style attention, which is known as multiplicative attention, Early attention mechanisms similar to modern self-attention were proposed using recurrent neural networks. However, the highly parallelizable self-attention was introduced in 2017 and successfully used in the Transformer model, positional attention and factorized positional attention. For convolutional neural networks, attention mechanisms can be distinguished by the dimension on which they operate, namely: spatial attention, channel attention, or combinations. These variants recombine the encoder-side inputs to redistribute those effects to each target output. Often, a correlation-style matrix of dot products provides the re-weighting coefficients. In the figures below, W is the matrix of context attention weights, similar to the formula in Overview section above. == Optimizations == === Flash attention === The size of the attention matrix is proportional to the square of the number of input tokens. Therefore, when the input is long, calculating the attention matrix requires a lot of GPU memory. Flash attention is an implementation that reduces the memory needs and increases efficiency without sacrificing accuracy. It achieves this by partitioning the attention computation into smaller blocks that fit into the GPU's faster on-chip memory, reducing the need to store large intermediate matrices and thus lowering memory usage while increasing computational efficiency. === FlexAttention === FlexAttention is an attention kernel developed by Meta that allows users to modify attention scores prior to softmax and dynamically chooses the optimal attention algorithm. == Applications == Attention is widely used in natural language processing, computer vision, and speech recognition. In NLP, it improves context understanding in tasks like question answering and summarization. In vision, visual attention helps models focus on relevant image regions, enhancing object detection and image captioning. === Attention maps as explanations for vision transformers === From the original paper on vision transformers (ViT), visualizing attention scores as a heat map (called saliency maps or attention maps) has become an important and routine way to inspect the decision making process of ViT models. One can compute the attention maps with respect to any attention head at any layer, while the deeper layers tend to show more semantically meaningful visualization. Attention rollout is a recursive algorithm to combine attention scores across all layers, by computing the dot product of successive attention maps. Because vision transformers are typically trained in a self-supervised manner, attention maps are generally not class-sensitive. When a classification head is attached to the ViT backbone, class-discriminative attention maps (CDAM) combines attention maps and gradients with respect to the class [CLS] token. Some class-sensitive interpretability methods originally developed for convolutional neural networks can be also applied to ViT, such as GradCAM, which back-propagates the gradients to the outputs of the final attention layer. Using attention as basis of explanation for the transformers in language and vision is not without debate. While some pioneering papers analyzed and framed attention scores as explanations, higher attention scores do not always correlate with greater impact on model performances. == Mathematical representation == === Standard scaled dot-product attention === For matrices: Q ∈ R m × d k , K ∈ R n × d k {\displaystyle Q\in \mathbb {R} ^{m\times d_{k}},K\in \mathbb {R} ^{n\times d_{k}}} and V ∈ R n × d v {\displaystyle V\in \mathbb {R} ^{n\times d_{v}}} , the scaled dot-product, or QKV attention, is defined as: Attention ( Q , K , V ) = softmax ( Q K T d k ) V ∈ R m × d v {\displaystyle {\text{Attention}}(Q,K,V)={\text{softmax}}\left({\frac {QK^{T}}{\sqrt {d_{k}}}}\right)V\in \mathbb {R} ^{m\times d_{v}}} where T {\displaystyle {}^{T}} denotes transpose and the softmax function is applied independently to every row of its argument. The matrix Q {\displaystyle Q} contains m {\displaystyle m} queries, while matrices K , V {\displaystyle K,V} jointly contain an unordered set of n {\displaystyle n} key-value pairs. Value vectors in matrix V {\displaystyle V} are weighted using the weights resulting from the softmax operation, so that the rows of the m {\displaystyle m} -by- d v {\displaystyle d_{v}} output matrix are confined to the convex hull of the points in R d v {\displaystyle \mathbb {R} ^{d_{v}}} given by the rows of V {\displaystyle V} . To understand the permutation invariance and permutation equivariance properties of QKV attention, let A ∈ R m × m {\displaystyle A\in \mathbb {R} ^{m\times m}} and B ∈ R n × n {\displaystyle B\in \mathbb {R} ^{n\times n}} be permutation matrices; and D ∈ R m × n {\displaystyle D\in \mathbb {R} ^{m\times n}} an arbitrary matrix. The softmax function is permutation equivariant in the sense that: softmax ( A D B ) = A softmax ( D ) B {\displays

    Read more →
  • Dataset shift

    Dataset shift

    Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e., P t r a i n ( X , Y ) ≠ P t e s t ( X , Y ) {\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)} ). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.

    Read more →