AI Art Detection

AI Art Detection — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Contract management software

    Contract management software

    Contract management software constitutes software and associated data management used to support contract management, contract lifecycle management, and contractor management on projects in the procurement of goods and services. It may be used together with project management software. == History == Historically, contract management was seen as a "paper-intensive" process. Early steps from the early 2000's reported by the Aberdeen Group required extensive data conversion work to enable documents to be handled electronically. With the adoption of the European Union's General Data Protection Regulation (GDPR) in 2016, companies needed to take additional steps in regards to contract management. Each data responsible entity was obliged to sign data processing agreements (DPAs) with the various vendors, who treat personal data on behalf of the data responsible. DPAs need to be regularly controlled, adjusted and renewed, which adds an extra agreement to such vendors or at least an extra DPA addendum to each agreement. By 2018, Ardent Partner's research had found that software used for automating contract management activities was being more extensively used among major companies or businesses with "Best-in-Class" procurement teams. Contract management process automation was found to be closely linked with more effective internal business collaboration, standardization and risk management. == Advantages and key functions == Using contract management software can have multiple benefits compared to manually managing paper contracts. This software can help keep track of multiple activities and can have features for automating administration, ensuring compliance, monitoring risk, running reports and triggering alerts. In addition to these types of features, contract management software systems provide a centralized repository for employees to quickly access all contracts worldwide in one place. Contract management software is produced by many companies, working on a range of scales and offering varying degrees of customizability. Basic functions should include the ability to store contract documents, track changes to contract documents, search documents for a particular criterion, send key date alerts and to report required aspects of the contract. Other functions include managing a new contract request, capturing related data, following a document through a review and approval process, and collecting digital signatures. Contract management software may also be an aid to project portfolio management and spend analysis, and may also monitor KPIs. Leading contract management software provides contract visibility, monitoring, and compliance to automate and streamline the contract lifecycle process. Contract management software which uses artificial intelligence (AI) can identify contract types based on pattern recognition. AI contracting software trains its algorithms on a set of contract data to recognize patterns and extract variables such as clauses, dates, and parties. It also offers simple prediction capabilities, by sorting through a large volume of contracts and flagging individual contracts based on specified criteria. AI software can also read contracts in multiple formats and languages, extract contract data, and provide analytics. It can reduce the risk of human error in contract drafting and review. A centralized repository provides a critical advantage allowing for all contract documents to be stored within one location. Having contracts stored in multiple locations can delay and interrupt the contracting process. == Contract risk management software (CRMS) for capital projects == Very large enterprises, such as capital expenditure (capex) projects, involve multiple parties and high risk and uncertainty. They are unlike traditional operating contracts in that they are subject to shared deadlines in unique situations. As the complexity of these unique projects increases, the relationships between parties become more important. This requires contract management software, or contract risk management software (CRMS), to become more dynamic and responsive. The terms of these capex contracts necessarily involve assumptions at the start of the process and are likely to change over the lifetime of the project lifecycle. For this reason, CRMS must be capable of recording one single instance of agreed changes to contract terms and incorporating these changes in an auditable and legally robust way. With multiple decision makers involved, CRMS should also make accountability more transparent and enable faster decisions about variation proposals.

    Read more →
  • Neocognitron

    Neocognitron

    The neocognitron is a hierarchical, multilayered artificial neural network proposed by Kunihiko Fukushima in 1979. It has been used for Japanese handwritten character recognition and other pattern recognition tasks, and served as the inspiration for convolutional neural networks. Previously in 1969, he published a similar architecture, but with hand-designed kernels inspired by convolutions in mammalian vision. In 1975 he improved it to the Cognitron, and in 1979 he improved it to the neocognitron, which learns all convolutional kernels by unsupervised learning (in his terminology, "self-organized by 'learning without a teacher'"). The neocognitron was inspired by the model proposed by Hubel & Wiesel in 1959. They found two types of cells in the visual primary cortex called simple cell and complex cell, and also proposed a cascading model of these two types of cells for use in pattern recognition tasks. The neocognitron is a natural extension of these cascading models. The neocognitron consists of multiple types of cells, the most important of which are called S-cells and C-cells. The local features are extracted by S-cells, and these features' deformation, such as local shifts, are tolerated by C-cells. Local features in the input are integrated gradually and classified in the higher layers. The idea of local feature integration is found in several other models, such as the Convolutional Neural Network model, the SIFT method, and the HoG method. There are various kinds of neocognitron. For example, some types of neocognitron can detect multiple patterns in the same input by using backward signals to achieve selective attention.

    Read more →
  • Automated Pain Recognition

    Automated Pain Recognition

    Automated Pain Recognition (APR) is a method for objectively measuring pain and at the same time represents an interdisciplinary research area that comprises elements of medicine, psychology, psychobiology, and computer science. The focus is on computer-aided objective recognition of pain, implemented on the basis of machine learning. Automated pain recognition allows for the valid, reliable detection and monitoring of pain in people who are unable to communicate verbally. The underlying machine learning processes are trained and validated in advance by means of unimodal or multimodal body signals. Signals used to detect pain may include facial expressions or gestures and may also be of a (psycho-)physiological or paralinguistic nature. To date, the focus has been on identifying pain intensity, but visionary efforts are also being made to recognize the quality, site, and temporal course of pain. However, the clinical implementation of this approach is a controversial topic in the field of pain research. Critics of automated pain recognition argue that pain diagnosis can only be performed subjectively by humans. == Background == Pain diagnosis under conditions where verbal reporting is restricted - such as in verbally and/or cognitively impaired people or in patients who are sedated or mechanically ventilated - is based on behavioral observations by trained professionals. However, all known observation procedures (e.g., Zurich Observation Pain Assessment (ZOPA)); Pain Assessment in Advanced Dementia Scale (PAINAD) require a great deal of specialist expertise. These procedures can be made more difficult by perception- and interpretation-related misjudgments on the part of the observer. With regard to the differences in design, methodology, evaluation sample, and conceptualization of the phenomenon of pain, it is difficult to compare the quality criteria of the various tools. Even if trained personnel could theoretically record pain intensity several times a day using observation instruments, it would not be possible to measure it every minute or second. In this respect, the goal of automated pain recognition is to use valid, robust pain response patterns that can be recorded multimodally for a temporally dynamic, high-resolution, automated pain intensity recognition system. == Procedure == For automated pain recognition, pain-relevant parameters are usually recorded using non-invasive sensor technology, which captures data on the (physical) responses of the person in pain. This can be achieved with camera technology that captures facial expressions, gestures, or posture, while audio sensors record paralinguistic features. (Psycho-)physiological information such as muscle tone and heart rate can be collected via biopotential sensors (electrodes). Pain recognition requires the extraction of meaningful characteristics or patterns from the data collected. This is achieved using machine learning techniques that are able to provide an assessment of the pain after training (learning), e.g., "no pain," "mild pain," or "severe pain." == Parameters == Although the phenomenon of pain comprises different components (sensory discriminative, affective (emotional), cognitive, vegetative, and (psycho-)motor), automated pain recognition currently relies on the measurable parameters of pain responses. These can be divided roughly into the two main categories of "physiological responses" and "behavioral responses". === Physiological responses === In humans, pain almost always initiates autonomic nervous processes that are reflected measurably in various physiological signals. ==== Physiological signals ==== Measurements can include electrodermal activity (EDA, also skin conductance), electromyography (EMG), electrocardiogram (ECG), blood volume pulse (BVP), electroencephalogram (EEG), respiration, and body temperature, which are regulatory mechanisms of the sympathetic and parasympathetic systems. Physiological signals are mainly recorded using special non-invasive surface electrodes (for EDA, EMG, ECG, and EEG), a blood volume pulse sensor (BVP), a respiratory belt (respiration), and a thermal sensor (body temperature). Endocrinological and immunological parameters can also be recorded, but this requires measures that are somewhat invasive (e.g., blood sampling). === Behavioral responses === Behavioral responses to pain fulfil two functions: protection of the body (e.g., through protective reflexes) and external communication of the pain (e.g., as a cry for help). The responses are particularly evident in facial expressions, gestures, and paralinguistic features. ==== Facial expressions ==== Behavioral signals captured comprise facial expression patterns (expressive behavior), which are measured with the aid of video signals. Facial expression recognition is based on the everyday clinical observation that pain often manifests itself in the patient's facial expressions but that this is not necessarily always the case, since facial expressions can be inhibited through self-control. Despite the possibility that facial expressions may be influenced consciously, facial expression behavior represents an essential source of information for pain diagnosis and is thus also a source of information for automatic pain recognition. One advantage of video-based facial expression recognition is the contact-free measurement of the face, provided that it can be captured on video, which is not possible in every position (e.g., lying face down) or may be limited by bandages covering the face. Facial expression analysis relies on rapid, spontaneous, and temporary changes in neuromuscular activity that lead to visually detectable changes in the face. ==== Gestures ==== Gestures are also captured predominantly using non-contact camera technology. Motor pain responses vary and are strongly dependent on the type and cause of the pain. They range from abrupt protective reflexes (e.g., spontaneous retraction of extremities or doubling up) to agitation (pathological restlessness) and avoidance behavior (hesitant, cautious movements). ==== Paralinguistic features of language ==== Among other things, pain leads to nonverbal linguistic behavior that manifests itself in sounds such as sighing, gasping, moaning, whining, etc. Paralinguistic features are usually recorded using highly sensitive microphones. == Algorithms == After the recording, pre-processing (e.g., filtering), and extraction of relevant features, an optional information fusion can be performed. During this process, modalities from different signal sources are merged to generate new or more precise knowledge. The pain is classified using machine learning processes. The method chosen has a significant influence on the recognition rate and depends greatly on the quality and granularity of the underlying data. Similar to the field of affective computing, the following classifiers are currently being used: Support Vector Machine (SVM): The goal of an SVM is to find a clearly defined optimal hyperplane with the greatest minimal distance to two (or more) classes to be separated. The hyperplane acts as a decision function for classifying an unknown pattern. Random Forest (RF): RF is based on the composition of random, uncorrelated decision trees. An unknown pattern is judged individually by each tree and assigned to a class. The final classification of the patterns by the RF is then based on a majority decision. k-Nearest Neighbors (k-NN): The k-NN algorithm classifies an unknown object using the class label that most commonly classifies the k neighbors closest to it. Its neighbors are determined using a selected similarity measure (e.g., Euclidean distance, Jaccard coefficient, etc.). Artificial neural networks (ANNs): ANNs are inspired by biological neural networks and model their organizational principles and processes in a very simplified manner. Class patterns are learned by adjusting the weights of the individual neuronal connections. == Databases == In order to classify pain in a valid manner, it is necessary to create representative, reliable, and valid pain databases that are available to the machine learner for training. An ideal database would be sufficiently large and would consist of natural (not experimental), high-quality pain responses. However, natural responses are difficult to record and can only be obtained to a limited extent; in most cases they are characterized by suboptimal quality. The databases currently available therefore contain experimental or quasi-experimental pain responses, and each database is based on a different pain model. The following list shows a selection of the most relevant pain databases (last updated: April 2020): UNBC-McMaster Shoulder Pain BioVid Heat Pain EmoPain SenseEmotion X-ITE Pain

    Read more →
  • Natarajan dimension

    Natarajan dimension

    In the theory of Probably Approximately Correct Machine Learning, the Natarajan dimension characterizes the complexity of learning a set of functions, generalizing from the Vapnik–Chervonenkis dimension for boolean functions to multi-class functions. Originally introduced as the Generalized Dimension by Natarajan, it was subsequently renamed the Natarajan Dimension by Haussler and Long. == Definition == Let H {\displaystyle H} be a set of functions from a set X {\displaystyle X} to a set Y {\displaystyle Y} . H {\displaystyle H} shatters a set C ⊂ X {\displaystyle C\subset X} if there exist two functions f 0 , f 1 ∈ H {\displaystyle f_{0},f_{1}\in H} such that For every x ∈ C , f 0 ( x ) ≠ f 1 ( x ) {\displaystyle x\in C,f_{0}(x)\neq f_{1}(x)} . For every B ⊂ C {\displaystyle B\subset C} , there exists a function h ∈ H {\displaystyle h\in H} such that for all x ∈ B , h ( x ) = f 0 ( x ) {\displaystyle x\in B,h(x)=f_{0}(x)} and for all x ∈ C − B , h ( x ) = f 1 ( x ) {\displaystyle x\in C-B,h(x)=f_{1}(x)} . The Natarajan dimension of H is the maximal cardinality of a set shattered by H {\displaystyle H} . It is easy to see that if | Y | = 2 {\displaystyle |Y|=2} , the Natarajan dimension collapses to the Vapnik–Chervonenkis dimension. Shalev-Shwartz and Ben-David present comprehensive material on multi-class learning and the Natarajan dimension, including uniform convergence and learnability. Recently, Cohen et al showed that the Natarajan dimension is the dominant term governing agnostic multi-class PAC learnability.

    Read more →
  • United States Tech Force

    United States Tech Force

    The U.S. Tech Force (also styled as US Tech Force, Tech Force, or Government Tech Force) is a federal hiring initiative launched by the second Donald Trump administration in December 2025. The program, administered by the Office of Personnel Management (OPM), aims to recruit about 1,000 early-career technology professionals into two-year government jobs to modernize federal IT systems, advance artificial intelligence (AI) capabilities, and address technological gaps in government operations. The initiative is an effort to plug capability gaps created by Trump-administration efforts to shrink the federal government, which led to the departure of some 220,000 federal employees, including many in IT. The initiative seeks early-career workers; officials said it would offer competitive salaries and opportunities to work on high-impact government technology projects. Major technology companies—including Amazon, Apple, Microsoft, Nvidia, Meta, Google, and OpenAI—agreed to help identify and refer candidates. Candidates are allowed to take Tech Force positions on leaves of absence and without divesting their stock, raising conflict-of-interest questions. In January 2026, OPM direction Scott Kupor said the deadline for applying to Tech Force was being extended because of "tremendous interest" without saying how many people had actually applied. Also in December 2025, news broke that the administration is planning another novel use of private-sector workers: hiring cybersecurity firms for offensive cyber operations.

    Read more →
  • Linear discriminant analysis

    Linear discriminant analysis

    Linear discriminant analysis (LDA), normal discriminant analysis (NDA), canonical variates analysis (CVA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification. LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable (i.e. the class label). Logistic regression and probit regression are more similar to LDA than ANOVA is, as they also explain a categorical variable by the values of continuous independent variables. These other methods are preferable in applications where it is not reasonable to assume that the independent variables have a normal distribution, which is a fundamental assumption of the LDA method. LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA, in contrast, does not take into account any difference in class, and factor analysis builds the feature combinations based on similarities rather than differences. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made. LDA works when the measurements made on independent variables for each observation are continuous quantities. When dealing with categorical independent variables, the equivalent technique is discriminant correspondence analysis. Discriminant analysis is used when groups are known a priori (unlike in cluster analysis). Each case must have a score on one or more quantitative predictor measures, and a score on a group measure. In simple terms, discriminant function analysis is classification - the act of distributing things into groups, classes or categories of the same type. == History == The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936. It is different from an ANOVA or MANOVA, which is used to predict one (ANOVA) or multiple (MANOVA) continuous dependent variables by one or more independent categorical variables. Discriminant function analysis is useful in determining whether a set of variables is effective in predicting category membership. == LDA for two classes == Consider a set of observations x → {\displaystyle {\vec {x}}} (also called features, attributes, variables or measurements) for each sample of an object or event with known class y {\displaystyle y} . This set of samples is called the training set in a supervised learning context. The classification problem is then to find a good predictor for the class y {\displaystyle y} of any sample of the same distribution (not necessarily from the training set) given only an observation x → {\displaystyle {\vec {x}}} . LDA approaches the problem by assuming that the conditional probability density functions p ( x → | y = 0 ) {\displaystyle p({\vec {x}}|y=0)} and p ( x → | y = 1 ) {\displaystyle p({\vec {x}}|y=1)} are both the normal distribution with mean and covariance parameters ( μ → 0 , Σ 0 ) {\displaystyle \left({\vec {\mu }}_{0},\Sigma _{0}\right)} and ( μ → 1 , Σ 1 ) {\displaystyle \left({\vec {\mu }}_{1},\Sigma _{1}\right)} , respectively. Under this assumption, the Bayes-optimal solution is to predict points as being from the second class if the log of the likelihood ratios is bigger than some threshold T, so that: 1 2 ( x → − μ → 0 ) T Σ 0 − 1 ( x → − μ → 0 ) + 1 2 ln ⁡ | Σ 0 | − 1 2 ( x → − μ → 1 ) T Σ 1 − 1 ( x → − μ → 1 ) − 1 2 ln ⁡ | Σ 1 | > T {\displaystyle {\frac {1}{2}}({\vec {x}}-{\vec {\mu }}_{0})^{\mathrm {T} }\Sigma _{0}^{-1}({\vec {x}}-{\vec {\mu }}_{0})+{\frac {1}{2}}\ln |\Sigma _{0}|-{\frac {1}{2}}({\vec {x}}-{\vec {\mu }}_{1})^{\mathrm {T} }\Sigma _{1}^{-1}({\vec {x}}-{\vec {\mu }}_{1})-{\frac {1}{2}}\ln |\Sigma _{1}|\ >\ T} Without any further assumptions, the resulting classifier is referred to as quadratic discriminant analysis (QDA). LDA instead makes the additional simplifying homoscedasticity assumption (i.e. that the class covariances are identical, so Σ 0 = Σ 1 = Σ {\displaystyle \Sigma _{0}=\Sigma _{1}=\Sigma } ) and that the covariances have full rank. In this case, several terms cancel: x → T Σ 0 − 1 x → = x → T Σ 1 − 1 x → {\displaystyle {\vec {x}}^{\mathrm {T} }\Sigma _{0}^{-1}{\vec {x}}={\vec {x}}^{\mathrm {T} }\Sigma _{1}^{-1}{\vec {x}}} x → T Σ i − 1 μ → i = μ → i T Σ i − 1 x → {\displaystyle {\vec {x}}^{\mathrm {T} }{\Sigma _{i}}^{-1}{\vec {\mu }}_{i}={{\vec {\mu }}_{i}}^{\mathrm {T} }{\Sigma _{i}}^{-1}{\vec {x}}} because both sides are scalar and transpose to each other ( Σ i {\displaystyle \Sigma _{i}} is Hermitian) and the above decision criterion becomes a threshold on the dot product w → T x → > c {\displaystyle {\vec {w}}^{\mathrm {T} }{\vec {x}}>c} for some threshold constant c, where w → = Σ − 1 ( μ → 1 − μ → 0 ) {\displaystyle {\vec {w}}=\Sigma ^{-1}({\vec {\mu }}_{1}-{\vec {\mu }}_{0})} c = 1 2 w → T ( μ → 1 + μ → 0 ) {\displaystyle c={\frac {1}{2}}\,{\vec {w}}^{\mathrm {T} }({\vec {\mu }}_{1}+{\vec {\mu }}_{0})} This means that the criterion of an input x → {\displaystyle {\vec {x}}} being in a class y {\displaystyle y} is purely a function of this linear combination of the known observations. It is often useful to see this conclusion in geometrical terms: the criterion of an input x → {\displaystyle {\vec {x}}} being in a class y {\displaystyle y} is purely a function of projection of multidimensional-space point x → {\displaystyle {\vec {x}}} onto vector w → {\displaystyle {\vec {w}}} (thus, we only consider its direction). In other words, the observation belongs to y {\displaystyle y} if corresponding x → {\displaystyle {\vec {x}}} is located on a certain side of a hyperplane perpendicular to w → {\displaystyle {\vec {w}}} . The location of the plane is defined by the threshold c {\displaystyle c} . == Assumptions == The assumptions of discriminant analysis are the same as those for MANOVA. The analysis is quite sensitive to outliers and the size of the smallest group must be larger than the number of predictor variables. Multivariate normality: Independent variables are normal for each level of the grouping variable. Homogeneity of variance/covariance (homoscedasticity): Variances among group variables are the same across levels of predictors. Can be tested with Box's M statistic. It has been suggested, however, that linear discriminant analysis be used when covariances are equal, and that quadratic discriminant analysis may be used when covariances are not equal. Independence: Participants are assumed to be randomly sampled, and a participant's score on one variable is assumed to be independent of scores on that variable for all other participants. It has been suggested that discriminant analysis is relatively robust to slight violations of these assumptions, and it has also been shown that discriminant analysis may still be reliable when using dichotomous variables (where multivariate normality is often violated). == Discriminant functions == Discriminant analysis works by creating one or more linear combinations of predictors, creating a new latent variable for each function. These functions are called discriminant functions. The number of functions possible is either N g − 1 {\displaystyle N_{g}-1} where N g {\displaystyle N_{g}} = number of groups, or p {\displaystyle p} (the number of predictors), whichever is smaller. The first function created maximizes the differences between groups on that function. The second function maximizes differences on that function, but also must not be correlated with the previous function. This continues with subsequent functions with the requirement that the new function not be correlated with any of the previous functions. Given group j {\displaystyle j} , with R j {\displaystyle \mathbb {R} _{j}} sets of sample space, there is a discriminant rule such that if x ∈ R j {\displaystyle x\in \mathbb {R} _{j}} , then x ∈ j {\displaystyle x\in j} . Discriminant analysis then, finds “good” regions of R j {\displaystyle \mathbb {R} _{j}} to minimize classification error, therefore leading to a high percent correct classified in the classification table. Each function is given a discriminant score to determine how well it predicts group placement. Structure Corr

    Read more →
  • Margin-infused relaxed algorithm

    Margin-infused relaxed algorithm

    Margin-infused relaxed algorithm (MIRA) is a machine learning and online algorithm for multiclass classification problems. It is designed to learn a set of parameters (vector or matrix) by processing all the given training examples one-by-one and updating the parameters according to each training example, so that the current training example is classified correctly with a margin against incorrect classifications at least as large as their loss. The change of the parameters is kept as small as possible. A two-class version called binary MIRA simplifies the algorithm by not requiring the solution of a quadratic programming problem (see below). When used in a one-vs-all configuration, binary MIRA can be extended to a multiclass learner that approximates full MIRA, but may be faster to train. The flow of the algorithm looks as follows: The update step is then formalized as a quadratic programming problem: Find m i n ‖ w ( i + 1 ) − w ( i ) ‖ {\displaystyle min\|w^{(i+1)}-w^{(i)}\|} , so that s c o r e ( x t , y t ) − s c o r e ( x t , y ′ ) ≥ L ( y t , y ′ ) ∀ y ′ {\displaystyle score(x_{t},y_{t})-score(x_{t},y')\geq L(y_{t},y')\ \forall y'} , i.e. the score of the current correct training y {\displaystyle y} must be greater than the score of any other possible y ′ {\displaystyle y'} by at least the loss (number of errors) of that y ′ {\displaystyle y'} in comparison to y {\displaystyle y} .

    Read more →
  • Multimodal learning

    Multimodal learning

    Multimodal learning is a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images, or video. This integration allows for a more holistic understanding of complex data, improving model performance in tasks like visual question answering, cross-modal retrieval, text-to-image generation, aesthetic ranking, and image captioning. Multimodal learning was proposed in 2011 at the beginning of the deep learning period. Large multimodal models, such as Google Gemini and GPT-4o, have become increasingly popular since 2023, enabling increased versatility and a broader understanding of real-world phenomena. == Motivation == Data usually comes with different modalities which carry different information. For example, it is very common to caption an image to convey the information not presented in the image itself. Similarly, sometimes it is more straightforward to use an image to describe information which may not be obvious from text. As a result, if different words appear in similar images, then these words likely describe the same thing. Conversely, if a word is used to describe seemingly dissimilar images, then these images may represent the same object. Thus, in cases dealing with multi-modal data, it is important to use a model which is able to jointly represent the information such that the model can capture the combined information from different modalities. == Multimodal transformers == Models such as CLIP (Contrastive Language–Image Pretraining) learn joint representations of images and text by optimizing contrastive objectives, allowing the model to match images with their corresponding textual descriptions. == Multimodal deep Boltzmann machines == A Boltzmann machine is a type of stochastic neural network invented by Geoffrey Hinton and Terry Sejnowski in 1985. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield nets. They are named after the Boltzmann distribution in statistical mechanics. The units in Boltzmann machines are divided into two groups: visible units and hidden units. Each unit is like a neuron with a binary output that represents whether it is activated or not. General Boltzmann machines allow connection between any units. However, learning is impractical using general Boltzmann Machines because the computational time is exponential to the size of the machine. A more efficient architecture is called restricted Boltzmann machine where connection is only allowed between hidden unit and visible unit, which is described in the next section. Multimodal deep Boltzmann machines can process and learn from different types of information, such as images and text, simultaneously. This can notably be done by having a separate deep Boltzmann machine for each modality, for example one for images and one for text, joined at an additional top hidden layer. == Applications == Multimodal machine learning has numerous applications across various domains: Cross-modal retrieval: cross-modal retrieval allows users to search for data across different modalities (e.g., retrieving images based on text descriptions), improving multimedia search engines and content recommendation systems. Classification and missing data retrieval: multimodal Deep Boltzmann Machines outperform traditional models like support vector machines and latent Dirichlet allocation in classification tasks and can predict missing data in multimodal datasets, such as images and text. Healthcare diagnostics: multimodal models integrate medical imaging, genomic data, and patient records to improve diagnostic accuracy and early disease detection, especially in cancer screening. Content generation: models like DALL·E generate images from textual descriptions, benefiting creative industries, while cross-modal retrieval enables dynamic multimedia searches. Robotics and human-computer interaction: multimodal learning improves interaction in robotics and AI by integrating sensory inputs like speech, vision, and touch, aiding autonomous systems and human-computer interaction. Emotion recognition: combining visual, audio, and text data, multimodal systems enhance sentiment analysis and emotion recognition, applied in customer service, social media, and marketing.

    Read more →
  • Sprite (computer graphics)

    Sprite (computer graphics)

    In computer graphics, a sprite is a two-dimensional bitmap that is integrated into a larger scene, most often in a 2D video game. Originally, the term sprite referred to fixed-sized objects composited together, by hardware, with a background. Use of the term has since become more general. Systems with hardware sprites include arcade video games of the 1970s and 1980s; game consoles including as the Atari VCS (1977), ColecoVision (1982), Famicom (1983), Genesis/Mega Drive (1988); and home computers such as the TI-99/4 (1979), Atari 8-bit computers (1979), Commodore 64 (1982), MSX (1983), Amiga (1985), and X68000 (1987). Hardware varies in the number of sprites supported, the size and colors of each sprite, and special effects such as scaling or reporting pixel-precise overlap. Hardware composition of sprites occurs as each scan line is prepared for the video output device, such as a cathode-ray tube, without involvement of the main CPU and without the need for a full-screen frame buffer. Sprites can be positioned or altered by setting attributes used during the hardware composition process. The number of sprites which can be displayed per scan line is often lower than the total number of sprites a system supports. For example, the Texas Instruments TMS9918 chip supports 32 sprites, but only four can appear on the same scan line. The CPUs in modern computers, video game consoles, and mobile devices are fast enough that bitmaps can be drawn into a frame buffer without special hardware assistance. Beyond that, GPUs can render vast numbers of scaled, rotated, anti-aliased, partially translucent, very high resolution images in parallel with the CPU. == Etymology == According to Karl Guttag, one of two engineers for the 1979 Texas Instruments TMS9918 video display processor, this use of the word sprite came from David Ackley, a manager at TI. It was also used by Danny Hillis at Texas Instruments in the late 1970s. The term was derived from the fact that sprites "float" on top of the background image without overwriting it, much like a ghost or mythological sprite. Some hardware manufacturers used different terms, especially before sprite became common: Player/Missile Graphics was a term used by Atari, Inc. for hardware sprites in the Atari 8-bit computers (1979) and Atari 5200 console (1982). The term reflects the use for both characters ("players") and smaller associated objects ("missiles") that share the same color. The earlier Atari Video Computer System and some Atari arcade games used player, missile, and ball. Stamp was used in some arcade hardware in the early 1980s, including Ms. Pac-Man. Movable Object Block, or MOB, was used in MOS Technology's graphics chip literature. Commodore, the main user of MOS chips and the owner of MOS for most of the chip maker's lifetime, instead used the term sprite for the Commodore 64. OBJs (short for objects) is used in the developer manuals for the NES, Super NES, and Game Boy. The region of video RAM used to store sprite attributes and coordinates is called OAM (Object Attribute Memory). This also applies to the Game Boy Advance and Nintendo DS. == History == === Arcade video games === The use of sprites originated with arcade video games. Nolan Bushnell came up with the original concept when he developed the first arcade video game, Computer Space (1971). Technical limitations made it difficult to adapt the early mainframe game Spacewar! (1962), which performed an entire screen refresh for every little movement, so he came up with a solution to the problem: controlling each individual game element with a dedicated transistor. The rockets were essentially hardwired bitmaps that moved around the screen independently of the background, an important innovation for producing screen images more efficiently and providing the basis for sprite graphics. The earliest video games to represent player characters as human player sprites were arcade sports video games, beginning with Taito's TV Basketball, released in April 1974 and licensed to Midway Manufacturing for release in North America. Designed by Tomohiro Nishikado, he wanted to move beyond simple Pong-style rectangles to character graphics, by rearranging the rectangle shapes into objects that look like basketball players and basketball hoops. Ramtek released another sports video game in October 1974, Baseball, which similarly displayed human-like characters. The Namco Galaxian arcade system board, for the 1979 arcade game Galaxian, displays animated, multi-colored sprites over a scrolling background. It became the basis for Nintendo's Radar Scope and Donkey Kong arcade hardware and home consoles such as the Nintendo Entertainment System. According to Steve Golson from General Computer Corporation, the term "stamp" was used instead of "sprite" at the time. === Home systems === Signetics devised the first chips capable of generating sprite graphics (referred to as objects by Signetics) for home systems. The Signetics 2636 video processors were first used in the 1978 1292 Advanced Programmable Video System and later in the 1979 Elektor TV Games Computer. The Atari VCS, released in 1977, has a hardware sprite implementation where five graphical objects can be moved independently of the game playfield. The term sprite was not in use at the time. The VCS's sprites are called movable objects in the programming manual, further identified as two players, two missiles, and one ball. These each consist of a single row of pixels that are displayed on a scan line. To produce a two-dimensional shape, the sprite's single-row bitmap is altered by software from one scan line to the next. The 1979 Atari 400 and 800 home computers have similar, but more elaborate, circuitry capable of moving eight single-color objects per scan line: four 8-bit wide players and four 2-bit wide missiles. Each is the full height of the display—a long, thin strip. DMA from a table in memory automatically sets the graphics pattern registers for each scan line. Hardware registers control the horizontal position of each player and missile. Vertical motion is achieved by moving the bitmap data within a player or missile's strip. The feature was called player/missile graphics by Atari. Texas Instruments developed the TMS9918 chip with sprite support for its 1979 TI-99/4 home computer. An updated version is used in the 1981 TI-99/4A. === In 2.5D and 3D games === Sprites remained popular with the rise of 2.5D games (those which recreate a 3D game space from a 2D map) in the late 1980s and early 1990s. A technique called billboarding allows 2.5D games to keep onscreen sprites rotated toward the player view at all times. Some 2.5D games, such as 1993's Doom, allow the same entity to be represented by different sprites depending on its rotation relative to the viewer, furthering the illusion of 3D. Fully 3D games usually present world objects as 3D models, but sprites are supported in some 3D game engines, such as GoldSrc and Unreal, and may be billboarded or locked to fixed orientations. Sprites remain useful for small details, particle effects, and other applications where the lack of a third dimension is not a major detriment. == Systems with hardware sprites == These are base hardware specs and do not include additional programming techniques, such as using raster interrupts to repurpose sprites mid-frame.

    Read more →
  • Mating pool

    Mating pool

    Mating pool is a concept used in evolutionary algorithms and means a population of parents for the next population. The mating pool is formed by candidate solutions that the selection operators deem to have the highest fitness in the current population. Solutions that are included in the mating pool are referred to as parents. Individual solutions can be repeatedly included in the mating pool, with individuals of higher fitness values having a higher chance of being included multiple times. Crossover operators are then applied to the parents, resulting in recombination of genes recognized as superior. Lastly, random changes in the genes are introduced through mutation operators, increasing the genetic variation in the gene pool. Those two operators improve the chance of creating new, superior solutions. A new generation of solutions is thereby created, the children, who will constitute the next population. Depending on the selection method, the total number of parents in the mating pool can be different to the size of the initial population, resulting in a new population that’s smaller. To continue the algorithm with an equally sized population, random individuals from the old populations can be chosen and added to the new population. At this point, the fitness value of the new solutions is evaluated. If the termination conditions are fulfilled, processes come to an end. Otherwise, they are repeated. The repetition of the steps result in candidate solutions that evolve towards the most optimal solution over time. The genes will become increasingly uniform towards the most optimal gene, a process called convergence. If 95% of the population share the same version of a gene, the gene has converged. When all the individual fitness values have reached the value of the best individual, i.e. all the genes have converged, population convergence is achieved. == Mating pool creation == Several methods can be applied to create a mating pool. All of these processes involve the selective breeding of a particular number of individuals within a population. There are multiple criteria that can be employed to determine which individuals make it into the mating pool and which are left behind. The selection methods can be split into three general types: fitness proportionate selection, ordinal based selection and threshold based selection. === Fitness proportionate selection === In the case of fitness proportionate selection, random individuals are selected to enter the pool. However, the ones with a higher level of fitness are more likely to be picked and therefore have a greater chance of passing on their features to the next generation. One of the techniques used in this type of parental selection is the roulette wheel selection. This approach divides a hypothetical circular wheel into different slots, the size of which is equal to the fitness values of each potential candidate. Afterwards, the wheel is rotated and a fixed point determines which individual gets picked. The greater the fitness value of an individual, the higher the probability of being chosen as a parent by the random spin of the wheel. Alternatively, stochastic universal sampling can be implemented. This selection method is also based on the rotation of a spinning wheel. However, in this case there is more than one fixed point and as a result all of the mating pool members will be selected simultaneously. === Ordinal based selection === The ordinal based selection methods include the tournament and ranking selection. Tournament selection involves the random selection of individuals of a population and the subsequent comparison of their fitness levels. The winners of these “tournaments” are the ones with the highest values and will be put into the mating pool as parents. In ranking selection all the individuals are sorted based on their fitness values. Then, the selection of the parents is made according to the rank of the candidates. Every individual has a chance of being chosen, but higher ranked ones are favored === Threshold based selection === The last type of selection method is referred to as the threshold based method. This includes the truncation selection method, which sorts individuals based on their phenotypic values on a specific trait and later selects the proportion of them that are within a certain threshold as parents.

    Read more →
  • Gaussian adaptation

    Gaussian adaptation

    Gaussian adaptation (GA), also called normal or natural adaptation (NA) is an evolutionary algorithm designed for the maximization of manufacturing yield due to statistical deviation of component values of signal processing systems. In short, GA is a stochastic adaptive process where a number of samples of an n-dimensional vector x[xT = (x1, x2, ..., xn)] are taken from a multivariate Gaussian distribution, N(m, M), having mean m and moment matrix M. The samples are tested for fail or pass. The first- and second-order moments of the Gaussian restricted to the pass samples are m and M. The outcome of x as a pass sample is determined by a function s(x), 0 < s(x) < q ≤ 1, such that s(x) is the probability that x will be selected as a pass sample. The average probability of finding pass samples (yield) is P ( m ) = ∫ s ( x ) N ( x − m ) d x {\displaystyle P(m)=\int s(x)N(x-m)\,dx} Then the theorem of GA states: For any s(x) and for any value of P < q, there always exist a Gaussian p. d. f. [ probability density function ] that is adapted for maximum dispersion. The necessary conditions for a local optimum are m = m and M proportional to M. The dual problem is also solved: P is maximized while keeping the dispersion constant (Kjellström, 1991). Proofs of the theorem may be found in the papers by Kjellström, 1970, and Kjellström & Taxén, 1981. Since dispersion is defined as the exponential of entropy/disorder/average information it immediately follows that the theorem is valid also for those concepts. Altogether, this means that Gaussian adaptation may carry out a simultaneous maximisation of yield and average information (without any need for the yield or the average information to be defined as criterion functions). The theorem is valid for all regions of acceptability and all Gaussian distributions. It may be used by cyclic repetition of random variation and selection (like the natural evolution). In every cycle a sufficiently large number of Gaussian distributed points are sampled and tested for membership in the region of acceptability. The centre of gravity of the Gaussian, m, is then moved to the centre of gravity of the approved (selected) points, m. Thus, the process converges to a state of equilibrium fulfilling the theorem. A solution is always approximate because the centre of gravity is always determined for a limited number of points. It was used for the first time in 1969 as a pure optimization algorithm making the regions of acceptability smaller and smaller (in analogy to simulated annealing, Kirkpatrick 1983). Since 1970 it has been used for both ordinary optimization and yield maximization. == Natural evolution and Gaussian adaptation == It has also been compared to the natural evolution of populations of living organisms. In this case s(x) is the probability that the individual having an array x of phenotypes will survive by giving offspring to the next generation; a definition of individual fitness given by Hartl 1981. The yield, P, is replaced by the mean fitness determined as a mean over the set of individuals in a large population. Phenotypes are often Gaussian distributed in a large population and a necessary condition for the natural evolution to be able to fulfill the theorem of Gaussian adaptation, with respect to all Gaussian quantitative characters, is that it may push the centre of gravity of the Gaussian to the centre of gravity of the selected individuals. This may be accomplished by the Hardy–Weinberg law. This is possible because the theorem of Gaussian adaptation is valid for any region of acceptability independent of the structure (Kjellström, 1996). In this case the rules of genetic variation such as crossover, inversion, transposition etcetera may be seen as random number generators for the phenotypes. So, in this sense Gaussian adaptation may be seen as a genetic algorithm. == How to climb a mountain == Mean fitness may be calculated provided that the distribution of parameters and the structure of the landscape is known. The real landscape is not known, but figure below shows a fictitious profile (blue) of a landscape along a line (x) in a room spanned by such parameters. The red curve is the mean based on the red bell curve at the bottom of figure. It is obtained by letting the bell curve slide along the x-axis, calculating the mean at every location. As can be seen, small peaks and pits are smoothed out. Thus, if evolution is started at A with a relatively small variance (the red bell curve), then climbing will take place on the red curve. The process may get stuck for millions of years at B or C, as long as the hollows to the right of these points remain, and the mutation rate is too small. If the mutation rate is sufficiently high, the disorder or variance may increase and the parameter(s) may become distributed like the green bell curve. Then the climbing will take place on the green curve, which is even more smoothed out. Because the hollows to the right of B and C have now disappeared, the process may continue up to the peaks at D. But of course the landscape puts a limit on the disorder or variability. Besides — dependent on the landscape — the process may become very jerky, and if the ratio between the time spent by the process at a local peak and the time of transition to the next peak is very high, it may as well look like a punctuated equilibrium as suggested by Gould (see Ridley). == Computer simulation of Gaussian adaptation == Thus far the theory only considers mean values of continuous distributions corresponding to an infinite number of individuals. In reality however, the number of individuals is always limited, which gives rise to an uncertainty in the estimation of m and M (the moment matrix of the Gaussian). And this may also affect the efficiency of the process. Unfortunately very little is known about this, at least theoretically. The implementation of normal adaptation on a computer is a fairly simple task. The adaptation of m may be done by one sample (individual) at a time, for example m(i + 1) = (1 – a) m(i) + ax where x is a pass sample, and a < 1 a suitable constant so that the inverse of a represents the number of individuals in the population. M may in principle be updated after every step y leading to a feasible point x = m + y according to: M(i + 1) = (1 – 2b) M(i) + 2byyT, where yT is the transpose of y and b << 1 is another suitable constant. In order to guarantee a suitable increase of average information, y should be normally distributed with moment matrix μ2M, where the scalar μ > 1 is used to increase average information (information entropy, disorder, diversity) at a suitable rate. But M will never be used in the calculations. Instead we use the matrix W defined by WWT = M. Thus, we have y = Wg, where g is normally distributed with the moment matrix μU, and U is the unit matrix. W and WT may be updated by the formulas W = (1 – b)W + bygT and WT = (1 – b)WT + bgyT because multiplication gives M = (1 – 2b)M + 2byyT, where terms including b2 have been neglected. Thus, M will be indirectly adapted with good approximation. In practice it will suffice to update W only W(i + 1) = (1 – b)W(i) + bygT. This is the formula used in a simple 2-dimensional model of a brain satisfying the Hebbian rule of associative learning; see the next section (Kjellström, 1996 and 1999). The figure below illustrates the effect of increased average information in a Gaussian p.d.f. used to climb a mountain Crest (the two lines represent the contour line). Both the red and green cluster have equal mean fitness, about 65%, but the green cluster has a much higher average information making the green process much more efficient. The effect of this adaptation is not very salient in a 2-dimensional case, but in a high-dimensional case, the efficiency of the search process may be increased by many orders of magnitude. == The evolution in the brain == In the brain the evolution of DNA-messages is supposed to be replaced by an evolution of signal patterns and the phenotypic landscape is replaced by a mental landscape, the complexity of which will hardly be second to the former. The metaphor with the mental landscape is based on the assumption that certain signal patterns give rise to a better well-being or performance. For instance, the control of a group of muscles leads to a better pronunciation of a word or performance of a piece of music. In this simple model it is assumed that the brain consists of interconnected components that may add, multiply and delay signal values. A nerve cell kernel may add signal values, a synapse may multiply with a constant and An axon may delay values. This is a basis of the theory of digital filters and neural networks consisting of components that may add, multiply and delay signalvalues and also of many brain models, Levine 1991. In the figure below the brain stem is supposed to deliver Gaussian distributed signal patterns. This may be possible since certai

    Read more →
  • Neural Networks (journal)

    Neural Networks (journal)

    Neural Networks is a monthly peer-reviewed scientific journal and an official journal of the International Neural Network Society, European Neural Network Society, and Japanese Neural Network Society. == History == The journal was established in 1988 and is published by Elsevier. It covers all aspects of research on artificial neural networks. The founding editor-in-chief was Stephen Grossberg (Boston University). The current editors-in-chief are DeLiang Wang (Ohio State University) and Taro Toyoizumi (RIKEN Center for Brain Science). == Abstracting and indexing == The journal is abstracted and indexed in Scopus and the Science Citation Index Expanded. According to the Journal Citation Reports, the journal has a 2022 impact factor of 7.8.

    Read more →
  • Autocommit

    Autocommit

    In the context of data management, autocommit is a mode of operation of a database connection. Each individual database interaction (i.e., each SQL statement) submitted through the database connection in autocommit mode will be executed in its own transaction that is implicitly committed. A SQL statement executed in autocommit mode cannot be rolled back. Autocommit mode incurs per-statement transaction overhead and can often lead to undesirable performance or resource utilization impact on the database. Nonetheless, in systems such as Microsoft SQL Server, as well as connection technologies such as ODBC and Microsoft OLE DB, autocommit mode is the default for all statements that change data, in order to ensure that individual statements will conform to the ACID (atomicity-consistency-isolation-durability) properties of transactions. The alternative to autocommit mode (non-autocommit) means that the SQL client application itself is responsible for ending transactions explicitly via the commit or rollback SQL commands. Non-autocommit mode enables grouping of multiple data manipulation SQL commands into a single atomic transaction. Some DBMS (e.g. MariaDB) force autocommit for every DDL statement, even in non-autocommit mode. In this case, before each DDL statement, previous DML statements in transaction are autocommitted. Each DDL statement is executed in its own new autocommit transaction.

    Read more →
  • Prototype methods

    Prototype methods

    Prototype methods are machine learning methods that use data prototypes. A data prototype is a data value that reflects other values in its class, e.g., the centroid in a K-means clustering problem. == Methods == The following are some prototype methods K-means clustering Learning vector quantization (LVQ) Gaussian mixtures == Related Methods == While K-nearest neighbor's does not use prototypes, it is similar to prototype methods like K-means clustering.

    Read more →
  • Structured kNN

    Structured kNN

    Structured k-nearest neighbours (SkNN) is a machine learning algorithm that generalizes k-nearest neighbors (k-NN). k-NN supports binary classification, multiclass classification, and regression, whereas SkNN allows training of a classifier for general structured output. For instance, a data sample might be a natural language sentence, and the output could be an annotated parse tree. Training a classifier consists of showing many instances of ground truth sample-output pairs. After training, the SkNN model is able to predict the corresponding output for new, unseen sample instances; that is, given a natural language sentence, the classifier can produce the most likely parse tree. == Training == As a training set, SkNN accepts sequences of elements with class labels. The type of element does not matter; the only requirement is a defined metric function that gives a distance between each pair of elements of a set. SkNN is based on idea of creating a graph, with each node representing a class label. There is an edge between a pair of nodes if there is a sequence of two elements in the training set with corresponding classes. The first step of SkNN training is the construction of such a graph from training sequences. There are two special nodes in the graph corresponding to sentence beginnings and ends: if a sequence starts with class C, the edge between node START and node C should be created. Like regular k-NN, the second part of SkNN training consists of storing the elements of a training sequence in a certain way. Each element of the training sequences is stored in the node related to the class of the previous element in the sequence. Every first element is stored in the START node. == Inference == Labelling input sequences by SkNN consists of finding the sequence of transitions in the graph, starting from node START. Each transition corresponds to a single element of the input sequence. As a result, the label of each element is determined as the target node label of the transition. The cost of the path is defined as the sum of all transitions, with the cost of transition from node A to node B being the distance from the current input sequence element to the nearest element of class B, stored in node A. Determining an optimal path may be performed using a modified Viterbi algorithm (where the sum of the distances is minimized, unlike the original algorithm which maximizes the product of probabilities).

    Read more →