AI Chatbot Robot

AI Chatbot Robot — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Operation Serenata de Amor

    Operation Serenata de Amor

    Operation Serenata de Amor is an artificial intelligence project designed to analyze public spending in Brazil. The project has been funded by a recurring crowdfunding campaign since September 7, 2016, and came in the wake of major scandals involving the misappropriation of public funds in Brazil, such as the Mensalão scandal and the revelations of the Operation Car Wash investigations. The analysis began with data from the National Congress and then expanded to other types of budget and instances of government, such as the Federal Senate. The project is built through collaboration on GitHub and through a public Telegram group with more than 600 participants. The name "Serenata de Amor," which means "serenade of love," was taken from a popular cashew cream bonbon produced by Chocolates Garoto in Brazil. == Modules == Over the course of development, new modules have been introduced alongside the main repository. The main repository, serenata-de-amor, serves as the starting point for investigative work. Rosie is the robot programmed to identify public expenses with discrepancies, starting with CEAP (Quota for Exercise of Parliamentary Activity); it analyzes each reimbursement requested by deputies and senators and states the reasons that lead it to consider a reimbursement suspicious. Rosie gave rise to whistleblower, which tweets under the name @RosieDaSerenata and distributes the results on social media. Jarbas (GitHub repository) is a data visualization tool that shows a complete list of the reimbursements made available by the Chamber of Deputies and mined by Rosie. Toolbox is an installable Python package that supports the development of Serenata de Amor and Rosie. == History == Operation Serenata de Amor was conceived in March 2016 by data scientist Irio Musskopf, sociologist Eduardo Cuducos and entrepreneur Felipe Cabral. 
The project was crowdfunded on the Catarse platform, where it reached 131% of its funding goal, paying for three months of development. Data scientist Ana Schwendler, data journalist Pedro Vilanova ("Tonny"), software engineer Bruno Pazzim, frontend engineer Filipe Linhares, entrepreneur Leandro Devegili and André Pinho took the first steps towards constructing the platform, such as collecting and structuring the first datasets. Data scientist Jessica Temporal, researcher Yasodara Córdova ("Yaso") and UX designer Tatiana Balachova ("Russa") joined the project after the financing took place. The members then created a recurring financing campaign, expanding the analysis of public spending to the Federal Senate. Donors make monthly payments ranging from 5 BRL to 200 BRL to maintain the group's activities. The monthly amount collected is around 10,000 BRL. == Results == In January 2017, at the end of the period financed by the initial campaign, the group investigated the suspicious activities flagged by the data analysis system. It filed 629 complaints with the Ombudsman's Office of the Chamber of Deputies, questioning the expenses of 216 federal deputies. The project's Facebook page has more than 25,000 followers, and users frequently cite the operation as a benchmark for transparency in the Brazilian government. One example of the results obtained is the case of a deputy who had to return about 700 BRL to the House after his expenses were analyzed by the platform. The platform has analyzed more than 3 million expense receipts, raising about 8,000 suspected cases in public spending. The community that supports the team's work benefits from open source repositories with licenses that allow collaboration. The two main data scientists of the project presented it at CivicTechFest in Taipei, and it received several mentions in the international press. 
The technical leader presented the project at DevConf2017 in Kraków, Poland. It was also presented at the Google News Lab in 2017. Yaso, when she was director of the initiative, presented it at the MIT Media Lab/Berkman Klein Center initiative on artificial intelligence ethics and at the Artificial Intelligence and Inclusion Symposium, an initiative of the Global Network of Internet & Society Centers (NoC). Irio and Yaso also presented it at a lunch seminar at the Digital Harvard Kennedy School, where they discussed the transparency of the platform and the main solutions found, noting that the code and data are always available so that its suitability can be verified. Yaso presented the project to the House Audit and Control Committee of the Chamber of Deputies in August 2017, where it attracted the interest of House officials who work with open data. The operation has been a source of inspiration for other civic projects with similar goals, demonstrating the broader impact of artificial intelligence, including in industry in Brazil. Team members have also taken part in events throughout Brazil and abroad, including OpenDataDay at Calango Hackerspace in the Federal District, Campus Party Bahia, Campus Party Brasilia, Friends of Tomorrow, the XIII National Meeting of Internal Control, USP Talks, and the Hackfest against corruption in João Pessoa, the last of which was also highlighted in the national press.

  • JOONE

    JOONE

    JOONE (Java Object Oriented Neural Engine) is a component-based neural network framework built in Java. == Features == Joone has a component-based architecture built from linkable components that can be extended to create new learning algorithms and neural network architectures. Components are plug-in code modules that are linked to produce an information flow. New components can be added and reused. Beyond simulation, Joone also has some multi-platform deployment capabilities. Joone has a GUI editor for graphically creating and testing neural networks, and a distributed training environment that allows neural networks to be trained on multiple remote machines. == Comparison == As of 2010, Joone, Encog and Neuroph are the major free component-based neural network development environments available for the Java platform. Unlike the two commercial systems in existence, Synapse and NeuroSolutions, Joone is written in Java and has direct cross-platform support. It offers a limited number of components and a rudimentary graphical development environment, so it has significantly fewer features than its commercial counterparts. Joone can be considered more of a neural network framework than a full integrated development environment. Unlike its commercial counterparts, it focuses strongly on code-based development of neural networks rather than visual construction. While in theory Joone can be used to construct a wider array of adaptive systems (including those with non-adaptive elements), its focus is on backpropagation-based neural networks.

  • Dynamic topic model

    Dynamic topic model

    Within statistics, dynamic topic models are generative models that can be used to analyze the evolution of (unobserved) topics of a collection of documents over time. This family of models was proposed by David Blei and John Lafferty and is an extension of Latent Dirichlet Allocation (LDA) that can handle sequential documents. In LDA, the model ignores both the order in which words appear in a document and the order in which documents appear in the corpus. In a dynamic topic model, words are still assumed to be exchangeable, but the order of the documents plays a fundamental role. More precisely, the documents are grouped by time slice (e.g., years), and it is assumed that the documents of each group come from a set of topics that evolved from the set of the previous slice. == Topics == As in LDA and pLSA, each document in a dynamic topic model is viewed as a mixture of unobserved topics. Furthermore, each topic defines a multinomial distribution over a set of terms. Thus, for each word of each document, a topic is drawn from the mixture and a term is subsequently drawn from the multinomial distribution corresponding to that topic. The topics, however, evolve over time. For instance, the two most likely terms of a topic at time $t$ could be "network" and "Zipf" (in descending order), while the most likely ones at time $t+1$ could be "Zipf" and "percolation" (in descending order). == Model == Define $\alpha_{t}$ as the per-document topic distribution at time $t$, $\beta_{t,k}$ as the word distribution of topic $k$ at time $t$, $\eta_{t,d}$ as the topic distribution for document $d$ at time $t$, $z_{t,d,n}$ as the topic for the $n$th word in document $d$ at time $t$, and $w_{t,d,n}$ as the specific word. 
In this model, the multinomial distributions $\alpha_{t+1}$ and $\beta_{t+1,k}$ are generated from $\alpha_{t}$ and $\beta_{t,k}$, respectively. Even though multinomial distributions are usually written in terms of the mean parameters, representing them in terms of the natural parameters is better in the context of dynamic topic models. The mean parameterization has the disadvantage that its parameters are constrained to be non-negative and to sum to one, so any definition of how these distributions evolve would need to ensure that the constraints remain satisfied. Since both distributions are in the exponential family, one solution to this problem is to represent them in terms of the natural parameters, which can take any real value and can be changed individually. Using the natural parameterization, the dynamics of the topic model are given by $\beta_{t,k} \mid \beta_{t-1,k} \sim N(\beta_{t-1,k}, \sigma^{2} I)$ and $\alpha_{t} \mid \alpha_{t-1} \sim N(\alpha_{t-1}, \delta^{2} I)$. 
The generative process at time slice $t$ is therefore: (1) draw topics $\beta_{t,k} \mid \beta_{t-1,k} \sim N(\beta_{t-1,k}, \sigma^{2} I)$ for all $k$; (2) draw the mixture model $\alpha_{t} \mid \alpha_{t-1} \sim N(\alpha_{t-1}, \delta^{2} I)$; (3) for each document, draw $\eta_{t,d} \sim N(\alpha_{t}, a^{2} I)$ and then, for each word, (a) draw a topic $z_{t,d,n} \sim \mathrm{Mult}(\pi(\eta_{t,d}))$ and (b) draw a word $w_{t,d,n} \sim \mathrm{Mult}(\pi(\beta_{t,z_{t,d,n}}))$. Here $\pi(x)$ is a mapping from the natural parameterization $x$ to the mean parameterization, namely $\pi(x)_{i} = \frac{\exp(x_{i})}{\sum_{j} \exp(x_{j})}$. == Inference == In the dynamic topic model, only $w_{t,d,n}$ is observable. Learning the other parameters constitutes an inference problem. Blei and Lafferty argue that applying Gibbs sampling to do inference in this model is more difficult than in static models, due to the nonconjugacy of the Gaussian and multinomial distributions. They propose the use of variational methods, in particular variational Kalman filtering and variational wavelet regression. == Applications == In the original paper, a dynamic topic model is applied to the corpus of articles from the journal Science published between 1881 and 1999, aiming to show that this method can be used to analyze trends of word usage inside topics. The authors also show that a model trained on past documents is able to fit the documents of an incoming year better than LDA. A continuous dynamic topic model was developed by Wang et al. and applied to predict the timestamp of documents. 
Going beyond text documents, dynamic topic models were used to study musical influence, by learning musical topics and how they evolve in recent history.
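The generative process described in the Model section can be illustrated with a small sampler. This is a minimal sketch, not the authors' implementation: the function names `softmax` and `sample_dtm`, the default sizes, the variance values and the random initial state for the first slice are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Map natural parameters x to mean parameters (the mapping pi in the text)."""
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def sample_dtm(num_slices=3, num_topics=2, vocab_size=5, docs_per_slice=2,
               words_per_doc=4, sigma=0.1, delta=0.1, a=0.5, seed=0):
    """Sample word indices from a toy dynamic topic model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumed starting point for the random walk; the text does not specify one.
    beta = rng.normal(size=(num_topics, vocab_size))
    alpha = rng.normal(size=num_topics)
    corpus = []
    for _ in range(num_slices):
        # beta_{t,k} | beta_{t-1,k} ~ N(beta_{t-1,k}, sigma^2 I)
        beta = beta + sigma * rng.normal(size=beta.shape)
        # alpha_t | alpha_{t-1} ~ N(alpha_{t-1}, delta^2 I)
        alpha = alpha + delta * rng.normal(size=alpha.shape)
        docs = []
        for _ in range(docs_per_slice):
            # eta_{t,d} ~ N(alpha_t, a^2 I)
            eta = alpha + a * rng.normal(size=num_topics)
            words = []
            for _ in range(words_per_doc):
                z = rng.choice(num_topics, p=softmax(eta))      # topic for this word
                w = rng.choice(vocab_size, p=softmax(beta[z]))  # word from that topic
                words.append(int(w))
            docs.append(words)
        corpus.append(docs)
    return corpus
```

Calling `sample_dtm()` returns one list of documents per time slice, each document being a list of vocabulary indices. Inference, which the text notes is the hard part, is not covered by this sketch.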

  • Vector quantization

    Vector quantization

    Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors. Developed in the early 1980s by Robert M. Gray, it was originally used for data compression. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. In simpler terms, vector quantization chooses a set of points to represent a larger set of points. The density matching property of vector quantization is powerful, especially for identifying the density of large and high-dimensional data. Since data points are represented by the index of their closest centroid, commonly occurring data have low error and rare data have high error. This is why VQ is suitable for lossy data compression. It can also be used for lossy data correction and density estimation. Vector quantization is based on the competitive learning paradigm, so it is closely related to the self-organizing map model and to sparse coding models used in deep learning algorithms such as autoencoders. 
== Training == One simple training algorithm for vector quantization is: (1) pick a sample point at random; (2) move the nearest quantization vector centroid towards this sample point by a small fraction of the distance; (3) repeat. A more sophisticated algorithm reduces the bias in the density matching estimation and ensures that all points are used, by including an extra sensitivity parameter: (1) increase each centroid's sensitivity $s_{i}$ by a small amount; (2) pick a sample point $P$ at random; (3) for each quantization vector centroid $c_{i}$, let $d(P, c_{i})$ denote the distance between $P$ and $c_{i}$; (4) find the centroid $c_{i}$ for which $d(P, c_{i}) - s_{i}$ is smallest; (5) move $c_{i}$ towards $P$ by a small fraction of the distance; (6) set $s_{i}$ to zero; (7) repeat. It is desirable to use a cooling schedule to produce convergence (see simulated annealing). Another simple method is LBG, which is based on k-means. The algorithm can be iteratively updated with "live" data, rather than by picking random points from a data set, but this will introduce some bias if the data are temporally correlated over many samples. == Applications == Vector quantization is used for lossy data compression, lossy data correction, pattern recognition, density estimation and clustering. Lossy data correction, or prediction, is used to recover data missing from some dimensions. It is done by finding the nearest group using the data dimensions that are available, then predicting the values of the missing dimensions by assuming that they equal the corresponding values of the group's centroid. For density estimation, the area/volume that is closer to a particular centroid than to any other is inversely proportional to the density (due to the density matching property of the algorithm). 
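The simple training loop from the Training section (pick a random sample, move the nearest centroid a small fraction of the way towards it, repeat) might be sketched as follows. The function name `train_vq`, the fixed step size, the iteration count and the choice to initialise centroids from randomly chosen samples are assumptions for illustration; the text specifies none of them, and no cooling schedule or sensitivity term is included.

```python
import numpy as np

def train_vq(data, num_centroids=2, step=0.05, iterations=2000, seed=0):
    """Online vector quantization training (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Assumed initialisation: start from distinct, randomly chosen samples.
    start = rng.choice(len(data), size=num_centroids, replace=False)
    centroids = data[start].copy()
    for _ in range(iterations):
        p = data[rng.integers(len(data))]                 # pick a sample at random
        distances = np.linalg.norm(centroids - p, axis=1)
        nearest = int(np.argmin(distances))               # nearest centroid
        # Move it a small fraction of the distance towards the sample.
        centroids[nearest] += step * (p - centroids[nearest])
    return centroids
```

With a constant `step` the centroids keep jittering around their targets; replacing it with a decreasing value would be one way to realise the cooling schedule the text recommends.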
=== Use in data compression === Vector quantization, also called "block quantization" or "pattern matching quantization", is often used in lossy data compression. It works by encoding values from a multidimensional vector space into a finite set of values from a discrete subspace of lower dimension. A lower-space vector requires less storage space, so the data is compressed. Due to the density matching property of vector quantization, the compressed data has errors that are inversely proportional to density. The transformation is usually done by projection or by using a codebook. In some cases, a codebook can also be used to entropy code the discrete value in the same step, by generating a prefix-coded variable-length encoded value as its output. The set of discrete amplitude levels is quantized jointly rather than each sample being quantized separately. Consider a $k$-dimensional vector $[x_{1}, x_{2}, \ldots, x_{k}]$ of amplitude levels. It is compressed by choosing the nearest matching vector from a set of $n$-dimensional vectors $[y_{1}, y_{2}, \ldots, y_{n}]$, with $n < k$. All possible combinations of the $n$-dimensional vector $[y_{1}, y_{2}, \ldots, y_{n}]$ form the vector space to which all the quantized vectors belong. Only the index of the codeword in the codebook is sent instead of the quantized values. This conserves space and achieves more compression. Twin vector quantization (VQF) is part of the MPEG-4 standard dealing with time domain weighted interleaved vector quantization. 
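The idea that only the index of the codeword is sent can be illustrated with a nearest-codeword encoder and a lookup decoder. This is a minimal sketch of the codebook approach only: the names `encode` and `decode` are hypothetical, the codebook is assumed to be given (for example from a training step), and for simplicity the codewords have the same dimension as the input vectors rather than a lower one.

```python
import numpy as np

def encode(vectors, codebook):
    """Return, for each input vector, the index of its nearest codeword."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Pairwise Euclidean distances, shape (num_vectors, num_codewords).
    distances = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

def decode(indices, codebook):
    """Reconstruct (lossily) by looking each index up in the codebook."""
    return np.asarray(codebook, dtype=float)[np.asarray(indices)]
```

For a codebook `[[0, 0], [10, 10]]`, the vectors `[[1, 1], [9, 8], [0, -1]]` encode to the indices 0, 1, 0; decoding returns the corresponding codewords, so the reconstruction error is the distance from each vector to its codeword.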
=== Video codecs based on vector quantization === Bink video; Cinepak; Daala (transform-based, but uses pyramid vector quantization on transformed coefficients); Digital Video Interactive: Production-Level Video and Real-Time Video; Indeo; Microsoft Video 1; QuickTime: Apple Video (RPZA) and Graphics Codec (SMC); Sorenson SVQ1 and SVQ3; Smacker video; VQA format, used in many games. The usage of video codecs based on vector quantization has declined significantly in favor of those based on motion compensated prediction combined with transform coding, e.g. those defined in MPEG standards, as the low decoding complexity of vector quantization has become less relevant. === Audio codecs based on vector quantization === AMR-WB+; CELP; CELT (now part of Opus; transform-based, but uses pyramid vector quantization on transformed coefficients); Codec 2; DTS; G.729; iLBC; Ogg Vorbis; TwinVQ. === Use in pattern recognition === VQ was also used in the eighties for speech and speaker recognition. Recently it has also been used for efficient nearest neighbor search and on-line signature recognition. In pattern recognition applications, one codebook is constructed for each class (each class being a user in biometric applications) using acoustic vectors of this user. In the testing phase, the quantization distortion of a testing signal is computed against the whole set of codebooks obtained in the training phase. The codebook that provides the smallest vector quantization distortion indicates the identified user. The main advantage of VQ in pattern recognition is its low computational burden when compared with other techniques such as dynamic time warping (DTW) and hidden Markov models (HMM). The main drawback when compared to DTW and HMM is that it does not take into account the temporal evolution of the signals (speech, signature, etc.) because all the vectors are mixed up. To overcome this problem, a multi-section codebook approach has been proposed. 
The multi-section approach consists of modelling the signal with several sections (for instance, one codebook for the initial part, another for the center and a last codebook for the ending part). === Use as clustering algorithm === Because VQ seeks centroids as density points of nearby samples, it can also be used directly as a prototype-based clustering method: each centroid is then associated with one prototype. By aiming to minimize the expected squared quantization error and introducing a decreasing learning gain that fulfils the Robbins-Monro conditions, multiple iterations over the whole data set with a fixed number of prototypes converge to the solution of the k-means clustering algorithm in an incremental manner. === Generative adversarial networks (GAN) === VQ has been used to quantize a feature representation layer in the discriminator of generative adversarial networks. The feature quantization (FQ) technique performs implicit feature matching. It improves GAN training and yields improved performance on a variety of popular GAN models: BigGAN for image generation, StyleGAN for face synthesis, and U-GAT-IT for unsupervised image-to-image translation.

  • Learning rate

    Learning rate

    In machine learning and statistics, the learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function. Since it influences to what extent newly acquired information overrides old information, it metaphorically represents the speed at which a machine learning model "learns". In the adaptive control literature, the learning rate is commonly referred to as gain. In setting a learning rate, there is a trade-off between the rate of convergence and overshooting. While the descent direction is usually determined from the gradient of the loss function, the learning rate determines how big a step is taken in that direction. Too high a learning rate will make the learning jump over minima, but too low a learning rate will either take too long to converge or get stuck in an undesirable local minimum. To achieve faster convergence, prevent oscillations and avoid getting stuck in undesirable local minima, the learning rate is often varied during training, either in accordance with a learning rate schedule or by using an adaptive learning rate. The learning rate and its adjustments may also differ per parameter, in which case it is a diagonal matrix that can be interpreted as an approximation to the inverse of the Hessian matrix in Newton's method. The learning rate is related to the step length determined by inexact line search in quasi-Newton methods and related optimization algorithms. == Learning rate schedule == The initial rate can be left at the system default or selected using a range of techniques. A learning rate schedule changes the learning rate during learning, most often between epochs or iterations. This is mainly done with two parameters: decay and momentum. There are many different learning rate schedules, but the most common are time-based, step-based and exponential. 
Decay serves to settle the learning in a good region and avoid oscillations, a situation that may arise when too high a constant learning rate makes the learning jump back and forth over a minimum; it is controlled by a hyperparameter. Momentum is analogous to a ball rolling down a hill; we want the ball to settle at the lowest point of the hill (corresponding to the lowest error). Momentum both speeds up the learning (increasing the learning rate) when the error cost gradient has been heading in the same direction for a long time and avoids local minima by "rolling over" small bumps. Momentum is controlled by a hyperparameter analogous to a ball's mass, which must be chosen manually: too high and the ball will roll over the minima we wish to find, too low and it will not fulfil its purpose. The formula for factoring in the momentum is more complex than for decay but is most often built into deep learning libraries such as Keras. Time-based learning schedules alter the learning rate depending on the learning rate of the previous iteration. Factoring in the decay, the formula for the learning rate is $\eta_{n+1} = \frac{\eta_{n}}{1 + dn}$, where $\eta$ is the learning rate, $d$ is a decay parameter and $n$ is the iteration step. Step-based learning schedules change the learning rate according to some predefined steps. 
The decay application formula is here defined as $\eta_{n} = \eta_{0} d^{\left\lfloor \frac{1+n}{r} \right\rfloor}$, where $\eta_{n}$ is the learning rate at iteration $n$, $\eta_{0}$ is the initial learning rate, $d$ is how much the learning rate should change at each drop (0.5 corresponds to a halving) and $r$ corresponds to the drop rate, or how often the rate should be dropped (10 corresponds to a drop every 10 iterations). The floor function ($\lfloor \dots \rfloor$) here drops the value of its input to 0 for all values smaller than 1. Exponential learning schedules are similar to step-based ones, but instead of steps, a decreasing exponential function is used. The formula for factoring in the decay is $\eta_{n} = \eta_{0} e^{-dn}$, where $d$ is a decay parameter. == Adaptive learning rate == The issue with learning rate schedules is that they all depend on hyperparameters that must be manually chosen for each given learning session and may vary greatly depending on the problem at hand or the model used. To combat this, there are many different types of adaptive gradient descent algorithms, such as Adagrad, Adadelta, RMSprop, and Adam, which are generally built into deep learning libraries such as Keras.
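The three schedules described above (time-based, step-based and exponential) translate directly into short functions. This is a minimal sketch: the function names are hypothetical, and it is not the API of Keras or of any other library.

```python
import math

def time_based_next(eta_n, d, n):
    """Time-based: eta_{n+1} = eta_n / (1 + d * n)."""
    return eta_n / (1 + d * n)

def step_based(eta0, d, r, n):
    """Step-based: eta_n = eta0 * d ** floor((1 + n) / r)."""
    return eta0 * d ** math.floor((1 + n) / r)

def exponential(eta0, d, n):
    """Exponential: eta_n = eta0 * exp(-d * n)."""
    return eta0 * math.exp(-d * n)
```

With `eta0 = 0.1`, `d = 0.5` and `r = 10`, the step-based schedule stays at 0.1 for iterations 0 to 8 and halves to 0.05 at iteration 9, because the exponent is the floor of (1 + n) / r. The time-based schedule is recursive, so each call takes the previous rate as input.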

  • Ellen Voorhees

    Ellen Voorhees

    Ellen Marie Voorhees (born March 13, 1958) is an American computer scientist known for her work in document retrieval, information retrieval, and natural language processing. She works in the retrieval group at the National Institute of Standards and Technology (NIST). == Education and career == Voorhees was born in Bensalem Township, Pennsylvania, and was the 1976 valedictorian at Bensalem High School. She completed her undergraduate studies at Pennsylvania State University, graduating in 1979 with a bachelor's degree in computer science. She attended Cornell University, where she received her master's degree and then went on to complete her Ph.D. in 1985. Her dissertation, The Effectiveness and Efficiency of Agglomerative Hierarchic Clustering in Document Retrieval, was supervised by Gerard Salton. Prior to joining NIST, she was a senior member of the technical staff at Siemens Corporate Research in Princeton, New Jersey. == Recognition == Voorhees was elected as an ACM Fellow in 2018 for "contributions in evaluation of information retrieval, question answering, and other language technologies". In 2023, Voorhees was awarded an honorary Doctor of Science degree from the University of Glasgow in recognition of her body of work in the evaluation of information retrieval, question answering, and other language technologies. In 2024, Voorhees received the Gerard Salton Award, a lifetime achievement award given by ACM's Special Interest Group on Information Retrieval (SIGIR).

  • Co-occurrence

    Co-occurrence

    In linguistics, co-occurrence or cooccurrence (in older texts often written with a diaeresis as coöccurrence) is an above-chance frequency of ordered occurrence of two adjacent terms in a text corpus. Co-occurrence in this linguistic sense can be interpreted as an indicator of semantic proximity or of an idiomatic expression. Corpus linguistics and its statistical analyses can reveal regular patterns of co-occurrence within a language and make it possible to work out typical collocations for its lexical items. A co-occurrence restriction is identified when linguistic elements never occur together. Analysis of these restrictions can lead to discoveries about the structure and development of a language. Co-occurrence can be seen as an extension of word counting to higher dimensions. It can be described quantitatively using measures such as correlation or mutual information. Co-occurrence information and knowledge of co-occurring words may be relevant in the analysis of language for large language models, part of the emerging field of artificial intelligence, and helpful in word games such as Scrabble.
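One way to quantify above-chance co-occurrence of adjacent terms is pointwise mutual information (PMI), a per-pair form of the mutual information measure mentioned above. The sketch below is an illustration under simple assumptions: the function name `bigram_pmi` is hypothetical, probabilities are plain relative frequencies with no smoothing, and the input is an already tokenized list of words.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return PMI (in bits) for every ordered pair of adjacent tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    num_tokens = len(tokens)
    num_pairs = len(tokens) - 1
    pmi = {}
    for (first, second), count in bigrams.items():
        p_pair = count / num_pairs
        p_first = unigrams[first] / num_tokens
        p_second = unigrams[second] / num_tokens
        # Positive PMI: the pair occurs more often than chance would predict.
        pmi[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return pmi
```

For the tokens of "new york is big and new york is old", the pair ("new", "york") gets a positive score because it occurs twice among eight adjacent pairs while each word makes up only two of the nine tokens. On small corpora, unsmoothed PMI tends to overrate rare pairs.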

  • NovelAI

    NovelAI

    NovelAI is an online, cloud-based software-as-a-service (SaaS) product, sold by paid subscription, for AI-assisted storywriting and text-to-image synthesis. It launched in beta on June 15, 2021, and the image generation feature was added on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware. == Features == NovelAI uses GPT-based large language models (LLMs) to generate stories and prose. It has several models, such as Calliope, Sigurd, Euterpe, Krake, and Genji, with Genji being a Japanese-language model. The service also offers encrypted servers and customizable editors. For AI art generation, which produces images from text prompts, NovelAI uses NovelAI Diffusion, a custom version of the source-available Stable Diffusion text-to-image diffusion model trained on a Danbooru-based dataset. NovelAI can also generate a new image based on an existing image. The NovelAI terms of service state that all generated content belongs to the user, regardless of whether the user is an individual or a corporation. Anlatan states that generated images are not stored locally on its servers. == History == On April 28, 2021, Anlatan officially launched NovelAI. On June 15, 2021, Anlatan released Calliope, its finetuned version of EleutherAI's GPT-Neo-2.7B model, named after one of the Greek Muses. A day later, it released its Opus-exclusive finetuned GPT-J-6B model, Sigurd, named after the Norse/Germanic hero. On March 21, 2023, Nvidia and CoreWeave announced that Anlatan was one of the first CoreWeave customers to deploy NVIDIA's H100 Tensor Core GPUs for LLM inference and training. On April 1, 2023, Anlatan added ControlNet features to its text-to-image NovelAI Diffusion model. On May 16, 2023, Anlatan announced that it had named its H100 cluster Shoggy, a reference to H.P. Lovecraft's Shoggoths; the cluster was used to pre-train an undisclosed in-house LLM with an 8192-token context. 
== Reception and controversy == Following the implementation of image generation, NovelAI became a widely discussed topic in Japan. Some online commentators noted that its image synthesis features are very adept at producing close impressions of anime characters, including lolicon and shotacon imagery. Others expressed concern that it is a paid service reliant on a diffusion model whose original training data consists of images used without the consent of the original artists. Attorney Kosuke Terauchi notes that, since a revision of the law in 2018, it is no longer illegal in Japan for machine learning models to scrape copyrighted content from the internet to use as training data; meanwhile, in the United States, where NovelAI is based, there is no specific legal framework that regulates machine learning, and thus the fair use doctrine of US copyright law applies instead. Danbooru has posted an official statement regarding NovelAI's use of the site's content for AI training, stating that Danbooru is not affiliated with NovelAI and neither endorses nor condones NovelAI's use of artists' artworks for machine learning. FayerWayer described NovelAI as a service capable of generating hentai. Manga artist Izumi Ū commented that while the manga-style art generated by NovelAI is highly accurate, there are still imperfections in the output, although he views these favourably as human-like. In response to the topic of NovelAI, Narugami, founder of the Japanese freelance artist commissioning website Skeb, stated on October 5, 2022 that the use of AI image generation has been prohibited on the platform since 2018. Illustrations made using NovelAI have been posted on social media and illustration posting sites, and by October 13, 2,111 works tagged with #NovelAI had been posted on Pixiv. 
Pixiv has stated that it is not considering a complete elimination of creations that use AI, though it requires AI-generated posts to be marked as such and allows users to filter them out. == Incidents == On October 6, 2022, NovelAI experienced a data breach where its software's source code was leaked.

    Read more →
  • Test data

    Test data

    Test data are sets of inputs or information used to verify the correctness, performance, and reliability of software systems. Test data encompass various types, such as positive and negative scenarios, edge cases, and realistic user scenarios, and aim to exercise different aspects of the software to uncover bugs and validate its behavior. Test data are also used in regression testing to verify that new code changes or enhancements do not introduce unintended side effects or break existing functionality. == Background == Test data may be used to verify that a given set of inputs to a function produces an expected result. Alternatively, data can be used to challenge the program's ability to handle unusual, extreme, exceptional, or unexpected inputs. Test data can be produced in a focused or systematic manner, as is typically the case in domain testing, or through less focused approaches, such as high-volume randomized automated tests. Test data can be generated by the tester or by a program or function that assists the tester. It can be recorded for reuse or used only once. Test data may be created manually, produced with data generation tools (often based on randomness), or retrieved from an existing production environment. The data set may consist of synthetic (fake) data, but ideally, it should include representative (real) data. == Limitations == Due to privacy regulations such as GDPR, PCI, and HIPAA, the use of privacy-sensitive personal data for testing is restricted. However, anonymized (and preferably subsetted) production data may be used as representative data for testing and development. Programmers may also choose to generate synthetic data as an alternative to using real or anonymized data. While synthetic data can offer significant advantages, such as enhanced privacy and flexibility, it also comes with limitations. For instance, generating synthetic data that accurately reflects real-world complexity can be challenging. 
There is also a risk of synthetic data not fully capturing the nuances of real data, potentially leading to gaps in test coverage. == Domain testing == Domain testing is a set of techniques focusing on test data. This includes identifying critical inputs, values at the boundaries between equivalence classes, and combinations of inputs that drive the system toward specific outputs. Domain testing helps ensure that various scenarios are effectively tested, including edge cases and unusual conditions.
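    The boundary-value idea from domain testing can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: `boundary_values` and `is_valid_quantity` are hypothetical names, and the valid range of 1 to 100 is an assumed example domain.

```python
def boundary_values(lo, hi):
    """Integer test inputs around the edges of the valid range [lo, hi]."""
    candidates = [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]
    # Drop duplicates (they occur for very narrow ranges) while keeping order.
    return list(dict.fromkeys(candidates))


def is_valid_quantity(n, lo=1, hi=100):
    """Hypothetical function under test: accepts quantities in [lo, hi]."""
    return lo <= n <= hi


# Pair each generated input with the observed result of the function under test.
cases = [(n, is_valid_quantity(n)) for n in boundary_values(1, 100)]
for value, accepted in cases:
    print(value, accepted)
```

    The values just outside the range (0 and 101) are the negative cases; the rest sit on or just inside the boundaries between the equivalence classes "too small", "valid", and "too large".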

    Read more →
  • Datacap

    Datacap

    Datacap (an IBM Company), a privately owned company, manufactures and sells computer software and services. Datacap's first product, Paper Keyboard, was a "forms processing" product that shipped in 1989. In August 2010, IBM announced that it had acquired Datacap for an undisclosed amount. == Overview == Datacap sells products through a value-added distribution network worldwide. The software is classified as "enterprise software", meaning that it requires trained professionals to install and configure. Although the company has focused on providing solutions for scanning paper documents, its more recent materials have emphasized customer requirements to handle electronic documents ("eDocs"), that is, documents received into an organization electronically (usually by email). Datacap claims that its software is unique because of the rules engine ("Rulerunner") used for processing inbound documents, including image processing (deskew, noise removal, etc.), optical character recognition (OCR), intelligent character recognition (ICR), validations, and export-release formatting of extracted data to target ERP and line-of-business applications.

    Read more →
  • Eric Brill

    Eric Brill

    Eric Brill is a computer scientist specializing in natural language processing. He created the Brill tagger, a supervised part-of-speech tagger. Another of Brill's research papers introduced a machine learning technique now known as transformation-based learning. == Biography == Brill earned a BA in mathematics from the University of Chicago in 1987 and an MS in Computer Science from UT Austin in 1989. In 1994, he completed his PhD at the University of Pennsylvania. He was an assistant professor at Johns Hopkins University from 1994 to 1999. In 1999, he left JHU for Microsoft Research, where he developed a system called "Ask MSR" that answered search engine queries written as questions in English. He was quoted in 2004 as predicting the shift of Google's web-page-based search to information-based search. In 2009 he moved to eBay to head their research laboratories.

    Read more →
  • Machine translation of sign languages

    Machine translation of sign languages

    The machine translation of sign languages has been possible, albeit in a limited fashion, since 1977, when a research project successfully matched English letters from a keyboard to ASL manual alphabet letters simulated on a robotic hand. These technologies translate signed languages into written or spoken language, and written or spoken language into sign language, without the use of a human interpreter. Sign languages possess different phonological features than spoken languages, which has created obstacles for developers. Developers use computer vision and machine learning to recognize specific phonological parameters and epentheses unique to sign languages, and speech recognition and natural language processing allow interactive communication between hearing and deaf people. == Limitations == Sign language translation technologies are limited in the same way as spoken language translation. None can translate with 100% accuracy. In fact, sign language translation technologies are far behind their spoken language counterparts. This is due in no small part to the fact that signed languages have multiple articulators. Where spoken languages are articulated through the vocal tract, signed languages are articulated through the hands, arms, head, shoulders, torso, and parts of the face. This multi-channel articulation makes translating sign languages very difficult. An additional challenge for sign language machine translation (MT) is that there is no formal written format for signed languages. There are notation systems, but no writing system has been adopted widely enough by the international Deaf community to be considered the "written form" of a given sign language. Sign languages are therefore recorded in various video formats. There is, for example, no gold-standard parallel corpus that is large enough for statistical machine translation (SMT). 
== History == The history of automatic sign language translation started with the development of hardware such as finger-spelling robotic hands. In 1977, a finger-spelling hand project called RALPH (short for "Robotic Alphabet") created a robotic hand that could translate letters of the alphabet into finger-spellings. Later, gloves with motion sensors became mainstream, and projects such as the CyberGlove and VPL Data Glove were born. The wearable hardware made it possible to capture signers' hand shapes and movements with the help of computer software. However, with the development of computer vision, wearable devices were replaced by cameras, which are more efficient and place fewer physical restrictions on signers. To process the data collected through the devices, researchers implemented neural networks, such as the Stuttgart Neural Network Simulator, for pattern recognition in projects such as the CyberGlove. Researchers also use many other approaches for sign recognition. For example, Hidden Markov Models are used to analyze data statistically, and GRASP and other machine learning programs use training sets to improve the accuracy of sign recognition. Fusing non-wearable technologies such as cameras and Leap Motion controllers has been shown to improve the ability of automatic sign language recognition and translation software. == Technologies == === VISICAST === http://www.visicast.cmp.uea.ac.uk/Visicast_index.html === eSIGN project === http://www.visicast.cmp.uea.ac.uk/eSIGN/index.html === The American Sign Language Avatar Project at DePaul University === http://asl.cs.depaul.edu/ === Spanish to LSE === López-Ludeña, Verónica; San-Segundo, Rubén; González, Carlos; López, Juan Carlos; Pardo, José M. 
(2012), Methodology for developing a Speech into Sign Language Translation System in a New Semantic Domain (PDF), CiteSeerX 10.1.1.1065.5265, S2CID 2724186 === SignAloud === SignAloud is a technology that incorporates a pair of gloves made by a group of students at University of Washington that transliterate American Sign Language (ASL) into English. In February 2015 Thomas Pryor, a hearing student from the University of Washington, created the first prototype for this device at Hack Arizona, a hackathon at the University of Arizona. Pryor continued to develop the invention and in October 2015, Pryor brought Navid Azodi onto the SignAloud project for marketing and help with public relations. Azodi has a rich background and involvement in business administration, while Pryor has a wealth of experience in engineering. In May 2016, the duo told NPR that they are working more closely with people who use ASL so that they can better understand their audience and tailor their product to the needs of these people rather than the assumed needs. However, no further versions have been released since then. The invention was one of seven to win the Lemelson-MIT Student Prize, which seeks to award and applaud young inventors. Their invention fell under the "Use it!" category of the award which includes technological advances to existing products. They were awarded $10,000. The gloves have sensors that track the users hand movements and then send the data to a computer system via Bluetooth. The computer system analyzes the data and matches it to English words, which are then spoken aloud by a digital voice. The gloves do not have capability for written English input to glove movement output or the ability to hear language and then sign it to a deaf person, which means they do not provide reciprocal communication. The device also does not incorporate facial expressions and other nonmanual markers of sign languages, which may alter the actual interpretation from ASL. 
=== ProDeaf === ProDeaf (WebLibras) is a computer software that can translate both text and voice into Portuguese Libras (Portuguese Sign Language) "with the goal of improving communication between the deaf and hearing." There is currently a beta edition in production for American Sign Language as well. The original team began the project in 2010 with a combination of experts including linguists, designers, programmers, and translators, both hearing and deaf. The team originated at Federal University of Pernambuco (UFPE) from a group of students involved in a computer science project. The group had a deaf team member who had difficulty communicating with the rest of the group. In order to complete the project and help the teammate communicate, the group created Proativa Soluções and have been moving forward ever since. The current beta version in American Sign Language is very limited. For example, there is a dictionary section and the only word under the letter 'j' is 'jump'. If the device has not been programmed with the word, then the digital avatar must fingerspell the word. The last update of the app was in June 2016, but ProDeaf has been featured in over 400 stories across the country's most popular media outlets. The application cannot read sign language and turn it into word or text, so it only serves as a one-way communication. Additionally, the user cannot sign to the app and receive an English translation in any form, as English is still in the beta edition. === Kinect Sign Language Translator === Since 2012, researchers from the Chinese Academy of Sciences and specialists of deaf education from Beijing Union University in China have been collaborating with Microsoft Research Asian team to create Kinect Sign Language Translator. The translator consists of two modes: translator mode and communication mode. The translator mode is capable of translating single words from sign into written words and vice versa. 
The communication mode can translate full sentences and the conversation can be automatically translated with the use of the 3D avatar. The translator mode can also detect the postures and hand shapes of a signer as well as the movement trajectory using the technologies of machine learning, pattern recognition, and computer vision. The device also allows for reciprocal communication because the speech recognition technology allows the spoken language to be translated into the sign language and the 3D modeling avatar can sign back to the deaf people. The original project was started in China based on translating Chinese Sign Language. In 2013, the project was presented at Microsoft Research Faculty Summit and Microsoft company meeting. Currently, this project is also being worked by researchers in the United States to implement American Sign Language translation. As of now, the device is still a prototype, and the accuracy of translation in the communication mode is still not perfect. === SignAll === SignAll is an automatic sign language translation system provided by Dolphio Technologies in Hungary. The team is "pioneering the first automated sign language translation solution, based on computer vision and natural language processing (NLP), to enable everyday communication between individuals with hearing who use spoken English and deaf or hard of hearing individuals who use ASL." The system of SignAll uses Kinect from Microsoft and other web camera

    Read more →
  • Highway network

    Highway network

    In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks. The advantage of the Highway Network over other deep learning architectures is its ability to overcome or partially prevent the vanishing gradient problem, thus improving its optimization. Gating mechanisms are used to facilitate information flow across the many layers ("information highways"). Highway Networks have found use in text sequence labeling and speech recognition tasks. In 2014, the state of the art was training deep neural networks with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train such networks: the Highway Network (published in May), and the residual neural network, or ResNet (December). ResNet behaves like an open-gated Highway Net. == Model == The model has two gates in addition to the $H(W_H, x)$ gate: the transform gate $T(W_T, x)$ and the carry gate $C(W_C, x)$. The latter two gates are non-linear transfer functions (specifically sigmoid by convention). The function $H$ can be any desired transfer function. The carry gate is defined as $C(W_C, x) = 1 - T(W_T, x)$, while the transform gate is just a gate with a sigmoid transfer function. 
== Structure == The structure of a hidden layer in the Highway Network follows the equation: $y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C) = H(x, W_H) \cdot T(x, W_T) + x \cdot (1 - T(x, W_T))$. == Related work == Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and identified it as the reason why deep learning did not work well. To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute $y_{t+1} = F(x_t) + x_t$. During backpropagation through time, this becomes the residual formula $y = F(x) + x$ for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span $t$. A later LSTM version published in 2000 modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, thus being opened, addressing the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM. The Highway Network of May 2015 applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers". It is like a 2000 LSTM with forget gates unfolded in time, while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM. If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks. 
The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961) and Lang & Witbrock (1988), which has the form $x \mapsto F(x) + Ax$. Here the randomly initialized weight matrix $A$ does not have to be the identity mapping. Every residual connection is a skip connection, but almost all skip connections are not residual connections. The original Highway Network paper not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20-, 50-, and 100-layer networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20-layer counterpart (on the MNIST dataset; Figure 1). No improvement in test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1). The ResNet paper, however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3). This is also why the forget gates of the 2000 LSTM were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are dominantly identity mappings.
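    The hidden-layer equation under Structure, y = H · T + x · (1 − T), can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: tanh is chosen for H (the text allows any transfer function), the helper names `affine` and `highway_layer` and all weight values are made up for the example, and the input and output are assumed to have the same dimension so that x can be carried through unchanged.

```python
import math


def sigmoid(z):
    """Logistic function, used here for the transform gate T."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Return W x + b, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x))."""
    h = [math.tanh(v) for v in affine(W_H, b_H, x)]  # H: tanh is an assumption
    t = [sigmoid(v) for v in affine(W_T, b_T, x)]    # T: transform gate in (0, 1)
    # The carry gate is C = 1 - T, so each output mixes h and x element-wise.
    return [hi * ti + xi * (1.0 - ti) for hi, ti, xi in zip(h, t, x)]


# Arbitrary illustrative input and weights.
x = [0.5, -1.0, 2.0]
W_H = [[0.1, 0.2, 0.3], [0.0, -0.4, 0.5], [0.7, 0.1, -0.2]]
W_T = [[0.3, -0.1, 0.2], [0.5, 0.5, -0.5], [-0.2, 0.4, 0.1]]
zeros = [0.0, 0.0, 0.0]

# A strongly negative gate bias closes T, so the layer passes x through.
carried = highway_layer(x, W_H, zeros, W_T, [-50.0] * 3)
# A strongly positive gate bias opens T, so the layer outputs H(x).
transformed = highway_layer(x, W_H, zeros, W_T, [50.0] * 3)
```

    The two calls show the limiting behaviour of the gate: with T near 0 the layer is close to the identity, and with T near 1 it behaves like an ordinary non-linear layer.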

    Read more →
  • P4-metric

    P4-metric

    The P4 metric (also known as FS or Symmetric F) enables performance evaluation of a binary classifier. The P4 metric is calculated from precision, recall, specificity, and NPV (negative predictive value). The definition of the P4 metric is similar to that of the F1 metric; however, the P4 metric definition addresses criticisms leveled against the definition of the F1 metric. The definition of the P4 metric may, therefore, be understood as an extension of the F1 metric. Like the other known metrics, the P4 metric is a function of TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives). == Justification == The key concept of the P4 metric is to leverage the four key conditional probabilities. $P(+ \mid C{+})$ is the probability that the sample is positive, provided the classifier result was positive. $P(C{+} \mid +)$ is the probability that the classifier result will be positive, provided the sample is positive. $P(C{-} \mid -)$ is the probability that the classifier result will be negative, provided the sample is negative. $P(- \mid C{-})$ is the probability that the sample is negative, provided the classifier result was negative. The main assumption behind this metric is that all the probabilities mentioned above are close to 1 for a properly designed binary classifier. Indeed, $\mathrm{P}_4 = 1$ if, and only if, all of the probabilities above are equal to 1. Another important feature is that $\mathrm{P}_4$ tends to zero when any of the above probabilities tends to zero. == Definition == P4 is defined as the harmonic mean of the four key conditional probabilities: $\mathrm{P}_4 = \frac{4}{\frac{1}{P(+ \mid C{+})} + \frac{1}{P(C{+} \mid +)} + \frac{1}{P(C{-} \mid -)} + \frac{1}{P(- \mid C{-})}} = \frac{4}{\frac{1}{\mathit{precision}} + \frac{1}{\mathit{recall}} + \frac{1}{\mathit{specificity}} + \frac{1}{\mathit{NPV}}}.$ 
In terms of TP, TN, FP, and FN, it can be calculated as follows: $\mathrm{P}_4 = \frac{4 \cdot \mathrm{TP} \cdot \mathrm{TN}}{4 \cdot \mathrm{TP} \cdot \mathrm{TN} + (\mathrm{TP} + \mathrm{TN}) \cdot (\mathrm{FP} + \mathrm{FN})}.$ == Evaluation of the binary classifier performance == Evaluating the performance of binary classifiers is a multidisciplinary concept. It spans from the evaluation of medical and psychiatric tests to machine learning classifiers from a variety of fields. Thus, many of the metrics in use exist under several names, some defined independently. == Properties of P4 metric == Symmetry: in contrast to the F1 metric, P4 is symmetric, meaning that its value does not change when the dataset labeling is swapped (positives named negatives and negatives named positives). Range: $\mathrm{P}_4 \in [0, 1]$. Achieving $\mathrm{P}_4 \approx 1$ requires all four key conditional probabilities to be close to 1. For $\mathrm{P}_4 \approx 0$, it is sufficient that one of the four key conditional probabilities is close to 0. == Examples, comparing with the other metrics == Dependency table for selected metrics ("true" means depends, "false" means does not depend): Metrics that do not depend on a given probability are prone to misrepresentation when the probability approaches 0. === Example 1: Rare disease detection test === Let us consider a medical test used to detect a rare disease. Suppose a population size of 100000 and 0.05% of the population is infected. 
Further suppose the following test performance: 95% of all positive individuals are classified correctly (TPR=0.95) and 95% of all negative individuals are classified correctly (TNR=0.95). In such a case, due to high population imbalance and in spite of having high test accuracy (0.95), the probability that an individual who has been classified as positive is in fact positive is very low: $P(+ \mid C{+}) = 0.0095$. We can observe how this low probability is reflected in some of the metrics: $\mathrm{P}_4 = 0.0370$, $\mathrm{F}_1 = 0.0188$, $\mathrm{J} = \mathbf{0.9100}$ (Informedness / Youden index), $\mathrm{MK} = 0.0095$ (Markedness). === Example 2: Image recognition (cats vs dogs) === Consider the problem of training a neural-network-based image classifier with only two types of images: those containing dogs (labeled as 0) and those containing cats (labeled as 1). Thus, the goal is to distinguish between cats and dogs. Suppose that the classifier overpredicts in favour of cats ("positive" samples): 99.99% of cats are classified correctly and only 1% of dogs are classified correctly. Further, suppose that the image dataset consists of 100000 images, 90% of which are pictures of cats and 10% are pictures of dogs. In this situation, the probability that a picture containing a dog will be classified correctly is very low: $P(C{-} \mid -) = 0.01$. Not all metrics reflect this low probability: $\mathrm{P}_4 = 0.0388$, $\mathrm{F}_1 = \mathbf{0.9478}$, $\mathrm{J} = 0.0099$ (Informedness / Youden index), $\mathrm{MK} = \mathbf{0.8183}$ (Markedness).
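    The count-based formula for P4 can be checked against Example 2 with a short Python sketch. The function names `p4_score` and `f1_score` are hypothetical, the confusion-matrix counts are derived here from the percentages stated in the example (90000 cats, 10000 dogs), and returning 0.0 when the denominator is zero is an assumption of this sketch rather than part of the definition.

```python
def p4_score(tp, tn, fp, fn):
    """P4 = 4*TP*TN / (4*TP*TN + (TP + TN) * (FP + FN))."""
    numerator = 4 * tp * tn
    denominator = numerator + (tp + tn) * (fp + fn)
    # Assumption: report 0.0 for an empty confusion matrix instead of raising.
    return numerator / denominator if denominator else 0.0


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), shown for comparison."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Example 2: cats are the positive class, dogs the negative class.
tp, fn = 89991, 9    # 99.99% of the 90000 cats are classified correctly
tn, fp = 100, 9900   # only 1% of the 10000 dogs are classified correctly

p4 = p4_score(tp, tn, fp, fn)
f1 = f1_score(tp, fp, fn)
print(f"P4 = {p4:.4f}, F1 = {f1:.4f}")

# Symmetry: swapping the class labels exchanges TP with TN and FP with FN.
p4_swapped = p4_score(tn, tp, fn, fp)
```

    With these counts the sketch reproduces the contrast described in the example: F1 stays high because it ignores true negatives, while P4 drops because one of the four conditional probabilities is close to zero.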

    Read more →
  • AI Photo Editors: Free vs Paid (2026)

    AI Photo Editors: Free vs Paid (2026)

    Trying to pick the best AI photo editor? An AI photo editor is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI photo editor slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →