INDECT is a research project in the area of intelligent security systems performed by several European universities since 2009 and funded by the European Union. The purpose of the project is to involve European scientists and researchers in the development of solutions to and tools for automatic threat detection through e.g. processing of CCTV camera data streams, standardization of video sequence quality for user applications, threat detection in computer networks as well as data and privacy protection. The area of research, applied methods, and techniques are described in the public deliverables which are available to the public on the project's website. Practically, all information related to the research is public. Only documents that comprise information related to financial data or information that could negatively influence the competitiveness and law enforcement capabilities of parties involved in the project are not published. This follows regulations and practices applied in EU research projects. == Application and target users == The main end-user of INDECT solutions are police forces and security services. The principle of operation of the project is detecting threats and identifying sources of threats, without monitoring and searching for particular citizens or groups of citizens. Then, the system operator (i.e. police officer) decides whether an intervention of services responsible for public security are required or not. Further investigation eventually leading to persons related to threats is performed, preserving the presumption of innocence, based on existing procedures already used by police services and prosecutors. As it can be found in the project deliverables, INDECT does not involve storage of personal data (such as names, addresses, identity document numbers, etc.). A similar, behavior-based surveillance program was SAMURAI (Suspicious and Abnormal behavior Monitoring Using a netwoRk of cAmeras & sensors for sItuation awareness enhancement). == Expected results == The main expected results of the INDECT project are: Trial of intelligent analysis of video and audio data for threat detection in urban environments Creation of tools and technology for privacy and data protection during storage and transmission of information using quantum cryptography and new methods of digital watermarking Performing computer-aided detection of threats and targeted crimes in Internet resources with privacy-protecting solutions Construction of a search engine for rapid semantic search based on watermarking of content related to child pornography and human organ trafficking Implementation of a distributed computer system that is capable of effective intelligent processing == Controversy == Some media and other sources accuse INDECT of privacy abuse, collecting personal data, and keeping information from the public. Consequently, these issues have been commented and discussed by some Members of the European Parliament. As seen in the project's documentation, INDECT does not involve mobile phone tracking or call interception. The rumors about testing INDECT during 2012 UEFA European Football Championship also turned out to be false. The mid-term review of the Seventh Framework Programme to the European Parliament strongly urges the European Commission to immediately make all documents available and to define a clear and strict mandate for the research goal, the application, and the end users of INDECT, and stresses a thorough investigation of the possible impact on fundamental rights. Nevertheless, according to Mr. Paweł Kowal, MEP, the project had the ethical review on 15 March 2011 in Brussels with the participation of ethics experts from Austria, France, Netherlands, Germany and Great Britain.
DataScene
DataScene is a scientific graphing, animation, data analysis, and real-time data monitoring software package. It was developed with the Common Language Infrastructure technology and the GDI+ graphics library. With the two Common Language Runtime engines - the .Net and Mono frameworks - DataScene runs on all major operating systems. With DataScene, the user can plot 39 types 2D & 3D graphs (e.g., Area graph, Bar graph, Boxplot graph, Pie graph, Line graph, Histogram graph, Surface graph, Polar graph, Water Fall graph, etc.), manipulate, print, and export graphs to various formats (e.g., Bitmap, WMF/EMF, JPEG, PNG, GIF, TIFF, PostScript, and PDF), analyze data with different mathematical methods (fitting curves, calculating statics, FFT, etc.), create chart animations for presentations (e.g. with PowerPoint), classes, and web pages, and monitor and chart real-time data. == History == DataScene was first released (version 1.0) in March 2009 for the Windows platform and the .Net 2.0 framework. Since version 2.0, DataScene has been ported to the Mono framework 2.6 and all Linux and Unix/X11 operating systems. Cyberwit offers free licensing for the Express edition of DataScene.
Markovian discrimination
Markovian discrimination is a class of spam filtering methods used in CRM114 and other spam filters to filter based on statistical patterns of transition probabilities between words or other lexical tokens in spam messages that would not be captured using simple bag-of-words naive Bayes spam filtering. == Markovian Discrimination vs. Bag-of-Words Discrimination == A bag-of-words model contains only a dictionary of legal words and their relative probabilities in spam and genuine messages. A Markovian model additionally includes the relative transition probabilities between words in spam and in genuine messages, where the relative transition probability is the likelihood that a given word will be written next, based on what the current word is. Put another way, a bag-of-words filter discriminates based on relative probabilities of single words alone regardless of phrase structure, while a Markovian word-based filter discriminates based on relative probabilities of either pairs of words, or, more commonly, short sequences of words. This allows the Markovian filter greater sensitivity to phrase structure. Neither naive Bayes nor Markovian filters are limited to the word level for tokenizing messages. They may also process letters, partial words, or phrases as tokens. In such cases, specific bag-of-words methods would correspond to general bag-of-tokens methods. Modelers can parameterize Markovian spam filters based on the relative probabilities of any such tokens' transitions appearing in spam or in legitimate messages. == Visible and Hidden Markov Models == There are two primary classes of Markov models, visible Markov models and hidden Markov models, which differ in whether the Markov chain generating token sequences is assumed to have its states fully determined by each generated token (the visible Markov models) or might also have additional state (the hidden Markov models). With a visible Markov model, each current token is modeled as if it contains the complete information about previous tokens of the message relevant to the probability of future tokens, whereas a hidden Markov model allows for more obscure conditional relationships. Since those more obscure conditional relationships are more typical of natural language messages including both genuine messages and spam, hidden Markov models are generally preferred over visible Markov models for spam filtering. Due to storage constraints, the most commonly employed model is a specific type of hidden Markov model known as a Markov random field, typically with a 'sliding window' or clique size ranging between four and six tokens.
AI Background Removers: Free vs Paid (2026)
Looking for the best AI background remover? An AI background remover is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI background remover slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.
Evaluation of machine translation
Various methods for the evaluation for machine translation have been employed. This article focuses on the evaluation of the output of machine translation, rather than on performance or usability evaluation. == Round-trip translation == A typical way for lay people to assess machine translation quality is to translate from a source language to a target language and back to the source language with the same engine. Though intuitively this may seem like a good method of evaluation, it has been shown that round-trip translation is a "poor predictor of quality". The reason why it is such a poor predictor of quality is reasonably intuitive. A round-trip translation is not testing one system, but two systems: the language pair of the engine for translating into the target language, and the language pair translating back from the target language. Consider the following examples of round-trip translation performed from English to Italian and Portuguese from Somers (2005): In the first example, where the text is translated into Italian then back into English—the English text is significantly garbled, but the Italian is a serviceable translation. In the second example, the text translated back into English is perfect, but the Portuguese translation is meaningless; the program thought "tit" was a reference to a tit (bird), which was intended for a "tat", a word it did not understand. While round-trip translation may be useful to generate a "surplus of fun," the methodology is deficient for serious study of machine translation quality. == Human evaluation == This section covers two of the large scale evaluation studies that have had significant impact on the field—the ALPAC 1966 study and the ARPA study. === Automatic Language Processing Advisory Committee (ALPAC) === One of the constituent parts of the ALPAC report was a study comparing different levels of human translation with machine translation output, using human subjects as judges. The human judges were specially trained for the purpose. The evaluation study compared an MT system translating from Russian into English with human translators, on two variables. The variables studied were "intelligibility" and "fidelity". Intelligibility was a measure of how "understandable" the sentence was, and was measured on a scale of 1–9. Fidelity was a measure of how much information the translated sentence retained compared to the original, and was measured on a scale of 0–9. Each point on the scale was associated with a textual description. For example, 3 on the intelligibility scale was described as "Generally unintelligible; it tends to read like nonsense but, with a considerable amount of reflection and study, one can at least hypothesize the idea intended by the sentence". Intelligibility was measured without reference to the original, while fidelity was measured indirectly. The translated sentence was presented, and after reading it and absorbing the content, the original sentence was presented. The judges were asked to rate the original sentence on informativeness. So, the more informative the original sentence, the lower the quality of the translation. The study showed that the variables were highly correlated when the human judgment was averaged per sentence. The variation among raters was small, but the researchers recommended that at the very least, three or four raters should be used. The evaluation methodology managed to separate translations by humans from translations by machines with ease. The study concluded that, "highly reliable assessments can be made of the quality of human and machine translations". === Advanced Research Projects Agency (ARPA) === As part of the Human Language Technologies Program, the Advanced Research Projects Agency (ARPA) created a methodology to evaluate machine translation systems, and continues to perform evaluations based on this methodology. The evaluation programme was instigated in 1991, and continues to this day. Details of the programme can be found in White et al. (1994) and White (1995). The evaluation programme involved testing several systems based on different theoretical approaches; statistical, rule-based and human-assisted. A number of methods for the evaluation of the output from these systems were tested in 1992 and the most recent suitable methods were selected for inclusion in the programmes for subsequent years. The methods were; comprehension evaluation, quality panel evaluation, and evaluation based on adequacy and fluency. Comprehension evaluation aimed to directly compare systems based on the results from multiple choice comprehension tests, as in Church et al. (1993). The texts chosen were a set of articles in English on the subject of financial news. These articles were translated by professional translators into a series of language pairs, and then translated back into English using the machine translation systems. It was decided that this was not adequate for a standalone method of comparing systems and as such abandoned due to issues with the modification of meaning in the process of translating from English. The idea of quality panel evaluation was to submit translations to a panel of expert native English speakers who were professional translators and get them to evaluate them. The evaluations were done on the basis of a metric, modelled on a standard US government metric used to rate human translations. This was good from the point of view that the metric was "externally motivated", since it was not specifically developed for machine translation. However, the quality panel evaluation was very difficult to set up logistically, as it necessitated having a number of experts together in one place for a week or more, and furthermore for them to reach consensus. This method was also abandoned. Along with a modified form of the comprehension evaluation (re-styled as informativeness evaluation), the most popular method was to obtain ratings from monolingual judges for segments of a document. The judges were presented with a segment, and asked to rate it for two variables, adequacy and fluency. Adequacy is a rating of how much information is transferred between the original and the translation, and fluency is a rating of how good the English is. This technique was found to cover the relevant parts of the quality panel evaluation, while at the same time being easier to deploy, as it didn't require expert judgment. Measuring systems based on adequacy and fluency, along with informativeness is now the standard methodology for the ARPA evaluation program. == Automatic evaluation == In the context of this article, a metric is a measurement. A metric that evaluates machine translation output represents the quality of the output. The quality of a translation is inherently subjective, there is no objective or quantifiable "good." Therefore, any metric must assign quality scores so they correlate with the human judgment of quality. That is, a metric should score highly translations that humans score highly, and give low scores to those humans give low scores. Human judgment is the benchmark for assessing automatic metrics, as humans are the end-users of any translation output. The measure of evaluation for metrics is correlation with human judgment. This is generally done at two levels, at the sentence level, where scores are calculated by the metric for a set of translated sentences, and then correlated against human judgment for the same sentences. And at the corpus level, where scores over the sentences are aggregated for both human judgments and metric judgments, and these aggregate scores are then correlated. Figures for correlation at the sentence level are rarely reported, although Banerjee et al. (2005) do give correlation figures that show that, at least for their metric, sentence-level correlation is substantially worse than corpus level correlation. While not widely reported, it has been noted that the genre, or domain, of a text has an effect on the correlation obtained when using metrics. Coughlin (2003) reports that comparing the candidate text against a single reference translation does not adversely affect the correlation of metrics when working in a restricted domain text. Even if a metric correlates well with human judgment in one study on one corpus, this successful correlation may not carry over to another corpus. Good metric performance, across text types or domains, is important for the reusability of the metric. A metric that only works for text in a specific domain is useful, but less useful than one that works across many domains—because creating a new metric for every new evaluation or domain is undesirable. Another important factor in the usefulness of an evaluation metric is to have a good correlation, even when working with small amounts of data, that is candidate sentences and reference translations. Turian et al. (2003) point out that, "Any MT evaluation measure is less reliable on shorter translations", and
Data-centric AI
Data-centric AI is an approach within artificial intelligence that emphasizes on improving the quality, consistency and representativeness of the data used to train machine learning models, rather than focusing primarily on optimizing model architectures or algorithms. This idea has gained traction as researchers and practitioners have come to believe that many performance limitations of machine learning systems stem from issues such as noisy labels, biased datasets, and lack of coverage in the data. Data-centric AI involves disciplined approach to data cleaning, augmentation, labeling, and governance that improves model performance and reliability in applications such as computer vision, natural language processing, and further.
Best AI Paragraph Rewriters in 2026
In search of the best AI paragraph rewriter? An AI paragraph rewriter is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI paragraph rewriter slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.