AI Data Usage

AI Data Usage — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Instance-based learning

    In machine learning, instance-based learning (sometimes called memory-based learning) is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. Because computation is postponed until a new instance is observed, these algorithms are sometimes referred to as "lazy." It is called instance-based because it constructs hypotheses directly from the training instances themselves. This means that the hypothesis complexity can grow with the data: in the worst case, a hypothesis is a list of n training items and the computational complexity of classifying a single new instance is O(n). One advantage that instance-based learning has over other methods of machine learning is its ability to adapt its model to previously unseen data. Instance-based learners may simply store a new instance or throw an old instance away. Examples of instance-based learning algorithms are the k-nearest neighbors algorithm, kernel machines and RBF networks. These store (a subset of) their training set; when predicting a value/class for a new instance, they compute distances or similarities between this instance and the training instances to make a decision. To battle the memory complexity of storing all training instances, as well as the risk of overfitting to noise in the training set, instance reduction algorithms have been proposed.
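As an illustration of the ideas above, here is a minimal sketch of a k-nearest neighbors classifier in plain Python. It stores the training instances as they are, defers all computation to prediction time, and lets instances be added or discarded freely. The class name, the choice of Euclidean distance, and the toy data are assumptions made for this example, not part of any particular library.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KNearestNeighbors:
    """Hypothetical minimal instance-based learner (illustration only)."""

    def __init__(self, k=3):
        self.k = k
        self.instances = []  # the "hypothesis" is just the stored (features, label) pairs

    def add(self, features, label):
        # Adapting to new data is simply storing one more instance.
        self.instances.append((tuple(features), label))

    def remove(self, features):
        # Throwing an old instance away is equally cheap.
        target = tuple(features)
        self.instances = [(f, l) for f, l in self.instances if f != target]

    def predict(self, query):
        # All work happens here ("lazy" learning): one distance per stored
        # instance, so a single prediction touches all n instances.
        if not self.instances:
            raise ValueError("no stored instances")
        nearest = sorted(self.instances, key=lambda inst: euclidean(inst[0], query))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters labelled "a" and "b".
model = KNearestNeighbors(k=3)
for features, label in [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]:
    model.add(features, label)
```

Calling `model.predict((1.1, 1.0))` votes among the three closest stored points. Sorting all distances keeps the sketch short; a real implementation would typically use a partial selection or a spatial index instead.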

  • Historical Thesaurus of English

    The Historical Thesaurus of English (HTE) is the largest thesaurus in the world. It is called a historical thesaurus because it arranges the whole vocabulary of English, from the earliest written records in Old English to the present, according to the first documented occurrence of a word in the entire history of the English language. The HTE was conceived and begun in 1965 by the English Language & Linguistics department of the University of Glasgow, which has continued to compile the thesaurus ever since. From the 1980s onwards the project was moved from paper-based records to a computer database. Today, the HTE is available to the public online, and a print version, the Historical Thesaurus of the Oxford English Dictionary (HTOED), was published in 2009.

    == Main project: The Historical Thesaurus of English (HTE) ==

    The Historical Thesaurus of English (HTE) is a complete database of all the words in the Oxford English Dictionary and other dictionaries (including Old English), arranged by semantic field and date. In this way, the HTE arranges the whole vocabulary of English, from the earliest written records in Old English to the present, alongside dates of use. It is the first historical thesaurus to be compiled for any of the world's languages and contains 800,000 meanings for 600,000 words, within 230,000 categories. As the HTE website states, "in addition to providing hitherto unavailable information for linguistic and textual scholars, the Historical Thesaurus online is a rich resource for students of social and cultural history, showing how concepts developed through the words that refer to them."

    === Structure ===

    The work is divided into three main sections: the External World, the Mind, and Society. These are broken down into successively narrower domains. The text eventually discriminates more than 236,000 categories.

    The second order categories are:

    === History ===

    The project was announced at a 1965 meeting of the Philological Society by its originator, Michael Samuels. Work on the HTE started in the same year. In 2017, the University of Glasgow was awarded the Queen's Anniversary Prize for Higher Education for the HTE. A second edition of the online HTE is currently in progress and is expected to be launched in late 2020. Work is released on the freely available HTE website when available.

    == Print edition: Historical Thesaurus of the Oxford English Dictionary (HTOED) ==

    On 22 October 2009, after 44 years of work, version 1.0 of the HTE was published by Oxford University Press in a two-volume slipcased set as the Historical Thesaurus of the Oxford English Dictionary (HTOED). The two hardcover volumes together total nearly 4,500 pages.

  • Journal of Experimental and Theoretical Artificial Intelligence

    The Journal of Experimental and Theoretical Artificial Intelligence is a quarterly peer-reviewed scientific journal published by Taylor and Francis. It covers all aspects of artificial intelligence and was established in 1989. The editor-in-chief is Eric Dietrich (Binghamton University); the deputy editors-in-chief are Li Pheng Khoo (School of Mechanical & Aerospace Engineering, Nanyang Technological University) and Antonio Lieto (Department of Computer Science, University of Turin).

    == Abstracting and indexing ==

    The journal is abstracted and indexed in:

    According to the Journal Citation Reports, the journal has a 2020/2021 impact factor of 2.340.

  • Plinian Core

    Plinian Core is a set of vocabulary terms that can be used to describe different aspects of biological species information. Under "biological species information", all kinds of properties or traits related to taxa, biological and non-biological, are included. Thus, for instance, terms pertaining to descriptions, legal aspects, conservation, management, demographics, nomenclature, or related resources are incorporated.

    == Description ==

    Plinian Core aims to facilitate the exchange of information about species and higher taxa.

    What is in scope?
      • Species-level catalogs of any kind of biological objects or data.
      • Terminology associated with biological collection data.
      • Striving for compatibility with other biodiversity-related standards.
      • Facilitating the addition of components and attributes of biological data.

    What is not in scope?
      • Data interchange protocols.
      • Non-biodiversity-related data.
      • Occurrence-level data.

    This standard is named after Pliny the Elder, a very influential figure in the study of biological species. Plinian Core design requirements include ease of use, self-containment, support for data integration from multiple databases, and the ability to handle different levels of granularity.

    In its current version, core terms can be grouped as follows:
      • Metadata
      • Base Elements
      • Record Metadata
      • Nomenclature and Classification
      • Taxonomic description
      • Natural history
      • Invasive species
      • Habitat and Distribution
      • Demography and Threats
      • Uses, Management and Conservation
      • associatedParty, MeasurementOrFact, References, AncillaryData

    == Background ==

    Plinian Core started as a collaborative project between Instituto Nacional de Biodiversidad and GBIF Spain in 2005. A series of iterations, in which elements were defined and implemented in different projects, resulted in "Plinian Core Flat" [deprecated]. In 2012 a new development effort was launched to overcome its limitations, driven by new formal requirements, additional input, and a wish to better support the standard and its documentation and to align it with the processes of TDWG, the world reference body for biodiversity information standards. A new version, Plinian Core v3.x.x, was defined. It provides more flexibility to fully represent the information of a species in a variety of scenarios. New elements to deal with aspects such as IPR, related resources, references, etc. were introduced, and elements already included were better defined and documented. Partners joining the development of Plinian Core in this new phase included the University of Granada (UG, Spain), the Alexander von Humboldt Institute (IAvH, Colombia), the National Commission for the Knowledge and Use of Biodiversity (Conabio, Mexico) and the University of São Paulo (USP, Brazil). A "Plinian Core Task Group" within the TDWG "Interest Group on species Information" was constituted and is currently working on its development.

    == Levels of the standard ==

    Plinian Core is presented in two levels: the abstract model and the application profiles. The abstract model (AM), comprising the abstract model schema (XSD) and the terms' URIs, is the normative part. It is comprehensive and allows for different levels of granularity in describing species properties. The AM should be taken as a "menu" from which to choose the terms and level of detail needed in any specific project. The subsets of the abstract model intended to be implemented in specific projects are the "application profiles" (APs). Besides containing part of the elements of the AM, APs can impose additional specifications on the included elements, such as controlled vocabularies.

    Some examples of APs in use follow:
      • Application profile CONABIO
      • Application profile INBIO
      • Application profile GBIF.ES
      • Application profile Banco de Datos de la Naturaleza.Spain
      • Application profile SIB-COLOMBIA

    == Relation to other standards ==

    Plinian Core incorporates a number of elements already defined by other standards. The following table summarizes these standards and the elements used in Plinian Core:

  • Autoscaling

    Autoscaling, (also written as auto scaling, auto-scaling, or known as automatic scaling), is a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load on the farm. For example, the number of servers running behind a web application may be increased or decreased automatically based on the number of active users on the site. Since such metrics may change dramatically throughout the course of the day, and servers are a limited resource that cost money to run even while idle, there is often an incentive to run "just enough" servers to support the current load while still being able to support sudden and large spikes in activity. Autoscaling is helpful for such needs, as it can reduce the number of active servers when activity is low, and launch new servers when activity is high. Autoscaling is closely related to, and builds upon, the idea of load balancing. == Advantages == Autoscaling offers the following advantages: For companies running their own web server infrastructure, autoscaling typically means allowing some servers to go to sleep during times of low load, saving on electricity costs (as well as water costs if water is being used to cool the machines). For companies using infrastructure hosted in the cloud, autoscaling can mean lower bills, because most cloud providers charge based on total usage rather than maximum capacity. Even for companies that cannot reduce the total compute capacity they run or pay for at any given time, autoscaling can help by allowing the company to run less time-sensitive workloads on machines that get freed up by autoscaling during times of low traffic. Autoscaling solutions, such as the one offered by Amazon Web Services, can also take care of replacing unhealthy instances and therefore protecting somewhat against hardware, network, and application failures. 
Autoscaling can offer greater uptime and more availability in cases where production workloads are variable and unpredictable. Autoscaling differs from having a fixed daily, weekly, or yearly cycle of server use in that it is responsive to actual usage patterns, and thus reduces the potential downside of having too few or too many servers for the traffic load. For instance, if traffic is usually lower at midnight, then a static scaling solution might schedule some servers to sleep at night, but this might result in downtime on a night where people happen to use the Internet more (for instance, due to a viral news event). Autoscaling, on the other hand, can handle unexpected traffic spikes better. == Terminology == In the list below, we use the terminology used by Amazon Web Services (AWS). However, alternative names are noted and terminology that is specific to the names of Amazon services is not used for the names. == Practice == === Amazon Web Services (AWS) === Amazon Web Services launched the Amazon Elastic Compute Cloud (EC2) service in August 2006, that allowed developers to programmatically create and terminate instances (machines). At the time of initial launch, AWS did not offer autoscaling, but the ability to programmatically create and terminate instances gave developers the flexibility to write their own code for autoscaling. Third-party autoscaling software for AWS began appearing around April 2008. These included tools by Scalr and RightScale. RightScale was used by Animoto, which was able to handle Facebook traffic by adopting autoscaling. On May 18, 2009, Amazon launched its own autoscaling feature along with Elastic Load Balancing, as part of Amazon Elastic Compute Cloud. Autoscaling is now an integral component of Amazon's EC2 offering. Autoscaling on Amazon Web Services is done through a web browser or the command line tool. In May 2016 Autoscaling was also offered in AWS ECS Service. 
On-demand video provider Netflix documented their use of autoscaling with Amazon Web Services to meet their highly variable consumer needs. They found that aggressive scaling up and delayed and cautious scaling down served their goals of uptime and responsiveness best. In an article for TechCrunch, Zev Laderman, the co-founder and CEO of Newvem, a service that helps optimize AWS cloud infrastructure, recommended that startups use autoscaling in order to keep their Amazon Web Services costs low. Various best practice guides for AWS use suggest using its autoscaling feature even in cases where the load is not variable. That is because autoscaling offers two other advantages: automatic replacement of any instances that become unhealthy for any reason (such as hardware failure, network failure, or application error), and automatic replacement of spot instances that get interrupted for price or capacity reasons, making it more feasible to use spot instances for production purposes. Netflix's internal best practices require every instance to be in an autoscaling group, and its conformity monkey terminates any instance not in an autoscaling group in order to enforce this best practice. === Microsoft's Windows Azure === On June 27, 2013, Microsoft announced that it was adding autoscaling support to its Windows Azure cloud computing platform. Documentation for the feature is available on the Microsoft Developer Network. === Oracle Cloud === Oracle Cloud Platform allows server instances to automatically scale a cluster in or out by defining an auto-scaling rule. These rules are based on CPU and/or memory utilization and determine when to add or remove nodes. === Google Cloud Platform === On November 17, 2014, the Google Compute Engine announced a public beta of its autoscaling feature for use in Google Cloud Platform applications. As of March 2015, the autoscaling tool is still in Beta. 
=== Facebook ===

In a blog post in August 2014, a Facebook engineer disclosed that the company had started using autoscaling to bring down its energy costs. The blog post reported a 27% decline in energy use during low-traffic hours (around midnight) and a 10-15% decline in energy use over the typical 24-hour cycle.

=== Kubernetes Horizontal Pod Autoscaler ===

The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replicaset based on observed CPU utilization (or, with beta support, on some other, application-provided metrics).

== Alternative autoscaling decision approaches ==

By default, autoscaling uses a reactive decision approach to traffic scaling: scaling happens only in response to real-time changes in metrics. In some cases, particularly when the changes occur very quickly, this reactive approach is insufficient. Two other kinds of autoscaling decision approaches are described below.

=== Scheduled autoscaling approach ===

This is an approach to autoscaling where changes are made to the minimum size, maximum size, or desired capacity of the autoscaling group at specific times of day. Scheduled scaling is useful, for instance, if there is a known traffic load increase or decrease at specific times of the day, but the change is too sudden for reactive autoscaling to respond fast enough. AWS autoscaling groups support scheduled scaling.

=== Predictive autoscaling ===

This approach to autoscaling uses predictive analytics. The idea is to combine recent usage trends with historical usage data, as well as other kinds of data, to predict future usage, and to autoscale based on these predictions. For parts of their infrastructure and specific workloads, Netflix found that Scryer, their predictive analytics engine, gave better results than Amazon's reactive autoscaling approach.
In particular, it was better for:
  • Identifying huge spikes in demand in the near future and getting capacity ready a little in advance
  • Dealing with large-scale outages, such as failure of entire availability zones and regions
  • Dealing with variable traffic patterns, providing more flexibility on the rate of scaling out or in based on the typical level and rate of change in demand at various times of day

On November 20, 2018, AWS announced that predictive scaling would be available as part of its autoscaling offering.
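To make the reactive and scheduled approaches described in this article concrete, here is a minimal sketch in Python. The proportional rule (scale the group so the per-server metric returns to its target) mirrors the formula documented for the Kubernetes Horizontal Pod Autoscaler; the function names, the 10% tolerance band, and the schedule format are assumptions made for this illustration and do not correspond to any provider's API.

```python
import math


def desired_capacity(current, metric, target, min_size, max_size, tolerance=0.1):
    """Reactive rule: pick a group size that brings `metric` back to `target`.

    `metric` and `target` are per-server averages in the same unit
    (for example, CPU utilization in percent).
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        wanted = current  # close enough to target: avoid flapping
    else:
        wanted = math.ceil(current * metric / target)
    # Never leave the bounds configured for the group.
    return max(min_size, min(max_size, wanted))


def scheduled_min_size(hour, schedule, default_min):
    """Scheduled rule: raise the floor during known busy hours.

    `schedule` is a hypothetical list of (start_hour, end_hour, min_size).
    """
    for start, end, size in schedule:
        if start <= hour < end:
            return size
    return default_min


# Example: 4 servers at 90% CPU with a 60% target grow to 6 servers.
busy_hours = [(8, 20, 4)]  # keep at least 4 servers from 08:00 to 20:00
floor = scheduled_min_size(9, busy_hours, default_min=2)
size_under_load = desired_capacity(4, 90, 60, floor, 10)
size_when_idle = desired_capacity(6, 20, 60, floor, 10)
```

In this sketch the scheduled rule only adjusts the lower bound, so the reactive rule still decides the actual size within it. During the busy window the idle case therefore stops at the scheduled floor of 4 rather than dropping to 2.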

  • Thomas Bolander

    Thomas Bolander is a Danish professor at DTU Compute, Technical University of Denmark, where he studies logic and artificial intelligence. Most of his research focuses on the social aspect of artificial intelligence and on how future AI can be made able to navigate social interactions. Bolander also sits on several commissions, expert panels and boards: he is a member of the Siri Commission and the TeckDK Commission, a member of the editorial board of the journal Studia Logica, and a co-organizer of Science and Cocktails. Bolander is known for his dissemination of science. In 2019 he was awarded the H. C. Ørsted Medal, becoming its first recipient after a break of three years.

  • Decision Model and Notation

    In business analysis, the Decision Model and Notation (DMN) is a standard published by the Object Management Group. It is a standard approach for describing and modeling repeatable decisions within organizations to ensure that decision models are interchangeable across organizations. The DMN standard provides the industry with a modeling notation for decisions that will support decision management and business rules. The notation is designed to be readable by business and IT users alike. This enables various groups to effectively collaborate in defining a decision model: the business people who manage and monitor the decisions, the business analysts or functional analysts who document the initial decision requirements and specify the detailed decision models and decision logic, the technical developers responsible for the automation of systems that make the decisions. The primary goal of DMN is to offer a common notation that all business users can easily understand. This includes business analysts who develop decision requirements and models, technical developers who automate decisions, and businesspeople who manage and monitor those decisions. DMN serves as a standardized link between business decision design and implementation.[4] The DMN standard can be effectively used standalone but it is also complementary to the BPMN and CMMN standards. BPMN defines a special kind of activity, the Business Rule Task, which "provides a mechanism for the process to provide input to a business rule engine and to get the output of calculations that the business rule engine might provide" that can be used to show where in a BPMN process a decision defined using DMN should be used. DMN has been made a standard for Business Analysis according to BABOK v3. == Elements of the standard == The standard includes three main elements Decision Requirements Diagrams that show how the elements of decision-making are linked into a dependency network. 
Decision tables to represent how each decision in such a network can be made. Business context for decisions such as the roles of organizations or the impact on performance metrics. A Friendly Enough Expression Language (FEEL) that can be used to evaluate expressions in a decision table and other logic formats.

== Use cases ==

The standard identifies three main use cases for DMN:
  • Defining manual decision making
  • Specifying the requirements for automated decision-making
  • Representing a complete, executable model of decision-making

== Benefits ==

Using the DMN standard will improve business analysis and business process management, since:
  • other popular requirement management techniques such as BPMN and UML do not handle decision making
  • projects using business rule management systems (BRMS), which allow faster changes, are growing
  • it facilitates better communication between business, IT and analytic roles in a company
  • it provides an effective requirements modeling approach for predictive analytics projects and fulfills the need for "business understanding" in methodologies for advanced analytics such as CRISP-DM
  • it provides a standard notation for decision tables, the most common style of business rules in a BRMS

== Relationship to BPMN ==

DMN has been designed to work with BPMN. Business process models can be simplified by moving process logic into decision services. DMN is a separate domain within the OMG that provides an explicit way to connect to processes in BPMN. Decisions in DMN can be explicitly linked to processes and tasks that use the decisions. This integration of DMN and BPMN has been studied extensively. DMN expects that the logic of a decision will be deployed as a stateless, side-effect-free Decision Service. Such a service can be invoked from a business process, and the data in the process can be mapped to the inputs and outputs of the decision service.
== DMN BPMN example == As mentioned, BPMN is a related OMG Standard for process modeling. DMN complements BPMN, providing a separation of concerns between the decision and the process. The example here describes a BPMN process and DMN DRD (Decision Requirements Diagram) for onboarding a bank customer. Several decisions are modeled and these decisions will direct the processes response. === New bank account process === In the BPMN process model shown in the figure, a customer makes a request to open a new bank account. The account application provides the account representative with all the information needed to create an account and provide the requested services. This includes the name, address and various forms of identification. In the next steps of the work flow, the know your customer (KYC) services are called. In the KYC services, the name and address are validated; followed by a check against the international criminal database (Interpol) and the database of persons that are 'politically exposed persons (PEP)'. The PEP is a person who is either entrusted with a prominent political position or a close relative thereof. Deposits from persons on the PEP list are potentially corrupt. This is shown as two services on the process model. Anti-money-laundering (AML) regulations require these checks before the customer account is certified. The results of these services plus the forms of identification are sent to the Certify New Account decision. This is shown as a 'rule' activity, verify account, on the process diagram. If the new customer passes certification, then the account is classified into onboarding for business retail, retail, wealth management and high-value business. Otherwise the customer application is declined. The Classify New Customer Decision classifies the customer. If the verify-account process returns a result of 'Manual' then the PEP or the Interpol check returned a close match. 
The account representative must visually inspect the name and the application to determine whether the match is valid, and then accept or decline the application.

=== Certify new account decision ===

An account is certified for opening if the individual's address is verified, valid identification is provided, and the applicant is not on a list of criminals or politically exposed persons. These are shown as sub-decisions below the 'certify new account' decision. The account verification service provides a 100% match of the applicant's address. For identification to be valid, the customer must provide a driver's license, passport or government-issued ID. The checks against PEP and Interpol are 'fuzzy' matches and return matching score values. Scores above 85 are considered a 'match', and scores between 65 and 85 require a 'manual' screening process. People who match either of these lists are rejected by the account application process. If there is a partial match, with a score between 65 and 85, against the Interpol or PEP list, then the certification is set to manual and an account representative performs a manual verification of the applicant's data. These rules are reflected in the figure below, which presents the decision table for whether to pass the provided name for the list checks.

=== Client category ===

The client's on-boarding process is driven by the category they fall into. The category is decided by:
  • The type of client, business or private
  • The size of the funds on deposit
  • The estimated net worth

This decision is shown below. There are 6 business rules that determine the client's category, and these are shown in the decision table here:

=== Summary example ===

In this example, the outcome of the 'Verify Account' decision directed the responses of the new account process. The same is true for the 'Classify Customer' decision.
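The certification rules described under "Certify new account decision" can be sketched as a small stateless function in Python, which is one way to picture what a decision service evaluates. This is an illustration under stated assumptions, not the DMN decision table itself: the function names, the outcome labels 'Pass' and 'Decline', the exact strings for accepted identification, the order in which rules are checked, and the treatment of the boundary scores 65 and 85 as 'manual' are all assumptions.

```python
# Accepted forms of identification, as listed in the text (string values are assumed).
VALID_IDS = {"driver's license", "passport", "government-issued ID"}


def screen(score):
    """Classify one fuzzy-match score from a PEP or Interpol list check."""
    if score > 85:
        return "match"   # above 85: treated as a hit on the list
    if score >= 65:
        return "manual"  # 65 to 85 (inclusive here by assumption): needs a human
    return "clear"


def certify_new_account(address_verified, id_type, interpol_score, pep_score):
    """Stateless, side-effect-free decision: returns 'Pass', 'Manual' or 'Decline'."""
    if not address_verified or id_type not in VALID_IDS:
        return "Decline"
    results = {screen(interpol_score), screen(pep_score)}
    if "match" in results:
        return "Decline"  # a match on either list rejects the application
    if "manual" in results:
        return "Manual"   # a partial match routes to an account representative
    return "Pass"


# Example: a verified applicant with a partial Interpol match is sent to manual review.
outcome = certify_new_account(True, "passport", interpol_score=70, pep_score=20)
```

Because the function keeps no state and only maps inputs to an outcome, changing a threshold or adding an accepted document type changes the decision without touching the surrounding process, which is the separation of concerns the example is meant to show.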
By adding or changing the business rules in the tables, one can easily change the criteria for these decisions and control the process differently. Modeling is a critical aspect of improving an existing process or business challenge. Modeling is generally done by a team of business analysts, IT personnel, and modeling experts. The expressive modeling capabilities of BPMN allows business analyst to understand the functions of the activities of the process. Now with the addition of DMN, business analysts can construct an understandable model of complex decisions. Combining BPMN and DMN yields a very powerful combination of models that work synergistically to simplify processes. == Relationship to decision mining and process mining == Automated discovery techniques that infer decision models from process execution data have been proposed as well. Here, a DMN decision model is derived from a data-enriched event log, along with the process that uses the decisions. In doing so, decision mining complements process mining with traditional data mining approaches. == cDMN extension == Constraint Decision Model and Notation (cDMN) is a formal notation for expressing knowledge in a tabular, intuitive format. It extends DMN with constraint reasoning and related concepts while aiming to retain the us

  • Computer game bot Turing test

    The computer game bot Turing test is a variant of the Turing test in which a human judge viewing and interacting with a virtual world must distinguish between other humans and video game bots, both interacting with the same virtual world. This variant was first proposed in 2008 by Associate Professor Philip Hingston of Edith Cowan University, and implemented through a tournament called the 2K BotPrize.

    == History ==

    The computer game bot Turing test was proposed to advance the fields of artificial intelligence (AI) and computational intelligence with respect to video games. It was considered that a poorly implemented bot implied a subpar game, so a bot capable of passing this test, and therefore potentially indistinguishable from a human player, would directly improve the quality of a game. It also served to debunk the flawed notion that "game AI is a solved problem." Emphasis is placed on a game bot that interacts with other players in a multiplayer environment. Unlike a bot that simply needs to make optimal human-like decisions to play or beat a game, this bot must make the same decisions while also convincing another in-game player of its human-likeness.

    == Implementation ==

    The computer game bot Turing test was designed to test a bot's ability to interact with a game environment in comparison with a human player; simply 'winning' was insufficient. This evolved into a contest with a few important goals in mind:
      • There are three participants: a human player, a computer-game bot, and a judge.
      • The bot needs to appear more human-like than the human player.
      • Judge scores are not bipolar; both human and bot can be scored anywhere on a scale from 1 to 5 (1=not humanlike, 5=human).
      • All three participants are to be indistinguishable in the arena, with the exception of a randomly generated name tag, so as to reduce the chance of random elements such as name or appearance influencing the judges.
      • Chat is disabled throughout the match.
      • Bots were not given omniscient powers as they may be in other games. Bots must react only to the data that might be reasonably available to a human player.
      • Human participants were of a moderate skill range, with no participant either ignorant of the game or capable of playing at a professional level.

    In 2008, the first 2K BotPrize tournament took place. The contest was held with the game Unreal Tournament 2004 as the platform. Contestants created their bots in advance using the GameBots interface. GameBots was modified to adhere to the above conditions, for example by removing data about vantage points or weapon damage that unfairly informed the bots of relevant strengths and weaknesses that a human would otherwise need to learn.

    == Tournament ==

    The first BotPrize Tournament was held on 17 December 2008, as part of the 2008 IEEE Symposium on Computational Intelligence and Games in Australia. Each competing team was given time to set up and adjust their bots to the modified game client, although no coding changes were allowed at that point. The tournament was run in rounds, each a 10-minute death match. Judges were the last to join the server, and every judge observed every player and every bot exactly once, although the pairing of players and bots did change. When the tournament ended, no bot was rated as more human than any player. In subsequent tournaments, run during 2009–2011, bots achieved scores that were increasingly human-like, but no contestant won the BotPrize in any of these contests. In 2012, the 2K BotPrize was held once again, and two teams programmed bots that achieved scores greater than those of human players.

    == Successful bots ==

    To date, there have been two successfully programmed bots that passed the computer game bot Turing test: UT^2, a team from the University of Texas at Austin, emphasized a bot that adjusted its behaviour based on previously observed human behaviour and neuroevolution.
The team has made their bot available, although a copy of Unreal Tournament 2004 is required. Mihai Polceanu, a doctoral student from Romania, focused on creating a bot that would mimic opponent reactions, in a sense 'borrowing' the human-like nature of the opponent. These victors succeeded in the year 2012, Alan Turing's centenary year. == Aftermath == The outcome of a bot that appears more human-like than a human player is possibly overstated, since in the tournament in which the bots succeeded, the average 'humanness' rating of the human players was only 41.4%. This showcases some limits of this Turing test, since the results demonstrate that human behaviour is more complicated and quantitative than was accounted for. In light of this, the BotPrize competition organizers will increase the difficulty in upcoming years with new challenges, forcing competitors to improve their bots. It is also believed that methods and techniques developed for the computer game bot Turing test will be useful in fields other than video games, such as virtual training environments and in improving Human–robot interaction. == Contrasts to the Turing test == The computer game bot Turing test differs from the traditional or generic Turing test in a number of ways: Unlike the traditional Turing test, for example the Chatterbot-style contest held annually by the Loebner Prize competition, the humans who played against the Computer Game Bots are not trying to convince judges they are the human; rather, they want to win the game (i.e., by achieving the highest kill score). Judges are not restricted to awarding only one participant in a match as the 'human' and the other as the 'non-human.' This emphasizes more qualitative rather than polarized findings. With regards to a successful video game bot, this is not to be confused with a claim that the bot is 'intelligent,' whereas a machine that 'passed' the Turing test would arguably have some evidence for its Chatterbot's 'intelligence.' 
The game Unreal Tournament 2004 was chosen for its commercial availability and its interface for creating bots, GameBots. This limitation on medium is a sharp contrast to the Turing test, which emphasizes a conversation, where possible questions are vastly more numerous than the set of possible actions available in any specific video game. The information available to the participants, humans and bots, is not equal. Humans interact through vision and sound, whereas bots interact with data and events. The judges cannot introduce new events (e.g., a lava pit) to help differentiate between human and bot, whereas in a Chatterbot-style system, judges may theoretically ask any question in any manner. The two participants and the judge take part in a three-way interaction, unlike, for example, the paired two-way interaction of the Loebner Prize Contest.

  • EM algorithm and GMM model

    EM algorithm and GMM model

    In statistics, the EM (expectation-maximization) algorithm handles latent variables, while GMM is the Gaussian mixture model. == Background == The picture below shows the red blood cell hemoglobin concentration and the red blood cell volume of two groups of people, the anemia group and the control group (i.e. the group of people without anemia). As expected, people with anemia have lower red blood cell volume and lower red blood cell hemoglobin concentration than those without anemia. $x$ is a random vector, $x := (\text{red blood cell volume}, \text{red blood cell hemoglobin concentration})$, and from medical studies it is known that $x$ is normally distributed in each group, i.e. $x \sim \mathcal{N}(\mu, \Sigma)$. $z$ denotes the group to which $x$ belongs, with $z_i = 0$ when $x_i$ belongs to the anemia group and $z_i = 1$ when $x_i$ belongs to the control group. Also, $z \sim \operatorname{Categorical}(k, \phi)$, where $k = 2$, $\phi_j \geq 0$, and $\sum_{j=1}^{k} \phi_j = 1$. See Categorical distribution. The following procedure can be used to estimate $\phi, \mu, \Sigma$. 
A maximum likelihood estimation can be applied:
$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p\left(x^{(i)}; \phi, \mu, \Sigma\right) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p\left(x^{(i)} \mid z^{(i)}; \mu, \Sigma\right) p\left(z^{(i)}; \phi\right)$$
As the $z_i$ for each $x_i$ are known, the log-likelihood function can be simplified as below:
$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \left[ \log p\left(x^{(i)} \mid z^{(i)}; \mu, \Sigma\right) + \log p\left(z^{(i)}; \phi\right) \right]$$
Now the likelihood function can be maximized by setting its partial derivatives with respect to $\mu, \Sigma, \phi$ to zero, obtaining:
$$\phi_j = \frac{1}{m} \sum_{i=1}^{m} 1\{z^{(i)} = j\}$$
$$\mu_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\}\, x^{(i)}}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}}$$
$$\Sigma_j = \frac{\sum_{i=1}^{m} 1\{z^{(i)} = j\} \left(x^{(i)} - \mu_j\right)\left(x^{(i)} - \mu_j\right)^{T}}{\sum_{i=1}^{m} 1\{z^{(i)} = j\}}$$
If $z_i$ is known, estimating the parameters by maximum likelihood is quite simple. But if $z_i$ is unknown, it is much more complicated. Because $z$ is a latent variable (i.e. not observed) in the unlabeled scenario, the expectation-maximization algorithm is needed to estimate $z$ as well as the other parameters. Generally, this problem is set up as a GMM, since the data in each group are normally distributed. 
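
The closed-form estimates for the labelled case can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `fit_labeled_gaussians`, the toy data, and the convention that labels run over 0..k-1 are not from the source, and every group is assumed to contain at least one point.

```python
def fit_labeled_gaussians(xs, zs, k):
    """Maximum likelihood estimates when every label z^(i) is observed.

    xs: list of m feature vectors (lists of floats).
    zs: list of m labels in 0..k-1 (hypothetical labelling convention).
    Returns (phi, mu, sigma), each a list with one entry per group.
    """
    m, n = len(xs), len(xs[0])
    phi, mu, sigma = [], [], []
    for j in range(k):
        # Points selected by the indicator 1{z^(i) = j}; assumed non-empty.
        members = [x for x, z in zip(xs, zs) if z == j]
        count = len(members)
        phi.append(count / m)
        mean = [sum(x[d] for x in members) / count for d in range(n)]
        mu.append(mean)
        # Average outer product of the centred points (divides by count, not count - 1).
        cov = [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in members) / count
                for b in range(n)] for a in range(n)]
        sigma.append(cov)
    return phi, mu, sigma


# Made-up two-feature toy data with labels 0 and 1 (not real blood measurements).
xs = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0], [14.0, 12.0]]
zs = [0, 0, 1, 1, 1]
phi, mu, sigma = fit_labeled_gaussians(xs, zs, k=2)
```

Each estimate is just a per-group average: the fraction of points in the group, their mean, and the mean outer product of their deviations.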
In machine learning, the latent variable $z$ is regarded as a latent pattern underlying the data, which the observer cannot see directly. $x_i$ is the known data, while $\phi, \mu, \Sigma$ are the parameters of the model. With the EM algorithm, some underlying pattern $z$ in the data $x_i$ can be found, along with estimates of the parameters. The wide applicability of this setting in machine learning is what makes the EM algorithm so important. == EM algorithm in GMM == The EM algorithm consists of two steps: the E-step and the M-step. First, the model parameters and the $z^{(i)}$ can be randomly initialized. In the E-step, the algorithm tries to guess the value of $z^{(i)}$ based on the parameters, while in the M-step, the algorithm updates the values of the model parameters based on the E-step's guess of $z^{(i)}$. These two steps are repeated until convergence is reached. The algorithm in GMM is: Repeat until convergence:
1. (E-step) For each $i, j$, set
$$w_j^{(i)} := p\left(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma\right)$$
2. (M-step) Update the parameters
$$\phi_j := \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)}$$
$$\mu_j := \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}}$$
$$\Sigma_j := \frac{\sum_{i=1}^{m} w_j^{(i)} \left(x^{(i)} - \mu_j\right)\left(x^{(i)} - \mu_j\right)^{T}}{\sum_{i=1}^{m} w_j^{(i)}}$$
With Bayes' rule, the following result is obtained by the E-step:
$$p\left(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma\right) = \frac{p\left(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma\right) p\left(z^{(i)} = j; \phi\right)}{\sum_{l=1}^{k} p\left(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma\right) p\left(z^{(i)} = l; \phi\right)}$$
According to the GMM setting, the following formulas are obtained:
$$p\left(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma\right) = \frac{1}{(2\pi)^{n/2} \left|\Sigma_j\right|^{1/2}} \exp\left(-\frac{1}{2} \left(x^{(i)} - \mu_j\right)^{T} \Sigma_j^{-1} \left(x^{(i)} - \mu_j\right)\right)$$
$$p\left(z^{(i)} = j; \phi\right) = \phi_j$$
In this way, the algorithm can alternate between the E-step and the M-step, starting from the randomly initialized parameters.
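
The E-step and M-step updates can be sketched in plain Python for the two-feature case ($n = 2$). Several choices here are assumptions made for illustration and are not part of the source: the function names, starting from caller-supplied means rather than random ones (so that runs are reproducible), identity initial covariances, a fixed iteration count in place of a convergence test, evaluating the E-step in log space, and a small `ridge` term added to the covariance diagonal to keep it invertible. The sketch also assumes no component's total weight underflows to zero.

```python
import math


def log_gaussian_2d(x, mu, cov):
    """Log-density of a 2-D normal N(mu, cov) at x, via the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T cov^{-1} (x - mu), with cov^{-1} = [[d, -b], [-c, a]] / det
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def em_gmm_2d(xs, init_mu, iterations=50, ridge=1e-6):
    """EM for a k-component Gaussian mixture on 2-D points; k = len(init_mu)."""
    m, k = len(xs), len(init_mu)
    phi = [1.0 / k] * k                                   # uniform mixing weights
    mu = [list(p) for p in init_mu]
    sigma = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(k)]  # identity covariances
    w = []
    for _ in range(iterations):
        # E-step: w[i][j] = p(z^(i) = j | x^(i)) by Bayes' rule, computed in log space.
        w = []
        for x in xs:
            logs = [math.log(phi[j]) + log_gaussian_2d(x, mu[j], sigma[j])
                    for j in range(k)]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            w.append([u / total for u in unnorm])
        # M-step: the labelled-data estimates with 1{z^(i) = j} replaced by w[i][j].
        for j in range(k):
            nj = sum(row[j] for row in w)
            phi[j] = nj / m
            mu[j] = [sum(w[i][j] * xs[i][d] for i in range(m)) / nj for d in range(2)]
            sigma[j] = [[sum(w[i][j] * (xs[i][r] - mu[j][r]) * (xs[i][c] - mu[j][c])
                             for i in range(m)) / nj + (ridge if r == c else 0.0)
                         for c in range(2)] for r in range(2)]
    return phi, mu, sigma, w


# Made-up toy data: two well-separated clusters of five points each.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
          [6.0, 6.0], [7.0, 6.0], [6.0, 7.0], [7.0, 7.0], [6.5, 6.5]]
phi, mu, sigma, w = em_gmm_2d(points, init_mu=[[0.0, 0.0], [7.0, 7.0]])
labels = [max(range(len(row)), key=row.__getitem__) for row in w]
```

The responsibilities `w` are soft assignments; taking the largest entry of each row, as in `labels`, turns them into a hard guess of $z^{(i)}$.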

  • Sentential decision diagram

    Sentential decision diagram

    In artificial intelligence, a sentential decision diagram (SDD) is a type of knowledge representation used in knowledge compilation to represent Boolean functions. SDDs can be viewed as a generalization of the influential ordered binary decision diagram (OBDD) representation, by allowing decisions on multiple variables at once. Like OBDDs, SDDs allow for tractable Boolean operations, while being exponentially more succinct. For this reason, they have become an important representation in knowledge compilation. == Properties == SDDs are defined with respect to a generalization of variable ordering known as a variable tree (vtree). Provided that they satisfy additional properties known as compression and trimming (which are analogous to ROBDDs), SDDs are a canonical representation of Boolean functions; that is, they are unique given a vtree. Like OBDDs, they allow for operations such as conjunction, disjunction and negation to be computed directly on the representation in polynomial time, while being potentially more compact. They also allow for polynomial-time model counting. SDDs are known to be exponentially more succinct than OBDDs. == Applications == SDDs are used as a compilation target for probabilistic logic programs by the ProbLog 2 system since they support tractable (weighted) model counting as well as tractable negation, conjunction and disjunction while being more succinct than BDDs. SDDs have also been extended to model probability distributions, in which context they are known as probabilistic sentential decision diagrams (PSDD).

  • Writesonic

    Writesonic

    Writesonic is an AI visibility and generative engine optimization (GEO) platform used by enterprises, digital agencies, direct-to-consumer (D2C) companies, and fast-growing brands to understand and improve how they are represented in AI-generated search and answer systems. The platform analyzes how brands appear in AI answers, compares their visibility and citations against competitors, and provides tools to create and optimize on-site content and secure mentions across third-party sources, discussion forums, and user-generated platforms that influence AI outputs. == History == Writesonic was founded by Samanyou Garg in October 2020 in San Francisco, California. The company initially operated as Magicflow before adopting its current name. In its seed round, the company raised $2.5 million from investors including Y-Combinator, HOF Capital, and Soma Capital. The company began with AI-powered content generation tools. In 2023, it expanded into AI-enhanced search engine optimization. In 2024, the company launched an AI agent specifically designed for SEO tasks, with integrations to platforms including Ahrefs, Google Keyword Planner, Keywords Everywhere, and Google Search Console. This was among the first specialized AI agents developed for SEO automation. Around the same time, Writesonic expanded its product line into generative engine optimization (GEO), developing tools to analyze and improve how brands are represented in AI-generated search and answer environments. It faces competition in this market from companies such as Profound (known for its dashboards) and Meridian (known for its execution). == Technology and features == In 2024, the company introduced an artificial intelligence agent designed to automate search engine optimization (SEO) tasks. 
The agent integrates with platforms such as Ahrefs, Google Keyword Planner, Keywords Everywhere, and Google Search Console to conduct technical audits, perform keyword research, carry out competitive analysis, and assist in strategy development. It is capable of identifying content gaps, suggesting optimization measures, and generating SEO strategies using real-time data from the integrated platforms. The platform also includes features for content strategy, optimization, and management. It makes use of large language models such as GPT-5, Claude Opus 4.1, and Claude Sonnet 4.5, in combination with proprietary workflows for fact-checking, internal linking, and content structure optimization.

  • Mistral AI

    Mistral AI

    Mistral AI SAS (French: [mistʁal]) is a French artificial intelligence (AI) company, headquartered in Paris. Founded in 2023, it develops open-weight large language models (LLMs) and offers both open-source and proprietary AI models. As of 2025, the company has a valuation of more than US$14 billion. == Namesake == The company is named after the mistral, a powerful, cold wind in southern France, a term which originates from the Occitan language. == History == Mistral AI was established in April 2023 by three French AI researchers, Arthur Mensch, Guillaume Lample and Timothée Lacroix. Mensch, an expert in advanced AI systems, is a former employee of Google DeepMind; Lample and Lacroix, meanwhile, are specialists in large-scale AI models who had worked for Meta Platforms. The trio originally met during their studies at École Polytechnique. == Company operation == === Funding === In June 2023, the start-up carried out a first fundraising of €105 million ($117 million) with investors including the American fund Lightspeed Venture Partners, Eric Schmidt, Xavier Niel and JCDecaux. The valuation was then estimated by the Financial Times at €240 million ($267 million). On 10 December 2023, Mistral AI announced that it had raised €385 million ($428 million) as part of its second fundraising. This round of financing involved the Californian fund Andreessen Horowitz, BNP Paribas and the software publisher Salesforce. The company was valued at over €2 billion. On 26 February 2024, Microsoft announced an investment of $16 million in Mistral AI. On 16 April 2024, reporting revealed that Mistral was in talks to raise €500 million, a deal that would more than double its then-current valuation to at least €5 billion. In June 2024, Mistral AI secured a €600 million ($645 million) funding round, increasing its valuation to €5.8 billion ($6.2 billion). Based on valuation, as of June 2024, the company was ranked fourth globally in the AI industry, and first outside the San Francisco Bay Area. 
In April 2025, Mistral AI announced a €100 million partnership with the shipping company CMA CGM. In August 2025, the Financial Times reported that Mistral was in talks to raise $1 billion at a $10 billion valuation. In September 2025, Bloomberg reported that Mistral AI had secured a €2 billion investment valuing it at €12 billion ($14 billion). This came after a $1.5 billion investment from the Dutch company ASML, which owns 11% of Mistral. In February 2026, Mistral acquired Koyeb, a Paris-based AI startup. Later that month, Mistral AI announced a multi-year strategic partnership with Accenture to help enterprises deploy sovereign AI solutions at scale. In March 2026, Mistral raised $830 million in order to build new datacenters near Paris and in Sweden. == Services == On 19 November 2024, the company announced updates for Le Chat (pronounced /lə ʃa/ in French, like the French word for "cat"). It added the ability to create images, using Black Forest Labs' Flux Pro model. On 6 February 2025, Mistral AI released Le Chat on iOS and Android mobile devices. Mistral AI also introduced a Pro subscription tier, priced at $14.99 per month, which provides access to more advanced models, unlimited messaging, and web browsing. At the end of May 2026, Le Chat was renamed Vibe, and new features were introduced at the same time. == Models == The following table lists the main model versions of Mistral, describing the significant changes included with each version: === Mistral 7B === Mistral AI claimed in the Mistral 7B release blog post that the model outperforms LLaMA 2 13B on all benchmarks tested, and is on par with LLaMA 34B on many benchmarks tested, despite having only 7 billion parameters, a small size compared to its competitors. === Mixtral 8x7B === Mistral AI claimed in 2023 that its model beat both LLaMA 70B and GPT-3.5 in most benchmarks. 
In March 2024, research conducted by Patronus AI compared the performance of LLMs on a 100-question test with prompts to generate text from books protected under U.S. copyright law. It found that OpenAI's GPT-4, Mixtral, Meta AI's LLaMA-2, and Anthropic's Claude 2 generated copyrighted text verbatim in 44%, 22%, 10%, and 8% of responses respectively. === Mistral Small 3.1 === On 17 March 2025, Mistral released Mistral Small 3.1 as a smaller, more efficient model. === Mistral Medium 3 === On 7 May 2025, Mistral AI released Mistral Medium 3. === Magistral Small and Magistral Medium === On 10 June 2025, Mistral AI released its first AI reasoning models: Magistral Small (open-source) and Magistral Medium, models which are purported to have chain-of-thought capabilities. === Mistral Large 3 and Ministral 3 === On 2 December 2025, Mistral AI released Mistral Large 3, a sparse mixture-of-experts model with 41 billion active parameters and 675 billion total parameters, and Ministral 3, three small, dense models with 3 billion, 7 billion and 14 billion parameters. === Devstral 2 and Devstral Small 2 === On 10 December 2025, Mistral AI released Devstral 2 and Devstral Small 2. Devstral Small 2, a 24B-parameter model, is claimed to achieve better coding performance than the 30B-parameter Qwen 3 Coder Flash model.

  • Microsoft Forms

    Microsoft Forms

    Microsoft Forms (formerly Office 365 Forms) is an online survey creator, part of Microsoft 365. == Usage == Forms allows users to create surveys and quizzes with automatic marking. The data can be exported to Microsoft Excel and Power BI dashboards, and viewed live using the Present feature. == Phishing and fraud == Following a wave of phishing attacks utilizing Microsoft 365 in early 2021, Microsoft uses algorithms to automatically detect and block phishing attempts made with Microsoft Forms. Microsoft also advises Forms users not to submit personal information, such as passwords, in a form or survey. It also places a similar advisory underneath the “Submit” button in every form created with Forms, warning users not to give out their password.

  • IRCF360

    IRCF360

    Infrared Control Freak 360 (IRCF360) is a 360-degree proximity sensor and motion sensing device, developed by ROBOTmaker. The sensor is in a beta developer release as a low-cost, software-configurable sensor for use in research, technical and hobby projects. == Overview == The 360-degree sensor was originally designed as a short-range proximity sensor for micro robots, mainly intended for swarm robotics, ant robotics, swarm intelligence, autonomous quadcopters, drones, UAVs, and multi-robot simulations (e.g. the Jasmine Project) where 360-degree proximity sensing is required to avoid collisions with other robots and for simple IR inter-robot communications. To overcome certain limitations of infrared (IR) proximity sensing (e.g. detection of dark surfaces), the sensing module includes ambient light sensing and basic tactile sensing functionality during forward movement sensing and probing, providing photovore and photophobe robot swarm behaviours and characteristics. A project named the Sensorium Project was started with the aim of broadening the sensor's audience beyond its typical use as a robot sensor. To demonstrate the sensor's functionality, open-source Java-based integrated development environments (IDEs) are used, such as Arduino and Processing (programming language).

  • Allen's interval algebra

    Allen's interval algebra

    Allen's interval algebra is a calculus for temporal reasoning that was introduced by James F. Allen in 1983. The calculus defines possible relations between time intervals and provides a composition table that can be used as a basis for reasoning about temporal descriptions of events. == Formal description == === Relations === The following 13 base relations capture the possible relations between two intervals. To see that the 13 relations are exhaustive, note that each point of $X$ can be at 5 possible locations relative to $Y$: before, at the start, within, at the end, after. These give $5+4+3+2+1=15$ possible relative positions for the start and the end of $X$. Of these, we cannot have $X_0 = X_1 = Y_0$ since $X_0 < X_1$ …
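
The base relations can be made concrete with a short Python sketch that classifies a pair of intervals by comparing endpoints. The function name `allen_relation` is hypothetical, and the relation names used (before, meets, overlaps, starts, during, finishes, equals, and the six inverses) are the conventional ones, which this excerpt does not list. The enumeration at the end checks the counting claim by brute force over a small grid of endpoints.

```python
from itertools import product


def allen_relation(x0, x1, y0, y1):
    """Return the base relation of interval X = (x0, x1) to Y = (y0, y1).

    Assumes proper intervals, i.e. x0 < x1 and y0 < y1.
    """
    if x0 >= x1 or y0 >= y1:
        raise ValueError("intervals must satisfy start < end")
    if x1 < y0:
        return "before"
    if x1 == y0:
        return "meets"
    if y1 < x0:
        return "after"
    if y1 == x0:
        return "met-by"
    # From here on the two intervals share more than a single point.
    if x0 == y0 and x1 == y1:
        return "equals"
    if x0 == y0:
        return "starts" if x1 < y1 else "started-by"
    if x1 == y1:
        return "finishes" if x0 > y0 else "finished-by"
    if y0 < x0 and x1 < y1:
        return "during"
    if x0 < y0 and y1 < x1:
        return "contains"
    return "overlaps" if x0 < y0 else "overlapped-by"


# Enumerate every pair of proper intervals with endpoints in {0, 1, 2, 3}
# and collect the distinct relations that occur.
found = {allen_relation(x0, x1, y0, y1)
         for x0, x1, y0, y1 in product(range(4), repeat=4)
         if x0 < x1 and y0 < y1}
```

Four distinct endpoint values are enough to realise every arrangement of two intervals, so `found` ends up containing all 13 relation names.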