AI Chat List

AI Chat List — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • RagTime

    RagTime

    RagTime is a frame-oriented business publishing software which combines word processing, spreadsheets, simple drawings, image processing, and charts, in a single document/program, integrated software. It is often used to create forms, reports, documentation, desktop publishing, and in office environments. Typical users are business clients, educational institutions, administrations, architects, and also private users. Ragtime includes the following modules: Page layout (forms, templates etc.) Word processing Image processing Spreadsheets, similar to Microsoft Excel Formulas and functions which can be used throughout, in text, graphics, and spreadsheets Charts in different types of diagrams Drawings in vector graphics including lines, polygons, Bézier curves and more Slide show (presentation of RagTime documents) Audio/video Buttons (pop-up menus, switches, and more) that can be used within RagTime documents Import/export of various file formats Support of the AppleScript scripting language available system-wide under macOS == Principle == RagTime differs from most other comparable programs or software packages in its strict frame-oriented design: all content is contained within frames on each page. The content can have a fixed position within its frame or, if it is text or a spreadsheet, flow into another frame that is connected to the first frame via a so-called “pipeline”. RagTime has no different document types for different types of data; all content is stored in a single compound document type. Thus, a RagTime document not only can contain multiple pages, but also multiple layouts within the same document; e.g. spreadsheets in addition to text and images. The RagTime filename extension is .rtd (RagTime document); for templates the extension is .rtt (RagTime template). The current version is RagTime 6.6.5. It is available for OS X (10.6-10.14) and Windows (XP/Vista/7/8/10). == Extensions == FileTime – allows accessing “FileMaker Pro” databases from RagTime documents under OS X RagTime Connect – ODBC database connection for RagTime 6 (Mac and Windows) Johannes – print extension for the simple creation of stapled or folded brochures, booklets etc. PowerFunctions – additional functions for a more effective creation of intelligent documents for exchanging data and for use in mixed Mac/Windows environments MetaFormula – SYLK-based extension that allows calculating text as formula == History == RagTime has been developed since 1985 for the Macintosh – originally named MacFrame – and was published in 1986. When released, it already had the present name, which was chosen following the then-available software package Lotus Jazz. In the European Macintosh market, RagTime quickly gained a prominent position that continues to this day, even though the market share has decreased. Despite repeated attempts, the program could not gain acceptance in the North American market due to its high cost ($395 in 1990). The North American sales office closed in 1991, shortly after Claris Corporation released ClarisWorks which duplicated much of the functionality of RagTime for a lower price. After the manufacturer – first Brüning & Everth, followed by B&E Software and today RagTime.de Development – had focused on the Macintosh only for a very long time, it also released a Windows version, RagTime 5.0, in 1999. However, the program could not assume great significance against established competitors, especially Microsoft Office. Until mid-2006 RagTime was, in addition to the commercial version, also available as a free version (RagTime Solo) for personal use. RagTime Solo included the same features and performance (except for spelling and Syllabification) dictionaries), but was not allowed for use in commercial environments. In other languages RagTime Solo was distributed as RagTime Privat. In a press release from July 5, 2006, RagTime announced the discontinuation of RagTime Solo: “… the RagTime Solo license conditions were often misinterpreted or deliberately flouted. Therefore we discontinued RagTime Solo, there will be no private version of RagTime 6 anymore.” After a successful start of the RagTime 6.0 software, sales edged significantly lower in the following years. Disagreements arose among the shareholders about the continuation of the company, which filed for bankruptcy in July 2007. As a result, the rights to RagTime were taken over by the newly established company RagTime.de Development GmbH, which was responsible for the development. The sales partner RagTime.de Sales GmbH distributed the RagTime products until October 2015. Today RagTime.de Development GmbH is also responsible for sales. The last level of development is the extensively revamped version RagTime 6.6 of 8 October 2015, which also includes new OS X features (e.g. high-resolution “Retina” displays) and supports Windows 10. == Programming == RagTime 1-3 were developed in Pascal, since version 4 the development is completely coded in C++. External programming and automation can be implemented via AppleScript on a Mac, and via OLE/COM-API (e.g. Visual Basic) under Windows. On a Mac, RagTime provides a comprehensive AppleScript library, for the automation of almost any task, from automatic document creation to the export of PDF documents. RagTime also supports “recordings” by use of the “AppleScript Editor”, which allows recording the interactive RagTime operation as an AppleScript program sequence. AppleScripts can be saved in the RagTime document and called via menu or shortcut keys. On Windows, RagTime (since version 6) disposes over an OLE/COM API, which allows automating many RagTime components via external programming. For that purpose there is a type library that installs the available RagTime OLE/COM object catalogue. Programming can be realized in all programming languages supported by Microsoft.

    Read more →
  • Multiple correspondence analysis

    Multiple correspondence analysis

    In statistics, multiple correspondence analysis (MCA) is a data analysis technique for nominal categorical data, used to detect and represent underlying structures in a data set. It does this by representing data as points in a low-dimensional Euclidean space. The procedure thus appears to be the counterpart of principal component analysis for categorical data. MCA can be viewed as an extension of simple correspondence analysis (CA) in that it is applicable to a large set of categorical variables. == As an extension of correspondence analysis == MCA is performed by applying the CA algorithm to either an indicator matrix (also called complete disjunctive table – CDT) or a Burt table formed from these variables. An indicator matrix is an individuals × variables matrix, where the rows represent individuals and the columns are dummy variables representing categories of the variables. Analyzing the indicator matrix allows the direct representation of individuals as points in geometric space. The Burt table is the symmetric matrix of all two-way cross-tabulations between the categorical variables, and has an analogy to the covariance matrix of continuous variables. Analyzing the Burt table is a more natural generalization of simple correspondence analysis, and individuals or the means of groups of individuals can be added as supplementary points to the graphical display. In the indicator matrix approach, associations between variables are uncovered by calculating the chi-square distance between different categories of the variables and between the individuals (or respondents). These associations are then represented graphically as "maps", which eases the interpretation of the structures in the data. Oppositions between rows and columns are then maximized, in order to uncover the underlying dimensions best able to describe the central oppositions in the data. As in factor analysis or principal component analysis, the first axis is the most important dimension, the second axis the second most important, and so on, in terms of the amount of variance accounted for. The number of axes to be retained for analysis is determined by calculating modified eigenvalues. == Details == Since MCA is adapted to draw statistical conclusions from categorical variables (such as multiple choice questions), the first thing one needs to do is to transform quantitative data (such as age, size, weight, day time, etc) into categories (using for instance statistical quantiles). When the dataset is completely represented as categorical variables, one is able to build the corresponding so-called complete disjunctive table. We denote this table X {\displaystyle X} . If I {\displaystyle I} persons answered a survey with J {\displaystyle J} multiple choices questions with 4 answers each, X {\displaystyle X} will have I {\displaystyle I} rows and 4 J {\displaystyle 4J} columns. More theoretically, assume X {\displaystyle X} is the completely disjunctive table of I {\displaystyle I} observations of K {\displaystyle K} categorical variables. Assume also that the k {\displaystyle k} -th variable have J k {\displaystyle J_{k}} different levels (categories) and set J = ∑ k = 1 K J k {\displaystyle J=\sum _{k=1}^{K}J_{k}} . The table X {\displaystyle X} is then a I × J {\displaystyle I\times J} matrix with all coefficient being 0 {\displaystyle 0} or 1 {\displaystyle 1} . Set the sum of all entries of X {\displaystyle X} to be N {\displaystyle N} and introduce Z = X / N {\displaystyle Z=X/N} . In an MCA, there are also two special vectors: first r {\displaystyle r} , that contains the sums along the rows of Z {\displaystyle Z} , and c {\displaystyle c} , that contains the sums along the columns of Z {\displaystyle Z} . Note D r = diag ( r ) {\displaystyle D_{r}={\text{diag}}(r)} and D c = diag ( c ) {\displaystyle D_{c}={\text{diag}}(c)} , the diagonal matrices containing r {\displaystyle r} and c {\displaystyle c} respectively as diagonal. With these notations, computing an MCA consists essentially in the singular value decomposition of the matrix: M = D r − 1 / 2 ( Z − r c T ) D c − 1 / 2 {\displaystyle M=D_{r}^{-1/2}(Z-rc^{T})D_{c}^{-1/2}} The decomposition of M {\displaystyle M} gives you P {\displaystyle P} , Δ {\displaystyle \Delta } and Q {\displaystyle Q} such that M = P Δ Q T {\displaystyle M=P\Delta Q^{T}} with P, Q two unitary matrices and Δ {\displaystyle \Delta } is the generalized diagonal matrix of the singular values (with the same shape as Z {\displaystyle Z} ). The positive coefficients of Δ 2 {\displaystyle \Delta ^{2}} are the eigenvalues of Z {\displaystyle Z} . The interest of MCA comes from the way observations (rows) and variables (columns) in Z {\displaystyle Z} can be decomposed. This decomposition is called a factor decomposition. The coordinates of the observations in the factor space are given by F = D r − 1 / 2 P Δ {\displaystyle F=D_{r}^{-1/2}P\Delta } The i {\displaystyle i} -th rows of F {\displaystyle F} represent the i {\displaystyle i} -th observation in the factor space. And similarly, the coordinates of the variables (in the same factor space as observations!) are given by G = D c − 1 / 2 Q Δ {\displaystyle G=D_{c}^{-1/2}Q\Delta } == Recent works and extensions == In recent years, several students of Jean-Paul Benzécri have refined MCA and incorporated it into a more general framework of data analysis known as geometric data analysis. This involves the development of direct connections between simple correspondence analysis, principal component analysis and MCA with a form of cluster analysis known as Euclidean classification. Two extensions have great practical use. It is possible to include, as active elements in the MCA, several quantitative variables. This extension is called factor analysis of mixed data (see below). Very often, in questionnaires, the questions are structured in several issues. In the statistical analysis it is necessary to take into account this structure. This is the aim of multiple factor analysis which balances the different issues (i.e. the different groups of variables) within a global analysis and provides, beyond the classical results of factorial analysis (mainly graphics of individuals and of categories), several results (indicators and graphics) specific of the group structure. == Application fields == In the social sciences, MCA is arguably best known for its application by Pierre Bourdieu, notably in his books La Distinction, Homo Academicus and The State Nobility. Bourdieu argued that there was an internal link between his vision of the social as spatial and relational --– captured by the notion of field, and the geometric properties of MCA. Sociologists following Bourdieu's work most often opt for the analysis of the indicator matrix, rather than the Burt table, largely because of the central importance accorded to the analysis of the 'cloud of individuals'. == Multiple correspondence analysis and principal component analysis == MCA can also be viewed as a PCA applied to the complete disjunctive table. To do this, the CDT must be transformed as follows. Let y i k {\displaystyle y_{ik}} denote the general term of the CDT. y i k {\displaystyle y_{ik}} is equal to 1 if individual i {\displaystyle i} possesses the category k {\displaystyle k} and 0 if not. Let denote p k {\displaystyle p_{k}} , the proportion of individuals possessing the category k {\displaystyle k} . The transformed CDT (TCDT) has as general term: x i k = y i k / p k − 1 {\displaystyle x_{ik}=y_{ik}/p_{k}-1} The unstandardized PCA applied to TCDT, the column k {\displaystyle k} having the weight p k {\displaystyle p_{k}} , leads to the results of MCA. This equivalence is fully explained in a book by Jérôme Pagès. It plays an important theoretical role because it opens the way to the simultaneous treatment of quantitative and qualitative variables. Two methods simultaneously analyze these two types of variables: factor analysis of mixed data and, when the active variables are partitioned in several groups: multiple factor analysis. This equivalence does not mean that MCA is a particular case of PCA as it is not a particular case of CA. It only means that these methods are closely linked to one another, as they belong to the same family: the factorial methods. == Software == There are numerous software of data analysis that include MCA, such as STATA and SPSS. The R package FactoMineR also features MCA. This software is related to a book describing the basic methods for performing MCA . There is also a Python package for [1] which works with numpy array matrices; the package has not been implemented yet for Spark dataframes.

    Read more →
  • VIGRA

    VIGRA

    VIGRA is the abbreviation for "Vision with Generic Algorithms". It is a free open-source computer vision library which focuses on customizable algorithms and data structures. VIGRA component can be easily adapted to specific needs of target application without compromising execution speed, by using template techniques similar to those in the C++ Standard Template Library. == Features == VIGRA is cross-platform, with working builds on Microsoft Windows, Mac OS X, Linux, and OpenBSD. Since version 1.7.1, VIGRA provides Python bindings based on numpy framework. == History == VIGRA was originally designed and implemented by scientists at University of Hamburg faculty of computer science; its core maintainers are now working at Heidelberg Collaboratory for Image Processing (HCI) University of Heidelberg. In the meantime, many developers have contributed to the project. == Application == CellCognition and ilastik uses VIGRA computer vision library. OpenOffice.org uses VIGRA as part of its headless software rendering backend; LibreOffice does so until version 5.2.

    Read more →
  • Lattice Miner

    Lattice Miner

    Lattice Miner is a formal concept analysis software tool for the construction, visualization and manipulation of concept lattices. It allows the generation of formal concepts and association rules as well as the transformation of formal contexts via apposition, subposition, reduction and object/attribute generalization, and the manipulation of concept lattices via approximation, projection and selection. Lattice Miner allows also the drawing of nested line diagrams. == Introduction == Formal concept analysis (FCA) is a branch of applied mathematics based on the formalization of concept and concept hierarchy and mainly used as a framework for conceptual clustering and rule mining. Over the last two decades, a collection of tools have emerged to help FCA users visualize and analyze concept lattices. They range from the earliest DOS-based implementations (e.g., ConImp and GLAD) to more recent implementations in Java like ToscanaJ, Galicia, ConExp and Coron. A main issue in the development of FCA tools is to visualize large concept lattices and provide efficient mechanisms to highlight patterns (e.g., concepts, associations) that could be relevant to the user. The initial objective of the FCA tool called Lattice Miner was to focus on visualization mechanisms for the representation of concept lattices, including nested line diagrams. Later on, many other interesting features were integrated into the tool. == Functional architecture of Lattice Miner == Lattice Miner is a Java-based platform whose functions are articulated around a core. The Lattice Miner core provides all low-level operations and structures for the representation and manipulation of contexts, lattices and association rules. Mainly, the core of Lattice Miner consists of three modules: context, concept and association rule modules. The user interface offers a context editor and concept lattice manipulator to assist the user in a set of tasks. The architecture of Lattice Miner is open and modular enough to allow the integration of new features and facilities in each one of its components. === Context module === The context module offers all the basic operations and structures to manipulate binary and valued contexts as well as context decomposition to produce nested line diagrams. Basic context operations include apposition, subposition, generalization, clarification, reduction as well as the complementary context computation. The module provides also the arrow relations (for context reduction and decomposition) [2]. The tool has an input LMB format and recognizes the binary format SLF found in Galicia and the format CEX produced by ConExp. === Concept module === The main function of the concept module is to generate the concepts of the current binary context and construct the corresponding lattice and nested structure (see Figures 2 and 3). It provides the user with basic operators such as projection, selection, and exact search as well as advanced features like pair approximation. Some known algorithms are included in this module such as Bordat’s procedure, Godin’s algorithm and NextClosure algorithm. The approximation feature implemented in Lattice Miner is based on the following idea: given a pair (X,Y) where X ⊆ G, and Y ⊆ M, is there a set of formal concepts (Ai,Bi) which are “close to” (X,Y)? To answer this question, The tool starts to identify the type of couple that the pair (X,Y) represents. It can be a formal concept, a protoconcept, a semiconcept or a preconcept. In the last case, the approximation is given by the interval [(X",X′),(Y′,Y")] and highlighted in the line diagram. === Association rule module === This module includes procedures for computing the (stem) Guigues–Duquenne base using NextClosure algorithm [3], as well as the generic and informative bases. Implications with negation can be obtained using the apposition of a context and its complementary. This module embeds also procedures for the computation of a non-redundant family C of implications and the closure of a set Y of attributes for the given implication set C. === User interface === The initial objective of Lattice Miner was to focus on lattice drawing and visualization either as a flat or nested structure by taking into account the cognitive process of human beings and known principles for lattice drawing (e.g., reducing the number of edge intersections, ensuring diagram symmetry). Some well-known visualization techniques were implemented such as focus & context and fisheye view. The basic idea behind focus & context visualization paradigm is to allow a viewer to see key (important) objects in full detail in the foreground (focus) while at the same time an overview of all the surrounding information (context) remains available in the background. Lattice Miner translates the focus & context paradigm into clear and blurred elements while the size of nodes and the intensity of their color were used to indicate their importance. Various forms of highlighting, labelling and animation are also provided. In order to better handle the display of large lattices, nested line diagrams are offered in the tool. Figure 3 shows the third level of the nested line diagram corresponding to the binary context of Figure 1 where three levels of nesting are defined. Each one of the inner nodes of this diagram represents a combination of attributes from the previous two (outer) levels. Real inner concepts (see the node on the left hand-side of the diagram) are identified by colored nodes while void elements are in grey color. Each node of levels 1 and 2 can be expanded to exhibit its internal line diagram. Both flat and nested diagrams can be saved as an image. Simple (flat) lattices can also be saved as an XML format file.

    Read more →
  • Foundry VTT

    Foundry VTT

    Foundry Virtual Tabletop, commonly shortened to Foundry VTT or FVTT, is a commercial, self-hosted virtual tabletop application for role-playing games. It provides a stage for visualizing the game environment and tools allowing the game master and players to organize and track statistics and notes. The software is highly modular and depends on the community-maintained ecosystem of add-on modules that modify the software's behavior and implement different game systems. Perpetual licenses, which include updates, are offered for a one-time fee. == Features == Foundry Virtual Tabletop is a highly modular Node.js web application that is run locally by the Gamemaster or hosted on a remote server. Players connect to their gamemaster's Foundry VTT instance over the network using their web browser. It is system-agnostic in that its core feature-set is not restricted to a specific game system. Systems, specific features and game content are implemented as add-on modules, which can be individually downloaded from a public repository. The module repository contains paid, official content, as well as freely available community-made modules that enhance functionality of the software. As of May 2025, 350 individual game systems are implemented as modules. Individual settings created by the Game Master are termed Worlds in the interface and contain the list of modules that should be loaded as well as world-specific content, which can be added by the gamemaster. This content is grouped into Scenes, Actors, Items and Journals. Battle and world maps are created as Scenes, which contain the backdrop and data on placement of walls, light sources and other entities. Tokens representing Actors, which are player characters, vehicles or NPCs, can be placed on these Scenes to be moved by the user that owns them. Other entities that interact or integrate with actors are termed Items; these can be objects, but also game system-specific concepts such as character classes. Journals are text documents that can link to other entities present in the World or modules. Viewing and editing permissions can be set individually for each entity. The software features a custom lighting engine that determines visibility of certain areas on each battle map depending on the position of players' characters, also revealing areas covered by fog of war. It also contains tools for map creation and comes with a small asset library. == History == Foundry Gaming LLC founder Andrew Clayton, commonly known under his online nickname Atropos, began development of Foundry VTT in 2018 for personal use after becoming dissatisfied with the feature set and business models of other virtual tabletops. Foundry VTT was initially developed for Linux, which remains its primary platform, with support for other platforms having been developed later. Foundry Gaming LLC was incorporated in Spokane, Washington on October 9, 2018, with the software remaining in private beta-testing until May 2020, when it was publicly released. In November 2020, Cubicle 7 partnered with Foundry to bring official content modules for its game system Warhammer Fantasy Roleplay to Foundry VTT. Later, in 2025, Clayton would state that this first major publisher deal was of significant importance to Foundry VTT's growth and credits the community developers of the WFRP system module for making it possible in the first place. In November 2023, Paizo partnered with Foundry to bring official content modules for Pathfinder Roleplaying Game to Foundry VTT. In January 2024, Foundry publicly announced its partnership with Wizards of the Coast in bringing official Dungeons & Dragons content to Foundry VTT, with the first official module, Phandelver and Below: The Shattered Obelisk, having been released in February 2024. == Development == As of 2023, the Foundry VTT software itself is being developed and managed by a team of 9 people, while a content team of 12 people is working with partnered publishers to compile content into downloadable modules. The content team also develops in-house content published by Foundry Gaming LLC. Stated goals are to create a virtual tabletop software that offers a one-time purchase and content ownership, make use of modern web technologies, and provide a platform for developers to build upon. Clayton has stated that integration of Generative AI into Foundry VTT is not planned, citing ethical and legal concerns and calling its usage within the industry a "betrayal of the creative people who made the TTRPG industry what it is in the first place". == Reception == Foundry VTT is one of the most popular virtual tabletops for TTRPGs; in particular, as a self-hosted web-based VTT, it is known as a modern alternative to the software as a service Roll20. Wargamer named it one of the three "best virtual tabletops for D&D in 2023", noting its active community and high degree of technical complexity, which allows for customization not seen in other products at the cost of a much steeper learning curve. Comic Book Resources called it an "underrated gem" and "incredibly versatile" for similar reasons, while also praising its lighting engine and visual fidelity. As the previously mentioned outlets do, Foundry's modular ecosystem and technical implementation are often mentioned as good features, but also as a source of frustration for new users. In a video interview, Clayton acknowledges this issue and affirms that the development team intends to make usage of more technical features "friction-less" and will reduce module breakage between updates in the future.

    Read more →
  • Structured kNN

    Structured kNN

    Structured k-nearest neighbours (SkNN) is a machine learning algorithm that generalizes k-nearest neighbors (k-NN). k-NN supports binary classification, multiclass classification, and regression, whereas SkNN allows training of a classifier for general structured output. For instance, a data sample might be a natural language sentence, and the output could be an annotated parse tree. Training a classifier consists of showing many instances of ground truth sample-output pairs. After training, the SkNN model is able to predict the corresponding output for new, unseen sample instances; that is, given a natural language sentence, the classifier can produce the most likely parse tree. == Training == As a training set, SkNN accepts sequences of elements with class labels. The type of element does not matter; the only requirement is a defined metric function that gives a distance between each pair of elements of a set. SkNN is based on idea of creating a graph, with each node representing a class label. There is an edge between a pair of nodes if there is a sequence of two elements in the training set with corresponding classes. The first step of SkNN training is the construction of such a graph from training sequences. There are two special nodes in the graph corresponding to sentence beginnings and ends: if a sequence starts with class C, the edge between node START and node C should be created. Like regular k-NN, the second part of SkNN training consists of storing the elements of a training sequence in a certain way. Each element of the training sequences is stored in the node related to the class of the previous element in the sequence. Every first element is stored in the START node. == Inference == Labelling input sequences by SkNN consists of finding the sequence of transitions in the graph, starting from node START. Each transition corresponds to a single element of the input sequence. As a result, the label of each element is determined as the target node label of the transition. The cost of the path is defined as the sum of all transitions, with the cost of transition from node A to node B being the distance from the current input sequence element to the nearest element of class B, stored in node A. Determining an optimal path may be performed using a modified Viterbi algorithm (where the sum of the distances is minimized, unlike the original algorithm which maximizes the product of probabilities).

    Read more →
  • Evolutionary attractor

    Evolutionary attractor

    An evolutionary attractor is a point in an evolutionary space where a selection process will always drive trait values towards that point from the region around it. Because of the importance of evolution through natural selection, often such an evolutionary space will be defined by genetic or phenotypic traits, or possibly both. In this case the selection process will be a form of natural selection. The existence of an evolutionary attractor in a biological evolutionary space does not always imply that it can be reached from all points in that evolutionary space, nor does it identify what will happen when the evolutionary attractor is reached. While an evolutionary attractor may represent a point in evolutionary space that is resistant to further selection, such as an evolutionarily stable strategy, other possibilities are available. Because identification of an evolutionary attractor on its own does not describe everything about the evolutionary space in which it lies, this has led to interest in the evolutionary dynamics surrounding evolutionary attractors and in evolutionary spaces in general. (Theoretical biologists and mathematicians working in the area may prefer the terms adaptive dynamics or evolutionary invasion analysis to evolutionary dynamics.) These fields use differential equations which allows a more complete understanding of the dynamics in evolutionary spaces including the existence or otherwise of evolutionary attractors. Advances in the study of molecular evolution have also led to the identification of evolutionary attractors at a molecular level. Because biological evolutionary processes have been studied using evolutionary game theory, a technique inspired by game theory originally derived to address economic problems, not only can evolutionary attractors be found in biology but economists studying evolutionary economic models have also identified evolutionary attractors. Evolution in biology has also inspired evolutionary computation in computer science. Many algorithms in this field use a form of selection inspired by natural selection to generate results through evolutionary algorithms. This is therefore another area in which evolutionary attractors have been identified. == Evolutionary attractors in biology == It is not probably not surprising that biology is the field where most examples of evolutionary attractors have been identified, given the importance of evolution through natural selection. === Evolutionary attractors in adaptive landscapes === An evolutionary attractor is a point in genetic and/or phenotypic trait space, that evolution will always drive trait values towards via a selection process. The concept of an evolutionary attractor arose in population genetics following the origin of the adaptive landscape originally proposed by Sewall Wright in 1932. The height of a point in an adaptive landscape is a measure of evolutionary fitness. If a point in an adaptive landscape is a peak, then selection will always drive traits towards it and it will be an evolutionary attractor. While population genetics deals with discrete genetic traits, quantitative genetics extended such concepts to deal with continuous genetic traits, where the concept of evolutionary attractor is also valid. === Evolutionary attractors in evolutionary game models === Evolutionary game theory introduced into evolutionary biology concepts originally used in economics, with the advantage that evolution could be studied in relation to strategic choices made in animal conflicts. This is of particular interest because of the concept of the evolutionarily stable strategy or ESS, a strategy that once established is resistant to invasion by other strategies. ESSs will not always be evolutionary attractors, but if they are they will persist over evolutionary time. === Dynamics around evolutionary attractors in biology === Evolutionary attractors in biology do not exist in isolation. By definition they must exist in an evolutionary trait space where selection drives all traits towards them from a region immediately around them. That is, they must be convergence stable. Eshel (1983) modified the definition of an ESS by considering individually advantageous reduction from a majority deviation: he created the term continuous stability. A continuously stable ESS can be shown to be convergence stable, therefore it will act as an evolutionary attractor. But the nature of evolutionary trait spaces in biology means that it is not possible to guarantee that the region of convergence to the evolutionary attractor covers the whole of the trait space, nor that there is only one evolutionary attractor in a particular trait space. These issues have led to the emergence of the related fields of evolutionary dynamics, adaptive dynamics and evolutionary invasion analysis, all of which use differential equations to understand the dynamics in evolutionary trait spaces. Hence, if one or more evolutionary attractor exists in an evolutionary trait space, they provide techniques to understand the dynamics in that trait space around the evolutionary attractor. === Evolutionary attractors in an ecological context === Evolution in biology does not take place in single species in isolation. Ecological interaction of species leads to coevolution. Important examples of this are host-parasite or host-pathogen interaction, which can make both the dynamics around evolutionary attractors more complex, and the occurrence and number of evolutionary attractors more diverse. Evolutionary attractors have been identified in the analysis of evolutionary epidemiology of plant pathogens. In the above study working on plant populations the authors were able to identify evolutionary attractors using methods from adaptive dynamics. A model applied to the analysis of a maize (Zea mays L.) virus identified convergence stable equilibria through simulation modelling. A related model identified evolutionary attractors in the interaction of plants with fungal pathogens. === Evolutionary attractors in molecular genetics === As mentioned above much of the consideration of evolutionary attractors in biology has been through investigation of selection at a genetic or phenotypic level or both, in a single species or in coevolving species. Advances in the study of molecular genetics now allow the study of evolutionary attractors to be taken to a molecular genetic level. Wilson et. al (2019) studied the evolution of gene regulatory networks and identified the emergence of evolutionary attractors. == Evolutionary attractors in economics == Evolutionary game theory as applied in biology was inspired by game theory originally devised for applications in economics. Game theory remains an active field of research outside of biology, and thus it is not surprising that researchers in evolutionary economics use evolutionary game theory. Evolutionary attractors have been demonstrated by economists studying the evolutionary dynamics of market entry with market dynamics based on the replicator dynamics of biological evolutionary games. == Evolutionary attractors in computing == Evolutionary computation is a branch of computer science inspired by biological evolution. Many algorithms in evolutionary computation use a form of selection. Thus evolutionary attractors have been identified in computer science as well as in biology and economics. Evolutionary algorithms have generated evolutionary attractors, probably because of the similarity between adaptive hill-climbing in evolutionary heuristics and the adaptive landscape originated to explain evolution through natural selection.

    Read more →
  • Radial basis function

    Radial basis function

    In mathematics a radial basis function (RBF) is a real-valued function φ {\textstyle \varphi } whose value depends only on the distance between the input and some fixed point, either the origin, so that φ ( x ) = φ ^ ( ‖ x ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} \right\|)} , or some other fixed point c {\textstyle \mathbf {c} } , called a center, so that φ ( x ) = φ ^ ( ‖ x − c ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} -\mathbf {c} \right\|)} . Any function φ {\textstyle \varphi } that satisfies the property φ ( x ) = φ ^ ( ‖ x ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} \right\|)} is a radial function. The distance is usually Euclidean distance, although other metrics are sometimes used. They are often used as a collection { φ k } k {\displaystyle \{\varphi _{k}\}_{k}} which forms a basis for some function space of interest, hence the name. Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally applied to machine learning, in work by David Broomhead and David Lowe in 1988, which stemmed from Michael J. D. Powell's seminal research from 1977. RBFs are also used as a kernel in support vector classification. The technique has proven effective and flexible enough that radial basis functions are now applied in a variety of engineering applications. == Definition == A radial function is a function φ : [ 0 , ∞ ) → R {\textstyle \varphi :[0,\infty )\to \mathbb {R} } . When paired with a norm ‖ ⋅ ‖ : V → [ 0 , ∞ ) {\textstyle \|\cdot \|:V\to [0,\infty )} on a vector space, a function of the form φ c = φ ( ‖ x − c ‖ ) {\textstyle \varphi _{\mathbf {c} }=\varphi (\|\mathbf {x} -\mathbf {c} \|)} is said to be a radial kernel centered at c ∈ V {\textstyle \mathbf {c} \in V} . A radial function and the associated radial kernels are said to be radial basis functions if, for any finite set of nodes { x k } k = 1 n ⊆ V {\displaystyle \{\mathbf {x} _{k}\}_{k=1}^{n}\subseteq V} , all of the following conditions are true: === Examples === Commonly used types of radial basis functions include (writing r = ‖ x − x i ‖ {\textstyle r=\left\|\mathbf {x} -\mathbf {x} _{i}\right\|} and using ε {\textstyle \varepsilon } to indicate a shape parameter that can be used to scale the input of the radial kernel): == Approximation == Radial basis functions are typically used to build up function approximations of the form where the approximating function y ( x ) {\textstyle y(\mathbf {x} )} is represented as a sum of N {\displaystyle N} radial basis functions, each associated with a different center x i {\textstyle \mathbf {x} _{i}} , and weighted by an appropriate coefficient w i . {\textstyle w_{i}.} The weights w i {\textstyle w_{i}} can be estimated using the matrix methods of linear least squares, because the approximating function is linear in the weights w i {\textstyle w_{i}} . Approximation schemes of this kind have been particularly used in time series prediction and control of nonlinear systems exhibiting sufficiently simple chaotic behaviour and 3D reconstruction in computer graphics (for example, hierarchical RBF and Pose Space Deformation). == RBF Network == The sum can also be interpreted as a rather simple single-layer type of artificial neural network called a radial basis function network, with the radial basis functions taking on the role of the activation functions of the network. It can be shown that any continuous function on a compact interval can in principle be interpolated with arbitrary accuracy by a sum of this form, if a sufficiently large number N {\textstyle N} of radial basis functions is used. The approximant y ( x ) {\textstyle y(\mathbf {x} )} is differentiable with respect to the weights w i {\textstyle w_{i}} . The weights could thus be learned using any of the standard iterative methods for neural networks. Using radial basis functions in this manner yields a reasonable interpolation approach provided that the fitting set has been chosen such that it covers the entire range systematically (equidistant data points are ideal). However, without a polynomial term that is orthogonal to the radial basis functions, estimates outside the fitting set tend to perform poorly. == RBFs for PDEs == Radial basis functions are used to approximate functions and so can be used to discretize and numerically solve Partial Differential Equations (PDEs). This was first done in 1990 by E. J. Kansa who developed the first RBF based numerical method. It is called the Kansa method and was used to solve the elliptic Poisson equation and the linear advection-diffusion equation. The function values at points x {\displaystyle \mathbf {x} } in the domain are approximated by the linear combination of RBFs: The derivatives are approximated as such: where N {\displaystyle N} are the number of points in the discretized domain, d {\displaystyle d} the dimension of the domain and λ {\displaystyle \lambda } the scalar coefficients that are unchanged by the differential operator. Different numerical methods based on Radial Basis Functions were developed thereafter. Some methods are the RBF-FD method, the RBF-QR method and the RBF-PUM method.

    Read more →
  • Computer security compromised by hardware failure

    Computer security compromised by hardware failure

    Computer security compromised by hardware failure is a branch of computer security applied to hardware. The objective of computer security includes protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to remain accessible and productive to its intended users. Such secret information could be retrieved by different ways. This article focus on the retrieval of data thanks to misused hardware or hardware failure. Hardware could be misused or exploited to get secret data. This article collects main types of attack that can lead to data theft. Computer security can be compromised by devices, such as keyboards, monitors or printers (thanks to electromagnetic or acoustic emanation for example) or by components of the computer, such as the memory, the network card or the processor (thanks to time or temperature analysis for example). == Devices == === Monitor === The monitor is the main device used to access data on a computer. It has been shown that monitors radiate or reflect data on their environment, potentially giving attackers access to information displayed on the monitor. ==== Electromagnetic emanations ==== Video display units radiate: narrowband harmonics of the digital clock signals; broadband harmonics of the various 'random' digital signals such as the video signal. Known as compromising emanations or TEMPEST radiation, a code word for a U.S. government programme aimed at attacking the problem, the electromagnetic broadcast of data has been a significant concern in sensitive computer applications. Eavesdroppers can reconstruct video screen content from radio frequency emanations. Each (radiated) harmonic of the video signal shows a remarkable resemblance to a broadcast TV signal. It is therefore possible to reconstruct the picture displayed on the video display unit from the radiated emission by means of a normal television receiver. If no preventive measures are taken, eavesdropping on a video display unit is possible at distances up to several hundreds of meters, using only a normal black-and-white TV receiver, a directional antenna and an antenna amplifier. It is even possible to pick up information from some types of video display units at a distance of over 1 kilometer. If more sophisticated receiving and decoding equipment is used, the maximum distance can be much greater. ==== Compromising reflections ==== What is displayed by the monitor is reflected on the environment. The time-varying diffuse reflections of the light emitted by a CRT monitor can be exploited to recover the original monitor image. This is an eavesdropping technique for spying at a distance on data that is displayed on an arbitrary computer screen, including the currently prevalent LCD monitors. The technique exploits reflections of the screen's optical emanations in various objects that one commonly finds close to the screen and uses those reflections to recover the original screen content. Such objects include eyeglasses, tea pots, spoons, plastic bottles, and even the eye of the user. This attack can be successfully mounted to spy on even small fonts using inexpensive, off-the-shelf equipment (less than 1500 dollars) from a distance of up to 10 meters. Relying on more expensive equipment allowed to conduct this attack from over 30 meters away, demonstrating that similar attacks are feasible from the other side of the street or from a close by building. Many objects that may be found at a usual workplace can be exploited to retrieve information on a computer's display by an outsider. Particularly good results were obtained from reflections in a user's eyeglasses or a tea pot located on the desk next to the screen. Reflections that stem from the eye of the user also provide good results. However, eyes are harder to spy on at a distance because they are fast-moving objects and require high exposure times. Using more expensive equipment with lower exposure times helps to remedy this problem. The reflections gathered from curved surfaces on close by objects indeed pose a substantial threat to the confidentiality of data displayed on the screen. Fully invalidating this threat without at the same time hiding the screen from the legitimate user seems difficult, without using curtains on the windows or similar forms of strong optical shielding. Most users, however, will not be aware of this risk and may not be willing to close the curtains on a nice day. The reflection of an object, a computer display, in a curved mirror creates a virtual image that is located behind the reflecting surface. For a flat mirror this virtual image has the same size and is located behind the mirror at the same distance as the original object. For curved mirrors, however, the situation is more complex. === Keyboard === ==== Electromagnetic emanations ==== Computer keyboards are often used to transmit confidential data such as passwords. Since they contain electronic components, keyboards emit electromagnetic waves. These emanations could reveal sensitive information such as keystrokes. Electromagnetic emanations have turned out to constitute a security threat to computer equipment. The figure below presents how a keystroke is retrieved and what material is necessary. The approach is to acquire the raw signal directly from the antenna and to process the entire captured electromagnetic spectrum. Thanks to this method, four different kinds of compromising electromagnetic emanations have been detected, generated by wired and wireless keyboards. These emissions lead to a full or a partial recovery of the keystrokes. The best practical attack fully recovered 95% of the keystrokes of a PS/2 keyboard at a distance up to 20 meters, even through walls. Because each keyboard has a specific fingerprint based on the clock frequency inconsistencies, it can determine the source keyboard of a compromising emanation, even if multiple keyboards from the same model are used at the same time. The four different kinds way of compromising electromagnetic emanations are described below. ===== The Falling Edge Transition Technique ===== When a key is pressed, released or held down, the keyboard sends a packet of information known as a scan code to the computer. The protocol used to transmit these scan codes is a bidirectional serial communication, based on four wires: Vcc (5 volts), ground, data and clock. Clock and data signals are identically generated. Hence, the compromising emanation detected is the combination of both signals. However, the edges of the data and the clock lines are not superposed. Thus, they can be easily separated to obtain independent signals. ===== The Generalized Transition Technique ===== The Falling Edge Transition attack is limited to a partial recovery of the keystrokes. This is a significant limitation. The GTT is a falling edge transition attack improved, which recover almost all keystrokes. Indeed, between two traces, there is exactly one data rising edge. If attackers are able to detect this transition, they can fully recover the keystrokes. ===== The Modulation Technique ===== Harmonics compromising electromagnetic emissions come from unintentional emanations such as radiations emitted by the clock, non-linear elements, crosstalk, ground pollution, etc. Determining theoretically the reasons of these compromising radiations is a very complex task. These harmonics correspond to a carrier of approximately 4 MHz which is very likely the internal clock of the micro-controller inside the keyboard. These harmonics are correlated with both clock and data signals, which describe modulated signals (in amplitude and frequency) and the full state of both clock and data signals. This means that the scan code can be completely recovered from these harmonics. ===== The Matrix Scan Technique ===== Keyboard manufacturers arrange the keys in a matrix. The keyboard controller, often an 8-bit processor, parses columns one-by-one and recovers the state of 8 keys at once. This matrix scan process can be described as 192 keys (some keys may not be used, for instance modern keyboards use 104/105 keys) arranged in 24 columns and 8 rows. These columns are continuously pulsed one-by-one for at least 3μs. Thus, these leads may act as an antenna and generate electromagnetic emanations. If an attacker is able to capture these emanations, he can easily recover the column of the pressed key. Even if this signal does not fully describe the pressed key, it still gives partial information on the transmitted scan code, i.e. the column number. Note that the matrix scan routine loops continuously. When no key is pressed, we still have a signal composed of multiple equidistant peaks. These emanations may be used to remotely detect the presence of powered computers. Concerning wireless keyboards, the wireless data burst transmission can be used as an electromagnetic trigger to detect exactly when a key is pressed, while the matrix s

    Read more →
  • Curriculum learning

    Curriculum learning

    Curriculum learning is a technique in machine learning in which a model is trained on examples of increasing difficulty, where the definition of "difficulty" may be provided externally or discovered as part of the training process. This is intended to attain good performance more quickly, or to converge to a better local optimum if the global optimum is not found. == Approach == Most generally, curriculum learning is the technique of successively increasing the difficulty of examples in the training set that is presented to a model over multiple training iterations. This can produce better results than exposing the model to the full training set immediately under some circumstances; most typically, when the model is able to learn general principles from easier examples, and then gradually incorporate more complex and nuanced information as harder examples are introduced, such as edge cases. This has been shown to work in many domains, most likely as a form of regularization. There are several major variations in how the technique is applied: A concept of "difficulty" must be defined. This may come from human annotation or an external heuristic; for example in language modeling, shorter sentences might be classified as easier than longer ones. Another approach is to use the performance of another model, with examples accurately predicted by that model being classified as easier (providing a connection to boosting). Difficulty can be increased steadily or in distinct epochs, and in a deterministic schedule or according to a probability distribution. This may also be moderated by a requirement for diversity at each stage, in cases where easier examples are likely to be disproportionately similar to each other. Applications must also decide the schedule for increasing the difficulty. Simple approaches may use a fixed schedule, such as training on easy examples for half of the available iterations and then all examples for the second half. Other approaches use self-paced learning to increase the difficulty in proportion to the performance of the model on the current set. Since curriculum learning only concerns the selection and ordering of training data, it can be combined with many other techniques in machine learning. The success of the method assumes that a model trained for an easier version of the problem can generalize to harder versions, so it can be seen as a form of transfer learning. Some authors also consider curriculum learning to include other forms of progressively increasing complexity, such as increasing the number of model parameters. It is frequently combined with reinforcement learning, such as learning a simplified version of a game first. Some domains have shown success with anti-curriculum learning: training on the most difficult examples first. One example is the ACCAN method for speech recognition, which trains on the examples with the lowest signal-to-noise ratio first. == History == The term "curriculum learning" was introduced by Yoshua Bengio et al in 2009, with reference to the psychological technique of shaping in animals and structured education for humans: beginning with the simplest concepts and then building on them. The authors also note that the application of this technique in machine learning has its roots in the early study of neural networks such as Jeffrey Elman's 1993 paper Learning and development in neural networks: the importance of starting small. Bengio et al showed good results for problems in image classification, such as identifying geometric shapes with progressively more complex forms, and language modeling, such as training with a gradually expanding vocabulary. They conclude that, for curriculum strategies, "their beneficial effect is most pronounced on the test set", suggesting good generalization. The technique has since been applied to many other domains: Natural language processing: Part-of-speech tagging Intent detection Sentiment analysis Machine translation Speech recognition Language model pre-training Image recognition: Facial recognition Object detection Reinforcement learning: Game-playing Graph learning Matrix factorization

    Read more →
  • Probably approximately correct learning

    Probably approximately correct learning

    In computational learning theory, probably approximately correct (PAC) learning is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant. In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the "probably" part), the selected function will have low generalization error (the "approximately correct" part). The learner must be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples. The model was later extended to treat noise (misclassified samples). An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning. In particular, the learner is expected to find efficient functions (time and space requirements bounded to a polynomial of the example size), and the learner itself must implement an efficient procedure (requiring an example count bounded to a polynomial of the concept size, modified by the approximation and likelihood bounds). == Definitions and terminology == In order to give the definition for something that is PAC-learnable, we first have to introduce some terminology. For the following definitions, two examples will be used. The first is the problem of character recognition given an array of n {\displaystyle n} bits encoding a binary-valued image. The other example is the problem of finding an interval that will correctly classify points within the interval as positive and the points outside of the range as negative. Let X {\displaystyle X} be a set called the instance space or the encoding of all the samples. In the character recognition problem, the instance space is X = { 0 , 1 } n {\displaystyle X=\{0,1\}^{n}} . In the interval problem the instance space, X {\displaystyle X} , is the set of all bounded intervals in R {\displaystyle \mathbb {R} } , where R {\displaystyle \mathbb {R} } denotes the set of all real numbers. A concept is a subset c ⊂ X {\displaystyle c\subset X} . One concept is the set of all patterns of bits in X = { 0 , 1 } n {\displaystyle X=\{0,1\}^{n}} that encode a picture of the letter "P". An example concept from the second example is the set of open intervals, { ( a , b ) ∣ 0 ≤ a ≤ π / 2 , π ≤ b ≤ 13 } {\displaystyle \{(a,b)\mid 0\leq a\leq \pi /2,\pi \leq b\leq {\sqrt {13}}\}} , each of which contains only the positive points. A concept class C {\displaystyle C} is a collection of concepts over X {\displaystyle X} . This could be the set of all subsets of the array of bits that are skeletonized 4-connected (width of the font is 1). Let EX ⁡ ( c , D ) {\displaystyle \operatorname {EX} (c,D)} be a procedure that draws an example, x {\displaystyle x} , using a probability distribution D {\displaystyle D} and gives the correct label c ( x ) {\displaystyle c(x)} , that is 1 if x ∈ c {\displaystyle x\in c} and 0 otherwise. Now, given 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} , assume there is an algorithm A {\displaystyle A} and a polynomial p {\displaystyle p} in 1 / ϵ , 1 / δ {\displaystyle 1/\epsilon ,1/\delta } (and other relevant parameters of the class C {\displaystyle C} ) such that, given a sample of size p {\displaystyle p} drawn according to EX ⁡ ( c , D ) {\displaystyle \operatorname {EX} (c,D)} , then, with probability of at least 1 − δ {\displaystyle 1-\delta } , A {\displaystyle A} outputs a hypothesis h ∈ C {\displaystyle h\in C} that has an average error less than or equal to ϵ {\displaystyle \epsilon } on X {\displaystyle X} with the same distribution D {\displaystyle D} . Further if the above statement for algorithm A {\displaystyle A} is true for every concept c ∈ C {\displaystyle c\in C} and for every distribution D {\displaystyle D} over X {\displaystyle X} , and for all 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} then C {\displaystyle C} is (efficiently) PAC learnable (or distribution-free PAC learnable). We can also say that A {\displaystyle A} is a PAC learning algorithm for C {\displaystyle C} . == Equivalence == Under some regularity conditions these conditions are equivalent: The concept class C is PAC learnable. The VC dimension of C is finite. C is a uniformly Glivenko-Cantelli class. C is compressible in the sense of Littlestone and Warmuth

    Read more →
  • Q-learning

    Q-learning

    Q-learning is a reinforcement learning algorithm that trains an agent to assign values to its possible actions based on its current state, without requiring a model of the environment (model-free). It can handle problems with stochastic transitions and rewards without requiring adaptations. For example, in a grid maze, an agent learns to reach an exit worth 10 points. At a junction, Q-learning might assign a higher value to moving right than left if right gets to the exit faster, improving this choice by trying both directions over time. For any finite Markov decision process, Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state. Q-learning can identify an optimal action-selection policy for any given finite Markov decision process, given infinite exploration time and a partly random policy. "Q" refers to the function that the algorithm computes: the expected reward—that is, the quality—of an action taken in a given state. == Reinforcement learning == Reinforcement learning involves an agent, a set of states S {\displaystyle {\mathcal {S}}} , and a set A {\displaystyle {\mathcal {A}}} of actions per state. By performing an action a ∈ A {\displaystyle a\in {\mathcal {A}}} , the agent transitions from state to state. Executing an action in a specific state provides the agent with a reward (a numerical score). The goal of the agent is to maximize its total reward. It does this by adding the maximum reward attainable from future states to the reward for achieving its current state, effectively influencing the current action by the potential future reward. This potential reward is a weighted sum of expected values of the rewards of all future steps starting from the current state. As an example, consider the process of boarding a train, in which the reward is measured by the negative of the total time spent boarding (alternatively, the cost of boarding the train is equal to the boarding time). One strategy is to enter the train door as soon as they open, minimizing the initial wait time for yourself. If the train is crowded, however, then you will have a slow entry after the initial action of entering the door as people are fighting you to depart the train as you attempt to board. The total boarding time, or cost, is then: 0 seconds wait time + 15 seconds fight time On the next day, by random chance (exploration), you decide to wait and let other people depart first. This initially results in a longer wait time. However, less time is spent fighting the departing passengers. Overall, this path has a higher reward than that of the previous day, since the total boarding time is now: 5 second wait time + 0 second fight time Through exploration, despite the initial (patient) action resulting in a larger cost (or negative reward) than in the forceful strategy, the overall cost is lower, thus revealing a more rewarding strategy. == Algorithm == After Δ t {\displaystyle \Delta t} steps into the future the agent will decide some next step. The weight for this step is calculated as γ Δ t {\displaystyle \gamma ^{\Delta t}} , where γ {\displaystyle \gamma } (the discount factor) is a number between 0 and 1 ( 0 ≤ γ ≤ 1 {\displaystyle 0\leq \gamma \leq 1} ). Assuming γ < 1 {\displaystyle \gamma <1} , it has the effect of valuing rewards received earlier higher than those received later (reflecting the value of a "good start"). γ {\displaystyle \gamma } may also be interpreted as the probability to succeed (or survive) at every step Δ t {\displaystyle \Delta t} . The algorithm, therefore, has a function that calculates the quality of a state–action combination: Q : S × A → R {\displaystyle Q:{\mathcal {S}}\times {\mathcal {A}}\to \mathbb {R} } . Before learning begins, ⁠ Q {\displaystyle Q} ⁠ is initialized to a possibly arbitrary fixed value (chosen by the programmer). Then, at each time t {\displaystyle t} the agent selects an action A t {\displaystyle A_{t}} , observes a reward R t + 1 {\displaystyle R_{t+1}} , enters a new state S t + 1 {\displaystyle S_{t+1}} (that may depend on both the previous state S t {\displaystyle S_{t}} and the selected action), and Q {\displaystyle Q} is updated. The core of the algorithm is a Bellman equation as a simple value iteration update, using the weighted average of the current value and the new information: Q n e w ( S t , A t ) ← ( 1 − α ⏟ learning rate ) ⋅ Q ( S t , A t ) ⏟ current value + α ⏟ learning rate ⋅ ( R t + 1 ⏟ reward + γ ⏟ discount factor ⋅ max a Q ( S t + 1 , a ) ⏟ estimate of optimal future value ⏟ new value (temporal difference target) ) {\displaystyle Q^{new}(S_{t},A_{t})\leftarrow (1-\underbrace {\alpha } _{\text{learning rate}})\cdot \underbrace {Q(S_{t},A_{t})} _{\text{current value}}+\underbrace {\alpha } _{\text{learning rate}}\cdot {\bigg (}\underbrace {\underbrace {R_{t+1}} _{\text{reward}}+\underbrace {\gamma } _{\text{discount factor}}\cdot \underbrace {\max _{a}Q(S_{t+1},a)} _{\text{estimate of optimal future value}}} _{\text{new value (temporal difference target)}}{\bigg )}} where R t + 1 {\displaystyle R_{t+1}} is the reward received when moving from the state S t {\displaystyle S_{t}} to the state S t + 1 {\displaystyle S_{t+1}} , and α {\displaystyle \alpha } is the learning rate ( 0 < α ≤ 1 ) {\displaystyle (0<\alpha \leq 1)} . Note that Q n e w ( S t , A t ) {\displaystyle Q^{new}(S_{t},A_{t})} is the sum of three terms: ( 1 − α ) Q ( S t , A t ) {\displaystyle (1-\alpha )Q(S_{t},A_{t})} : the current value (weighted by one minus the learning rate) α R t + 1 {\displaystyle \alpha \,R_{t+1}} : the reward R t + 1 {\displaystyle R_{t+1}} to obtain if action A t {\displaystyle A_{t}} is taken when in state S t {\displaystyle S_{t}} (weighted by learning rate) α γ max a Q ( S t + 1 , a ) {\displaystyle \alpha \gamma \max _{a}Q(S_{t+1},a)} : the maximum reward that can be obtained from state S t + 1 {\displaystyle S_{t+1}} (weighted by learning rate and discount factor) An episode of the algorithm ends when state S t + 1 {\displaystyle S_{t+1}} is a final or terminal state. However, Q-learning can also learn in non-episodic tasks (as a result of the property of convergent infinite series). If the discount factor is lower than 1, the action values are finite even if the problem can contain infinite loops or paths. For all final states s f {\displaystyle s_{f}} , Q ( s f , a ) {\displaystyle Q(s_{f},a)} is never updated, but is set to the reward value r {\displaystyle r} observed for state s f {\displaystyle s_{f}} . In most cases, Q ( s f , a ) {\displaystyle Q(s_{f},a)} can be taken to equal zero. == Influence of variables == === Learning rate === The learning rate or step size determines to what extent newly acquired information overrides old information. A factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the agent consider only the most recent information (ignoring prior knowledge to explore possibilities). In fully deterministic environments, a learning rate of α t = 1 {\displaystyle \alpha _{t}=1} is optimal. When the problem is stochastic, the algorithm converges under some technical conditions on the learning rate that require it to decrease to zero. In practice, often a constant learning rate is used, such as α t = 0.1 {\displaystyle \alpha _{t}=0.1} for all t {\displaystyle t} . === Discount factor === The discount factor ⁠ γ {\displaystyle \gamma } ⁠ determines the importance of future rewards. A factor of 0 will make the agent "myopic" (or short-sighted) by only considering current rewards, i.e. r t {\displaystyle r_{t}} (in the update rule above), while a factor approaching 1 will make it strive for a long-term high reward. If the discount factor meets or exceeds 1, the action values may diverge. For ⁠ γ = 1 {\displaystyle \gamma =1} ⁠, without a terminal state, or if the agent never reaches one, all environment histories become infinitely long, and utilities with additive, undiscounted rewards generally become infinite. Even with a discount factor only slightly lower than 1, Q-function learning leads to propagation of errors and instabilities when the value function is approximated with an artificial neural network. In that case, starting with a lower discount factor and increasing it towards its final value accelerates learning. === Initial conditions (Q0) === Since Q-learning is an iterative algorithm, it implicitly assumes an initial condition before the first update occurs. High initial values, also known as "optimistic initial conditions", can encourage exploration: no matter what action is selected, the update rule will cause it to have lower values than the other alternative, thus increasing their choice probability. The first reward r {\displaystyle r} can be used to reset the initial conditions. According to this idea, the first time an action is taken the reward is used to set the value

    Read more →
  • Arabic Ontology

    Arabic Ontology

    Arabic Ontology is a website offering linguistic ontology services for the Arabic language which can be used like the online site WordNet. Users can use Arabic Ontology to classify or clarify the concepts and meanings of Arabic terms. == Ontology Structure == The ontology structure (i.e., data model) is similar to WordNet's structure. Each concept in the database is given a unique concept identifier (URI), informally described by a gloss, and lexicalized by one or more synonymous lemma terms. Each term-concept pair is called a sense, and is given a SenseID. A set of senses is called synset. Concepts and senses are described by further attributes such as era and area — to specify example usage and ontological analysis. Semantic relations are defined between concepts. Some important entities are included in the ontology, such as individual countries and bodies of water. These individuals are given separate IndividualIDs and linked with their concepts through the InstanceOf relation. == Mappings to other resources == Concepts in the Arabic Ontology are mapped to synsets in WordNet, as well as to BFO and DOLCE. Terms used in the Arabic Ontology are mapped to lemmas in the LDC's SAMA database. == Applications == Arabic Ontology can be used in many application domains, such as: Information retrieval, to enrich queries (e.g., in search engines) and improve the quality of the results, i.e. meaningful search rather than string-matching search; Machine translation and word-sense disambiguation, by finding the exact mapping of concepts across languages, especially that the Arabic ontology is also mapped to the WordNet; Data Integration and interoperability in which the Arabic ontology can be used as a semantic reference to link databases and information systems; Semantic Web and Web 3.0, by using the Arabic ontology as a semantic reference to disambiguate the meanings used in websites; among many other applications. == URLs Design == The URLs in the Arabic Ontology are designed according to the W3C's Best Practices for Publishing Linked Data, as described in the following URL schemes. This allows one to also explore the whole database like exploring a graph: Ontology Concept: Each concept in the Arabic Ontology has a ConceptID and can be accessed using: https://{domain}/concept/{ConceptID | Term}. In case of a term, the set of concepts that this term lexicalizes are all retrieved. In case of a ConceptID, the concept and its direct subtypes are retrieved, e.g. https://ontology.birzeit.edu/concept/293198 Semantic relations: Relationships between concepts can be accessed using these schemes: (i) the URL: https:// {domain}/concept/{RelationName}/{ConceptID} allows retrieval of relationships among ontology concepts. (ii) the URL: https://{domain}/lexicalconcept/{RelationName}/{lexicalConceptID} allows retrieval of relations between lexical concepts. For example, https://ontology.birzeit.edu/concept/instances/293121 retrieves the instances of the concept 293121. The relations that are currently used in our database are: {subtypes, type, instances, parts, related, similar, equivalent}.

    Read more →
  • Generalized iterative scaling

    Generalized iterative scaling

    In statistics, generalized iterative scaling (GIS) and improved iterative scaling (IIS) are two early algorithms used to fit log-linear models, notably multinomial logistic regression (MaxEnt) classifiers and extensions of it such as MaxEnt Markov models and conditional random fields. These algorithms have been largely surpassed by gradient-based methods such as L-BFGS and coordinate descent algorithms.

    Read more →
  • Teaching dimension

    Teaching dimension

    In computational learning theory, the teaching dimension of a concept class C is defined to be max c ∈ C { w C ( c ) } {\displaystyle \max _{c\in C}\{w_{C}(c)\}} , where w C ( c ) {\displaystyle {w_{C}(c)}} is the minimum size of a witness set for c in C. Intuitively, this measures the number of instances that are needed to identify a concept in the class, using supervised learning with examples provided by a helpful teacher who is trying to convey the concept as succinctly as possible. This definition was formulated in 1995 by Sally Goldman and Michael Kearns, based on earlier work by Goldman, Ron Rivest, and Robert Schapire. The teaching dimension of a finite concept class can be used to give a lower and an upper bound on the membership query cost of the concept class. In Stasys Jukna's book "Extremal Combinatorics", a lower bound is given for the teaching dimension in general: Let C be a concept class over a finite domain X. If the size of C is greater than 2 k ( | X | k ) , {\displaystyle 2^{k}{|X| \choose k},} then the teaching dimension of C is greater than k. However, there are more specific teaching models that make assumptions about teacher or learner, and can get lower values for the teaching dimension. For instance, several models are the classical teaching (CT) model, the optimal teacher (OT) model, recursive teaching (RT), preference-based teaching (PBT), and non-clashing teaching (NCT).

    Read more →