AI Generator Website

AI Generator Website — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Canva


    Canva Pty Ltd. is an Australian multinational proprietary software company launched in 2013 and based in Sydney, Australia. It provides a graphic design platform for creating visual content for presentations, websites, and other digital products. Its offerings include templates for presentations, posters, and social media content, as well as photo and video editing functionality. The platform uses a drag-and-drop interface designed for users without professional design training or experience. Canva operates on a freemium model and has added features such as print services and video editing tools over time. == History == === 2013–2020 === Canva was founded in Perth, Australia, by Melanie Perkins, Cliff Obrecht and Cameron Adams on 1 January 2013. One of the company's early investors was Susan Wu, an American entrepreneur. In its first year, Canva had more than 750,000 users. In 2017, the company reached profitability and had 294,000 paying customers. In January 2018, Perkins announced that the company had raised A$40 million from Sequoia Capital, Blackbird Ventures, and Felicis Ventures, and the company was valued at A$1 billion. It raised A$70 million in May 2019, followed by A$85 million in October 2019 alongside the launch of Canva for Enterprise. In December 2019, Canva announced Canva for Education, a free product for schools and other educational institutions intended to facilitate collaboration between students and teachers. === 2021–2025 === In June 2020, Canva announced a partnership with FedEx Office, followed by one with Office Depot the following month. As of June 2020, Canva's valuation had risen to A$6 billion, rising to A$40 billion by September 2021. In September 2021, Canva raised US$200 million, with its value peaking that year at US$40 billion. By September 2022, the valuation of the company had leveled at US$26 billion.
    While Canva's value declined from its 2021 peak by mid-2022, it remained one of Australia's most prominent technology companies, alongside Atlassian. In March 2022, Canva had over 75 million monthly active users. In 2023, co-founders Melanie Perkins and Cliff Obrecht were named in the Australian Financial Review's AFR Rich List as among the 10 wealthiest people in Australia. On 7 December 2022, Canva launched Magic Write, the platform's AI-powered copywriting assistant. On 22 March 2023, Canva announced its new Assistant tool, which recommends graphics and styles that match the user's existing design. On 11 January 2024, Canva launched its own GPT in OpenAI's GPT Store. The company has announced that it intends to compete with Google and Microsoft in the office software category with website and whiteboard products. In May 2024, the company announced the launch of Canva Enterprise, a plan designed for large organisations, alongside new tools including Work Kits, Courses and AI capabilities. In 2024, it announced a co-funded solar energy project to enhance its sustainability efforts. On 10 April 2025, Canva released Visual Suite 2, a new interface that combines Canva's design and productivity tools. New features include a spreadsheet application (Canva Sheets), a generative AI coding assistant (Canva Code), a chatbot, and an updated photo editor that can modify or remove background objects. In August 2025, Canva launched a stock sale to employees, valuing the company at US$42 billion. == Acquisitions == In 2018, the company acquired presentations startup Zeetings for an undisclosed amount, as part of its expansion into the presentations space. In May 2019, the company announced the acquisitions of Pixabay and Pexels, two free stock photography sites based in Germany, which enabled Canva users to access their photos for designs. In February 2021, Canva acquired Austrian startup Kaleido.ai and the Czech-based Smartmockups.
In 2022, Canva acquired Flourish, a London-based data visualization startup. In March 2024, Canva acquired UK-based Serif, the developers of the Affinity suite of graphic design software, for approximately $380 million. In August 2024, Canva acquired the AI image generation platform and startup, Leonardo AI, for an undisclosed amount. In June 2025, it was announced that Canva had acquired Australian AI marketing startup MagicBrief for an undisclosed amount. In February 2026, Canva acquired two startups: Cavalry, which specializes in animation software, and MangoAI, which focuses on improving advertising performance. In April 2026, Canva acquired Simtheory, an AI Workflow Tool, and Ortto, a marketing automation tool. == Philanthropy == Canva's co-founders, Melanie Perkins and Cliff Obrecht, have publicly stated their intention to donate a significant portion of their personal wealth to charity. In 2021, Canva started a partnership with GiveDirectly, a nonprofit organization operating in low income areas that makes unconditional cash transfers to families living in extreme poverty. Since then, the company has donated $50 million to support GiveDirectly's work across Malawi. In 2025, Canva announced an additional $100 million commitment to expand its GiveDirectly partnership. == Controversies == === Data breach === In May 2019, Canva experienced a data breach in which the data of roughly 139 million users was exposed. The exposed data included real names of users, usernames, email addresses, geographical information, and password hashes for some users. In January 2020, approximately 4 million user passwords were decrypted and shared online. Canva responded by resetting the passwords of every user who had not changed their password since the initial breach. === Russian operations === In May 2022 Canva was criticized for continuing to provide free access to its services in Russia, even after suspending payment processing in the country. 
Activists from the Ukrainian diaspora in Australia and others said this could be viewed as indirectly supporting Russia’s war effort. They noted the company was the only one of several major Australian firms to receive the lowest “digging in” rating on a tracker run by the Yale School of Management for failing to pull out of Russia. Canva responded that it had suspended financial transactions in Russia from March 2022 and maintained the free version to allow the continued creation and sharing of “pro-peace and anti-war” content for its 1.4 million Russian users.

  • Bayesian network


    A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (e.g. speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. == Graphical model == Formally, Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters or hypotheses. Each edge represents a direct conditional dependency. Any pair of nodes that are not connected (i.e. no path connects one node to the other) represent variables that are conditionally independent of each other. Each node is associated with a probability function that takes, as input, a particular set of values for the node's parent variables, and gives (as output) the probability (or probability distribution, if applicable) of the variable represented by the node. 
    For example, if $m$ parent nodes represent $m$ Boolean variables, then the probability function could be represented by a table of $2^{m}$ entries, one entry for each of the $2^{m}$ possible parent combinations. Similar ideas may be applied to undirected, and possibly cyclic, graphs such as Markov networks. == Example == Suppose we want to model the dependencies between three variables: the sprinkler (or more appropriately, its state: whether it is on or not), the presence or absence of rain, and whether the grass is wet or not. Observe that two events can cause the grass to become wet: an active sprinkler or rain. Rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler usually is not active). This situation can be modeled with a Bayesian network (shown to the right). Each variable has two possible values, T (for true) and F (for false). The joint probability function is, by the chain rule of probability, $\Pr(G,S,R)=\Pr(G\mid S,R)\Pr(S\mid R)\Pr(R)$, where G = "Grass wet (true/false)", S = "Sprinkler turned on (true/false)", and R = "Raining (true/false)". The model can answer questions about the presence of a cause given the presence of an effect (so-called inverse probability) like "What is the probability that it is raining, given the grass is wet?"
    by using the conditional probability formula and summing over all nuisance variables:

    $$\Pr(R=T\mid G=T)=\frac{\Pr(G=T,R=T)}{\Pr(G=T)}=\frac{\sum_{x\in\{T,F\}}\Pr(G=T,S=x,R=T)}{\sum_{x,y\in\{T,F\}}\Pr(G=T,S=x,R=y)}$$

    Using the expansion for the joint probability function $\Pr(G,S,R)$ and the conditional probabilities from the conditional probability tables (CPTs) stated in the diagram, one can evaluate each term in the sums in the numerator and denominator. For example,

    $$\begin{aligned}\Pr(G=T,S=T,R=T)&=\Pr(G=T\mid S=T,R=T)\Pr(S=T\mid R=T)\Pr(R=T)\\&=0.99\times 0.01\times 0.2\\&=0.00198.\end{aligned}$$

    Then the numerical results (subscripted by the associated variable values, in the order G, S, R) are

    $$\Pr(R=T\mid G=T)=\frac{0.00198_{TTT}+0.1584_{TFT}}{0.00198_{TTT}+0.288_{TTF}+0.1584_{TFT}+0.0_{TFF}}=\frac{891}{2491}\approx 35.77\%.$$

    An interventional question, such as "What is the probability that it would rain, given that we wet the grass?", is answered by the post-intervention joint distribution function

    $$\Pr(S,R\mid \text{do}(G=T))=\Pr(S\mid R)\Pr(R)$$

    obtained by removing the factor $\Pr(G\mid S,R)$ from the pre-intervention distribution. The do operator forces the value of G to be true. The probability of rain is unaffected by the action:

    $$\Pr(R\mid \text{do}(G=T))=\Pr(R).$$

    To predict the impact of turning the sprinkler on:

    $$\Pr(R,G\mid \text{do}(S=T))=\Pr(R)\Pr(G\mid R,S=T)$$

    with the term $\Pr(S=T\mid R)$ removed, showing that the action affects the grass but not the rain. These predictions may not be feasible given unobserved variables, as in most policy evaluation problems. The effect of the action $\text{do}(x)$ can still be predicted, however, whenever the back-door criterion is satisfied. It states that, if a set Z of nodes can be observed that d-separates (or blocks) all back-door paths from X to Y, then

    $$\Pr(Y,Z\mid \text{do}(x))=\frac{\Pr(Y,Z,X=x)}{\Pr(X=x\mid Z)}.$$

    A back-door path is one that ends with an arrow into X. Sets that satisfy the back-door criterion are called "sufficient" or "admissible." For example, the set Z = R is admissible for predicting the effect of S = T on G, because R d-separates the (only) back-door path S ← R → G. However, if R is not observed, no other set d-separates this path, and the effect of turning the sprinkler on (S = T) on the grass (G) cannot be predicted from passive observations. In that case $\Pr(G\mid \text{do}(S=T))$ is not "identified". This reflects the fact that, lacking interventional data, one cannot tell whether the observed dependence between S and G is due to a causal connection or is spurious (apparent dependence arising from a common cause, R). (see Simpson's paradox) To determine whether a causal relation is identified from an arbitrary Bayesian network with unobserved variables, one can use the three rules of "do-calculus" and test whether all do terms can be removed from the expression of that relation, thus confirming that the desired quantity is estimable from frequency data.
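    The sprinkler queries above can be checked by brute-force enumeration of the joint distribution. The following is a minimal sketch rather than an efficient inference algorithm. The CPT entries 0.99, 0.01 and 0.2 are quoted in the text; the remaining entries (0.4, 0.9, 0.8, 0.0) are assumed from the usual version of this example and are consistent with the products 0.288, 0.1584 and 0.0 given above. All function and variable names are hypothetical.

```python
from itertools import product

BOOLS = (True, False)

# Pr(R = r)
P_RAIN = {True: 0.2, False: 0.8}
# Pr(S = T | R = r); 0.01 is quoted in the text, 0.4 is an assumed CPT entry
P_SPRINKLER_ON = {True: 0.01, False: 0.4}
# Pr(G = T | S = s, R = r), keyed by (s, r); 0.99 is quoted, the rest are assumed
P_GRASS_WET = {(True, True): 0.99, (True, False): 0.9,
               (False, True): 0.8, (False, False): 0.0}


def joint(g, s, r):
    """Pr(G=g, S=s, R=r) = Pr(g | s, r) * Pr(s | r) * Pr(r)."""
    p_g = P_GRASS_WET[(s, r)] if g else 1.0 - P_GRASS_WET[(s, r)]
    p_s = P_SPRINKLER_ON[r] if s else 1.0 - P_SPRINKLER_ON[r]
    return p_g * p_s * P_RAIN[r]


def rain_given_wet():
    """Pr(R=T | G=T), summing the joint over the nuisance variable S."""
    numerator = sum(joint(True, s, True) for s in BOOLS)
    denominator = sum(joint(True, s, r) for s, r in product(BOOLS, BOOLS))
    return numerator / denominator


def wet_given_do_sprinkler_on():
    """Pr(G=T | do(S=T)): drop the factor Pr(S | R) and fix S = T."""
    return sum(P_GRASS_WET[(True, r)] * P_RAIN[r] for r in BOOLS)


def wet_given_sprinkler_observed_on():
    """Pr(G=T | S=T) from passive observation, for comparison."""
    numerator = sum(joint(True, True, r) for r in BOOLS)
    denominator = sum(joint(g, True, r) for g, r in product(BOOLS, BOOLS))
    return numerator / denominator


if __name__ == "__main__":
    print(f"Pr(R=T | G=T)     = {rain_given_wet():.4f}")
    print(f"Pr(G=T | do(S=T)) = {wet_given_do_sprinkler_on():.4f}")
    print(f"Pr(G=T | S=T)     = {wet_given_sprinkler_observed_on():.4f}")
```

    Under these assumed tables the observational and interventional answers differ (about 0.90 versus 0.918), because observing the sprinkler on also changes the belief about rain, whereas switching it on does not.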
    Using a Bayesian network can save considerable amounts of memory over exhaustive probability tables, if the dependencies in the joint distribution are sparse. For example, a naive way of storing the conditional probabilities of 10 two-valued variables as a table requires storage space for $2^{10}=1024$ values. If no variable's local distribution depends on more than three parent variables, the Bayesian network representation stores at most $10\cdot 2^{3}=80$ values. One advantage of Bayesian networks is that it is intuitively easier for a human to understand (a sparse set of) direct dependencies and local distributions than complete joint distributions. == Inference and learning == Bayesian networks perform three main inference tasks: inferring unobserved variables; parameter learning for the probability distributions of each node in the network; and structure learning of the graphical network. === Inferring unobserved variables === Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal sufficient statistic for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems. The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the prod

  • KNIME


    KNIME, the Konstanz Information Miner, is a data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and the use of Java Database Connectivity (JDBC) allow users to assemble nodes that blend different data sources, including preprocessing (extract, transform, load, or ETL), for modeling, data analysis and visualization with minimal or no programming. It is free and open-source software released under a GNU General Public License. Since 2006, KNIME has been used in pharmaceutical research and in other areas including customer relationship management (CRM) and data analysis, business intelligence, text mining and financial data analysis. Attempts have also been made to use KNIME as a robotic process automation (RPA) tool. KNIME's headquarters are in Zurich, with other offices in Konstanz, Berlin, and Austin (USA). == History == Development of KNIME began in January 2004 as an open-source platform, with a team of software engineers at the University of Konstanz. The original team, headed by Michael Berthold, came from a Silicon Valley pharmaceutical industry software company. The initial goal was to create a modular, highly scalable and open data processing platform that allows easy integration of different data loading, processing, transforming, analyzing, and visual exploring modules, without focus on any one application area. The platform was intended for collaboration, research, and the integration of various other data analysis projects. In 2006, the first version of KNIME was released. Several pharmaceutical companies began using KNIME, and several life science software vendors began integrating their tools into the platform. Later that year, after an article in the German magazine c't, users from a number of other areas came on board.
    As of 2012, KNIME was in use by over 15,000 actual users (i.e. not counting downloads, but users regularly retrieving updates) in the life sciences and at banks, publishers, car manufacturers, telcos, consulting firms, and various other industries, as well as by a large number of research groups worldwide. The latest updates to KNIME Server and KNIME Big Data Extensions provide support for Apache Spark 2.3, Parquet and HDFS-type storage. For the sixth year in a row, KNIME has been placed as a leader for data science and machine learning platforms in Gartner's Magic Quadrant. == Design philosophy, features == KNIME software follows these design principles and offers these features: Visual, Interactive Framework: KNIME Software prioritizes a user-friendly and intuitive approach to data analysis. This is achieved through a visual and interactive framework where data flows can be combined using a drag-and-drop interface. Users can develop customized and interactive applications by creating data pipelines that range from simple to advanced and highly automated. These may include, for example, access to databases, machine learning libraries, logic for workflow control (e.g., loops and switches), abstraction (e.g., interactive widgets), invocation, dynamic data apps, integrated deployment, or error handling. Modularity: processing units and data containers should remain independent of each other. This design choice enables easy distribution of computation and allows for the independent development of different algorithms. Data types within KNIME are encapsulated, meaning no types are predefined. This design choice facilitates adding new data types and integrating them with existing types, while including type-specific renderers and comparators. This principle also enables inspecting results at the end of each single data operation. Extensibility: KNIME Software is designed to be extensible. Adding new processing nodes or views is made simple through a plug-in mechanism.
    This mechanism ensures that users can distribute their custom functionalities without the need for complicated install or uninstall procedures. Interleaving No-Code with Code: the platform supports integrating both visual programming (no-code) and script-based programming (e.g., Python, R, JavaScript) approaches to data analysis. This design principle is termed low-code. Automation and Scalability: for example, the use of parameterization via flow variables, or the encapsulation of workflow segments in components, helps reduce manual work and errors in analyses. Further, the scheduling of workflow execution (available in KNIME Business Hub and KNIME Community Hub for Teams) reduces dependency on human resources. In terms of scalability, a few examples include the ability to handle large datasets (millions of rows), execute multiple processes simultaneously out of the box and reuse workflow segments. Full Usability: because it is open source, KNIME Analytics Platform provides free full usability with no limited trial periods. == Internals == KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results and models using interactive widgets and views. KNIME is written in Java and based on Eclipse. It makes use of an extension mechanism to add plug-ins providing added functions. The core version includes hundreds of modules for data integration (file input/output (I/O), database nodes supporting all common database management systems through JDBC or native connectors: SQLite, MS-Access, SQL Server, MySQL, Oracle, PostgreSQL, Vertica and H2), data transformation (filter, converter, splitter, combiner, joiner), and the commonly used methods of statistics, data mining, analysis and text analytics. Visualization is supported with the Report Designer extension.
    KNIME workflows can be used as data sets to create report templates that can be exported to document formats such as doc, ppt, xls, pdf and others. Other KNIME abilities are: KNIME's core architecture allows processing of large data volumes that are limited only by the available hard disk space (not by the available RAM); for example, KNIME allows analyzing 300 million customer addresses, 20 million cell images, and 10 million molecular structures. Added plug-ins allow integrating methods for text mining, image mining, time series analysis, and networking. KNIME integrates various other open-source projects, e.g., machine learning algorithms from Weka, H2O, Keras, Spark, the R project and LIBSVM, as well as plotly, JFreeChart, ImageJ, and the Chemistry Development Kit. KNIME is implemented in Java and allows for wrappers calling other code, in addition to providing nodes that allow it to run Java, Python, R, Ruby and other code fragments. Since 2021, KNIME's Python Integration has used Anaconda for Python distribution and environment management. == License == As of version 2.1, KNIME is released under the GPLv3 license, with an exception that allows commercial software vendors to use the well-defined node application programming interface (API) to add proprietary extensions, or wrappers calling their tools from KNIME. In 2024, KNIME version 5.3 was released under the same GPLv3 license as previous versions. == Courses == KNIME allows the performance of data analysis without programming skills. Several free, online courses are provided.

  • NOMINATE (scaling method)


    NOMINATE (an acronym for nominal three-step estimation) is a multidimensional scaling application developed by US political scientists Keith T. Poole and Howard Rosenthal in the early 1980s to analyze preferential and choice data, such as legislative roll-call voting behavior. In its most well-known application, members of the US Congress are placed on a two-dimensional map, with politicians who are ideologically similar (i.e. who often vote the same) being close together. One of these two dimensions corresponds to the familiar left–right political spectrum (liberal–conservative in the United States). As computing capabilities grew, Poole and Rosenthal developed multiple iterations of their NOMINATE procedure: the original D-NOMINATE method, W-NOMINATE, and most recently DW-NOMINATE (for dynamic, weighted NOMINATE). In 2009, Poole and Rosenthal were the first recipients of the Society for Political Methodology's Best Statistical Software Award for their development of NOMINATE. In 2016, the society awarded Poole its Career Achievement Award, stating that "the modern study of the U.S. Congress would be simply unthinkable without NOMINATE legislative roll call voting scores." == Procedure == The main procedure is an application of multidimensional scaling techniques to political choice data. Though there are important technical differences between these types of NOMINATE scaling procedures, all operate under the same fundamental assumptions. First, that alternative choices can be projected on a basic, low-dimensional (often two-dimensional) Euclidean space. Second, within that space, individuals have utility functions which are bell-shaped (normally distributed), and maximized at their ideal point. Because individuals also have symmetric, single-peaked utility functions which center on their ideal point, ideal points represent individuals' most preferred outcomes. 
    That is, individuals most desire outcomes closest to their ideal point, and will choose/vote probabilistically for the closest outcome. Ideal points can be recovered from observing choices, with individuals exhibiting similar preferences placed more closely than those behaving dissimilarly. It is helpful to compare this procedure to producing maps based on driving distances between cities. For example, Los Angeles is about 1,800 miles from St. Louis; St. Louis is about 1,200 miles from Miami; and Miami is about 2,700 miles from Los Angeles. From these (dis)similarity data, any map of these three cities should place Miami far from Los Angeles, with St. Louis somewhere in between (though a bit closer to Miami than Los Angeles). Just as cities like Los Angeles and San Francisco would be clustered on a map, NOMINATE places ideologically similar legislators (e.g., liberal Senators Barbara Boxer (D-Calif.) and Al Franken (D-Minn.)) closer to each other, and farther from dissimilar legislators (e.g., conservative Senator Tom Coburn (R-Okla.)) based on the degree of agreement between their roll call voting records. At the heart of the NOMINATE procedures (and other multidimensional scaling methods, such as Poole's Optimal Classification method) are the algorithms they use to arrange individuals and choices in low-dimensional (usually two-dimensional) space. Thus, NOMINATE scores provide "maps" of legislatures. Using NOMINATE procedures to study congressional roll call voting behavior from the First Congress to the present day, Poole and Rosenthal published Congress: A Political-Economic History of Roll Call Voting in 1997 and the revised edition Ideology and Congress in 2007. In 2009, Poole and Rosenthal were named the first recipients of the Society for Political Methodology's Best Statistical Software Award for their development of NOMINATE, a recognition conferred to "individual(s) for developing statistical software that makes a significant research contribution".
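    The spatial voting assumptions described under Procedure (a bell-shaped utility that peaks at the legislator's ideal point, and a probabilistic choice that favors the closer outcome) can be illustrated with a small sketch. This is not Poole and Rosenthal's estimation procedure, which recovers ideal points from observed roll calls; it only evaluates a choice model for positions that are already given. The normal error term, the scale `beta`, the `weight` parameter and all function names are assumptions made for illustration.

```python
import math


def gaussian_utility(ideal_point, outcome, beta=5.0, weight=1.0):
    """Bell-shaped utility that is largest when the outcome sits on the ideal point.

    beta and weight are illustrative values, not estimates.
    """
    squared_distance = sum((x - o) ** 2 for x, o in zip(ideal_point, outcome))
    return beta * math.exp(-0.5 * weight * squared_distance)


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_yea(ideal_point, yea_outcome, nay_outcome, beta=5.0):
    """Probability of voting yea: the utility gap passed through a normal CDF."""
    gap = (gaussian_utility(ideal_point, yea_outcome, beta)
           - gaussian_utility(ideal_point, nay_outcome, beta))
    return standard_normal_cdf(gap)


if __name__ == "__main__":
    yea, nay = (0.6, 0.0), (-0.4, 0.0)
    for legislator in [(0.5, 0.1), (0.1, 0.0), (-0.5, 0.2)]:
        print(legislator, round(prob_yea(legislator, yea, nay), 3))
```

    A legislator who is equidistant from the two outcomes has a utility gap of zero and therefore votes yea with probability 0.5, which corresponds to sitting on the cutting line between predicted yeas and nays.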
    In 2016, Keith T. Poole was awarded the Society for Political Methodology's Career Achievement Award. The citation for this award reads, in part, "One can say perfectly correctly, and without any hyperbole: the modern study of the U.S. Congress would be simply unthinkable without NOMINATE legislative roll call voting scores. NOMINATE has produced data that entire bodies of our discipline—and many in the press—have relied on to understand the U.S. Congress." == Dimensions == Poole and Rosenthal demonstrate that, despite the many complexities of congressional representation and politics, roll call voting in both the House and the Senate can be organized and explained by no more than two dimensions throughout the sweep of American history. The first dimension (horizontal or x-axis) is the familiar left-right (or liberal-conservative) spectrum on economic matters. The second dimension (vertical or y-axis) picks up attitudes on cross-cutting, salient issues of the day (which include or have included slavery, bimetallism, civil rights, regional, and social/lifestyle issues). Poole and Rosenthal initially argued that the first dimension refers to socio-economic matters and the second dimension to race relations. However, the often confusing and residual nature of the second dimension has led other researchers to largely ignore it. For the most part, congressional voting is uni-dimensional, with most of the variation in voting patterns explained by placement along the liberal-conservative first dimension. While the first dimension of the DW-NOMINATE score alone predicts results with 83% accuracy, adding the second dimension only increases accuracy to 85%. Furthermore, the second dimension only provided a significant increase in accuracy for Congresses 1-99. As late as the 1990s, the second dimension was able to measure partisan splits on abortion and gun rights issues.
    However, a 2017 analysis found that since 1987, the votes of the US Congress had best fit a one-dimensional model, suggesting increasing party polarization after 1987. == Interpretation of nominate scores == For illustrative purposes, consider the following plots, which use W-NOMINATE scores to scale members of Congress and use the probabilistic voting model (in which legislators farther from the "cutting line" between "yea" and "nay" outcomes become more likely to vote in the predicted manner) to illustrate some major Congressional votes in the 1990s. Some of these votes, like the House's vote on President Clinton's welfare reform package (the Personal Responsibility and Work Opportunity Act of 1996), are best modeled through the use of the first (economic liberal-conservative) dimension. On the welfare reform vote, nearly all Republicans joined the moderate-conservative bloc of House Democrats in voting for the bill, while opposition was virtually confined to the most liberal Democrats in the House. The errors (those representatives on the "wrong" side of the cutting line which separates predicted "yeas" and predicted "nays") are generally close to the cutting line, which is what we would expect. A legislator directly on the cutting line is indifferent between voting "yea" and "nay" on the measure. All members are shown on the left panel of the plot, while only errors are shown on the right panel. Economic ideology also dominates the Senate vote on the Balanced Budget Amendment of 1995. On other votes, however, a second dimension (which has recently come to represent attitudes on cultural and lifestyle issues) is important. For example, roll call votes on gun control routinely split party coalitions, with socially conservative "blue dog" Democrats joining most Republicans in opposing additional regulation and socially liberal Republicans joining most Democrats in supporting gun control.
    The addition of the second dimension accounts for these intra-party differences, and the cutting line is more horizontal than vertical (meaning the cleavage is found on the second dimension rather than the first dimension on these votes). This pattern was evident in the 1991 House vote to require waiting periods on handguns. == Political ideology == DW-NOMINATE scores have been used widely to describe the political ideology of political actors, political parties and political institutions. For instance, a first-dimension score close to either pole lies at one of the extremes of the liberal-conservative scale: a score closer to 1 is described as conservative, whereas a score closer to −1 is described as liberal. Finally, a score at zero or close to zero is described as moderate. == Political polarization == Poole and Rosenthal (beginning with their 1984 article "The Polarization of American Politics") have also used NOMINATE data to show that, since the 1970s, party delegations in Congress have become ideologically homogeneous and distant from one another (a phenomenon known as "polarization"). Using DW-NOMINATE scores (which permit direct comparisons between members of different Congress

    Read more →
  • Color management

    Color management

    Color management is the process of ensuring consistent and accurate colors across various devices, such as monitors, printers, and cameras. It involves the use of color profiles, which are standardized descriptions of how colors should be displayed or reproduced. Color management is necessary because different devices have different color capabilities and characteristics. For example, a monitor may display colors differently than a printer can reproduce them. Without color management, the same image may appear differently on different devices, leading to inconsistencies and inaccuracies. To achieve color management, a color profile is created for each device involved in the color workflow. This profile describes the device's color capabilities and characteristics, such as its color gamut (range of colors it can display or reproduce) and color temperature. These profiles are then used to translate colors between devices, ensuring consistent and accurate color reproduction. Color management is particularly important in industries such as graphic design, photography, and printing, where accurate color representation is crucial. It helps to maintain color consistency throughout the entire workflow, from capturing an image to displaying or printing it. Parts of color management are implemented in the operating system (OS), helper libraries, the application, and devices. The type of color profile that is typically used is called an ICC profile. A cross-platform view of color management is the use of an ICC-compatible color management system. 
The International Color Consortium (ICC) is an industry consortium that has defined an open standard for a Color Matching Module (CMM) at the OS level, together with color profiles for (a) devices, including DeviceLink profiles that transform one device profile (color space) to another device profile without passing through an intermediate color space such as LAB, more accurately preserving color, and (b) working spaces, the color spaces in which color data is meant to be manipulated. There are other approaches to color management besides using ICC profiles. This is partly for historical reasons and partly because of needs that the ICC standard does not cover. The film and broadcasting industries make use of some of the same concepts, but they frequently rely on more limited boutique solutions. The film industry, for instance, often uses 3D LUTs (lookup tables) to represent a complete color transformation for a specific RGB encoding. At the consumer level, system-wide color management is available in most of Apple's products (macOS, iOS, iPadOS, watchOS). Microsoft Windows lacks system-wide color management, and virtually no applications employ color management. Windows' media player API is not color space aware, and if applications want to color manage videos manually, they have to incur significant performance and power consumption penalties. Android supports system-wide color management, but most devices ship with color management disabled. == Overview == Characterize. Every color-managed device requires a personalized table, or "color profile," which characterizes the color response of that particular device. Standardize. Each color profile describes these colors relative to a standardized set of reference colors (the "Profile Connection Space"). Translate. Color-managed software then uses these standardized profiles to translate color from one device to another. This is usually performed by a color management module (CMM).
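The characterize, standardize and translate steps can be illustrated with a toy sketch. It rests on several assumptions: each "profile" is reduced to a tone curve plus a 3×3 matrix into CIEXYZ, the connection space is plain D65-relative XYZ rather than the D50-based PCS that ICC profiles use, and the second device (`toy_device`) with its matrix and gamma is hypothetical, invented for the example. Only the sRGB matrix and transfer curve are the published sRGB values. The names `MatrixProfile` and `translate` are made up here and do not belong to any CMM library.

```python
import numpy as np


class MatrixProfile:
    """Toy device profile: a tone curve plus a linear-RGB-to-XYZ matrix."""

    def __init__(self, to_xyz, decode, encode):
        self.to_xyz = np.asarray(to_xyz, dtype=float)
        self.from_xyz = np.linalg.inv(self.to_xyz)
        self.decode = decode  # device values -> linear light
        self.encode = encode  # linear light -> device values

    def to_pcs(self, rgb):
        """Device RGB in [0, 1] -> connection-space XYZ."""
        return self.to_xyz @ self.decode(np.asarray(rgb, dtype=float))

    def from_pcs(self, xyz):
        """Connection-space XYZ -> device RGB; out-of-gamut values are clipped."""
        linear = self.from_xyz @ np.asarray(xyz, dtype=float)
        return self.encode(np.clip(linear, 0.0, 1.0))


def srgb_decode(v):
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)


def srgb_encode(l):
    return np.where(l <= 0.0031308, l * 12.92, 1.055 * l ** (1 / 2.4) - 0.055)


srgb = MatrixProfile(
    [[0.4124, 0.3576, 0.1805],
     [0.2126, 0.7152, 0.0722],
     [0.0193, 0.1192, 0.9505]],
    srgb_decode,
    srgb_encode,
)

# Hypothetical device: illustrative matrix (same white point as sRGB), gamma 2.2.
toy_device = MatrixProfile(
    [[0.60, 0.17, 0.1805],
     [0.30, 0.63, 0.0700],
     [0.03, 0.07, 0.9890]],
    lambda v: v ** 2.2,
    lambda l: l ** (1 / 2.2),
)


def translate(source, destination, rgb):
    """Translate a color from one device to another via the connection space."""
    return destination.from_pcs(source.to_pcs(rgb))


if __name__ == "__main__":
    print(translate(srgb, toy_device, [0.5, 0.5, 0.5]))
    print(translate(srgb, toy_device, [1.0, 0.0, 0.0]))
```

Clipping in `from_pcs` is the crudest possible way to handle out-of-gamut colors; a real CMM would apply a rendering intent instead.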
== Hardware == === Characterization === To describe the behavior of various output devices, they must be compared (measured) in relation to a standard color space. Often a step called linearization is performed first, to undo the effect of gamma correction that was done to get the most out of limited 8-bit color paths. Instruments used for measuring device colors include colorimeters and spectrophotometers. As an intermediate result, the device gamut is described in the form of scattered measurement data. The transformation of the scattered measurement data into a more regular form, usable by the application, is called profiling. Profiling is a complex process involving mathematics, intense computation, judgment, testing, and iteration. After the profiling is finished, an idealized color description of the device is created. This description is called a profile. === Calibration === Calibration is like characterization, except that it can include the adjustment of the device, as opposed to just the measurement of the device. Color management is sometimes sidestepped by calibrating devices to a common standard color space such as sRGB; when such calibration is done well enough, no color translations are needed to get all devices to handle colors consistently. This avoidance of the complexity of color management was one of the goals in the development of sRGB. == Color profiles == === Embedding === Image formats themselves (such as TIFF, JPEG, PNG, EPS, PDF, and SVG) may contain embedded color profiles but are not required to do so by the image format. The International Color Consortium standard was created to bring various developers and manufacturers together. The ICC standard permits the exchange of output device characteristics and color spaces in the form of metadata. This allows the embedding of color profiles into images as well as storing them in a database or a profile directory. 
=== Working spaces === Working spaces, such as sRGB, Adobe RGB or ProPhoto, are color spaces that facilitate good results while editing. For instance, pixels with equal values of R, G, B should appear neutral. Using a large (gamut) working space will lead to posterization, while using a small working space will lead to clipping. This trade-off is a consideration for the critical image editor. == Color transformation == Color transformation, or color space conversion, is the transformation of the representation of a color from one color space to another. This calculation is required whenever data is exchanged inside a color-managed chain, and it is carried out by a Color Matching Module. Transforming profiled color information to different output devices is achieved by referencing the profile data into a standard color space. This makes it easier to convert colors from one device to a selected standard color space and from that to the colors of another device. By ensuring that the reference color space covers the many possible colors that humans can see, this concept allows one to exchange colors between many different color output devices. Color transformations can be represented by two profiles (source profile and target profile) or by a devicelink profile. This process involves approximations that ensure the image keeps its important color qualities and that give an opportunity to control how the colors are changed. === Profile connection space === In the terminology of the International Color Consortium, a translation between two color spaces can go through a profile connection space (PCS): Color Space 1 → PCS (CIELAB or CIEXYZ) → Color space 2; conversions into and out of the PCS are each specified by a profile. === Gamut mapping === In nearly every translation process, we have to deal with the fact that the color gamuts of different devices vary in range, which makes an accurate reproduction impossible.
They therefore need some rearrangement near the borders of the gamut. Some colors must be shifted to the inside of the gamut, as they otherwise cannot be represented on the output device and would simply be clipped. This so-called gamut mismatch occurs for example, when we translate from the RGB color space with a wider gamut into the CMYK color space with a narrower gamut range. In this example, the dark highly saturated purplish-blue color of a typical computer monitor's "blue" primary is impossible to print on paper with a typical CMYK printer. The nearest approximation within the printer's gamut will be much less saturated. Conversely, an inkjet printer's "cyan" primary, a saturated mid-brightness blue, is outside the gamut of a typical computer monitor. The color management system can utilize various methods to achieve desired results and give experienced users control of the gamut mapping behavior. ==== Rendering intent ==== When the gamut of source color space exceeds that of the destination, saturated colors are liable to become clipped (inaccurately represented), or more formally burned. The color management module can deal with this problem in several ways. The ICC specification includes four different rendering intents, listed below. Before the actual rendering intent is carried out, one can temporarily simulate the rendering by soft proofing. It is a useful tool as it predicts the outcome of the colors and is available as an application in many color management systems: Absolute colorimetric Absolute colorimetry and relative colorimetry actually use the same table but differ in the adjust

    Read more →
  • Ni1000

    Ni1000

    The Ni1000 is an artificial neural network chip developed by Nestor Corporation and Intel in the 1990s. It is Intel's second-generation neural network chip, but the first all-digital chip. The chip is aimed at image analysis applications, contains more than 3 million transistors, and can analyze 40,000 patterns per second. Prototypes running Nestor's OCR software in 1994 were capable of recognizing around 100 handwritten characters per second. The development was funded by DARPA and the Office of Naval Research.

    Read more →
  • Ensemble learning

    Ensemble learning

    In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives. == Overview == Supervised learning algorithms search through a hypothesis space to find a suitable hypothesis that will make good predictions for a particular problem. Even if this space contains hypotheses that are very well-suited for a particular problem, it may be very difficult to find a good one. Ensembles combine multiple hypotheses to form one which should be theoretically better. Ensemble learning trains two or more machine learning algorithms on a specific classification or regression task. The algorithms within the ensemble model are generally referred to as "base models", "base learners", or "weak learners" in the literature. These base models can be constructed using a single modelling algorithm, or several different algorithms. The idea is to train a diverse set of weak models on the same modelling task, such that the outputs of each weak learner have poor predictive ability (i.e., high bias), and among all weak learners, the outcome and error values exhibit high variance. Fundamentally, an ensemble learning model trains at least two high-bias (weak) and high-variance (diverse) models to be combined into a better-performing model. The set of weak models, which would not produce satisfactory predictive results individually, is combined or averaged to produce a single, high-performing, accurate, and low-variance model to fit the task as required.
Ensemble learning typically refers to bagging (bootstrap aggregating), boosting or stacking/blending techniques to induce high variance among the base models. Bagging creates diversity by generating random samples from the training observations and fitting the same model to each different sample — also known as homogeneous parallel ensembles. Boosting follows an iterative process by sequentially training each base model on the up-weighted errors of the previous base model, producing an additive model to reduce the final model errors — also known as sequential ensemble learning. Stacking or blending consists of different base models, each trained independently (i.e. diverse/high variance) to be combined into the ensemble model — producing a heterogeneous parallel ensemble. Common applications of ensemble learning include random forests (an extension of bagging), Boosted Tree models, and Gradient Boosted Tree Models. Models in applications of stacking are generally more task-specific — such as combining clustering techniques with other parametric and/or non-parametric techniques. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model. In one sense, ensemble learning may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation. On the other hand, the alternative is to do a lot more learning with one non-ensemble model. An ensemble may be more efficient at improving overall accuracy for the same increase in compute, storage, or communication resources by using that increase on two or more methods, than would have been improved by increasing resource use for a single method. Fast algorithms such as decision trees are commonly used in ensemble methods (e.g., random forests), although slower algorithms can benefit from ensemble techniques as well. 
By analogy, ensemble techniques have been used also in unsupervised learning scenarios, for example in consensus clustering or in anomaly detection. == Ensemble theory == Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity. It is possible to increase diversity in the training stage of the model using correlation for regression tasks or using information measures such as cross entropy for classification tasks. Theoretically, one can justify the diversity concept because the lower bound of the error rate of an ensemble system can be decomposed into accuracy, diversity, and the other term. === The geometric framework === Ensemble learning, including both regression and classification tasks, can be explained using a geometric framework. Within this framework, the output of each individual classifier or regressor for the entire dataset can be viewed as a point in a multi-dimensional space. Additionally, the target result is also represented as a point in this space, referred to as the "ideal point." The Euclidean distance is used as the metric to measure both the performance of a single classifier or regressor (the distance between its point and the ideal point) and the dissimilarity between two classifiers or regressors (the distance between their respective points). This perspective transforms ensemble learning into a deterministic problem. 
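The geometric view can be made concrete with a small numerical sketch. Each row of `outputs` is one model's output over the whole dataset (a point in the space), `ideal` is the target vector, and Euclidean distance measures both error and dissimilarity. The data are randomly generated for illustration and the function names are invented for this example. By the triangle inequality, the distance from the averaged output to the ideal point cannot exceed the mean of the individual distances, which the sketch checks numerically.

```python
import numpy as np


def distances_to_ideal(outputs, ideal):
    """Euclidean distance of each model's output vector to the ideal point."""
    return np.linalg.norm(outputs - ideal, axis=1)


def pairwise_dissimilarity(outputs):
    """Euclidean distance between every pair of models' output vectors."""
    diffs = outputs[:, None, :] - outputs[None, :, :]
    return np.linalg.norm(diffs, axis=2)


def averaged_vs_individual(outputs, ideal):
    """Return (distance of the averaged output, mean individual distance)."""
    individual = distances_to_ideal(outputs, ideal)
    averaged = np.linalg.norm(outputs.mean(axis=0) - ideal)
    return averaged, individual.mean()


# Synthetic example: 5 noisy models scoring 200 instances with binary targets.
rng = np.random.default_rng(0)
ideal = rng.integers(0, 2, size=200).astype(float)
outputs = ideal + rng.normal(scale=0.5, size=(5, 200))

averaged, mean_individual = averaged_vs_individual(outputs, ideal)

if __name__ == "__main__":
    print("distance of averaged output:", averaged)
    print("mean individual distance:  ", mean_individual)
```

The gap between the two numbers grows with the pairwise dissimilarity of the models, which is one way to read the role of diversity in this framework.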
For example, within this geometric framework, it can be proved that the averaging of the outputs (scores) of all base classifiers or regressors can lead to equal or better results than the average of all the individual models. It can also be proved that if the optimal weighting scheme is used, then a weighted averaging approach can outperform any of the individual classifiers or regressors that make up the ensemble, or at least be as good as the best performer. == Ensemble size == While the number of component classifiers of an ensemble has a great impact on the accuracy of prediction, there is a limited number of studies addressing this problem. Determining ensemble size a priori, together with the volume and velocity of big data streams, makes this even more crucial for online ensemble classifiers. Statistical tests were mostly used for determining the proper number of components. More recently, a theoretical framework suggested that there is an ideal number of component classifiers for an ensemble such that having more or fewer than this number of classifiers would deteriorate the accuracy. It is called "the law of diminishing returns in ensemble construction." This theoretical framework shows that using the same number of independent component classifiers as class labels gives the highest accuracy. == Common types of ensembles == === Bayes optimal classifier === The Bayes optimal classifier is a classification technique. It is an ensemble of all the hypotheses in the hypothesis space. On average, no other ensemble can outperform it. The Naive Bayes classifier is a version of this that assumes that the data is conditionally independent given the class and makes the computation more feasible. Each hypothesis is given a vote proportional to the likelihood that the training dataset would be sampled from a system if that hypothesis were true. To facilitate training data of finite size, the vote of each hypothesis is also multiplied by the prior probability of that hypothesis.
The Bayes optimal classifier can be expressed with the following equation: $y = \underset{c_j \in C}{\mathrm{argmax}} \sum_{h_i \in H} P(c_j \mid h_i)\, P(T \mid h_i)\, P(h_i)$ where $y$ is the predicted class, $C$ is the set of all possible classes, $H$ is the hypothesis space, $P$ refers to a probability, and $T$ is the training data. As an ensemble, the Bayes optimal classifier represents a hypothesis that is not necessarily in $H$. The hypothesis represented by the Bayes optimal classifier, however, is the optimal hypothesis in ensemble space (the space of all possible ensembles consisting only of hypotheses in $H$). This formula can be restated using Bayes' theorem, which says that the posterior is proportional to the likelihood times the prior: $P(h_i \mid T) \propto P(T \mid h_i)\, P(h_i)$ hence, $y = \underset{c_j \in C}{\mathrm{argmax}} \sum_{h_i \in H} P(c_j \mid h_i)\, P(h_i \mid T)$ === Bootstrap aggregating (bagging) === Bootstrap aggregation (bagging) involves training an ensemble on bootstrapped data sets. A bootstrapped set is cr

    Read more →
  • Proper generalized decomposition

    Proper generalized decomposition

    The proper generalized decomposition (PGD) is an iterative numerical method for solving boundary value problems (BVPs), that is, partial differential equations constrained by a set of boundary conditions, such as Poisson's equation or Laplace's equation. The PGD algorithm computes an approximation of the solution of the BVP by successive enrichment. This means that, in each iteration, a new component (or mode) is computed and added to the approximation. In principle, the more modes obtained, the closer the approximation is to the theoretical solution. Unlike POD principal components, PGD modes are not necessarily orthogonal to each other. By selecting only the most relevant PGD modes, a reduced order model of the solution is obtained. Because of this, PGD is considered a dimensionality reduction algorithm. == Description == The proper generalized decomposition is a method characterized by a variational formulation of the problem, a discretization of the domain in the style of the finite element method, the assumption that the solution can be approximated as a separate representation and a numerical greedy algorithm to find the solution. === Variational formulation === In the Proper Generalized Decomposition method, the variational formulation involves translating the problem into a format where the solution can be approximated by minimizing (or sometimes maximizing) a functional. A functional is a scalar quantity that depends on a function, which, in this case, represents our problem. The most commonly implemented variational formulation in PGD is the Bubnov-Galerkin method. This method is chosen for its ability to provide an approximate solution to complex problems, such as those described by partial differential equations (PDEs). In the Bubnov-Galerkin approach, the idea is to project the problem onto a space spanned by a finite number of basis functions. These basis functions are chosen to approximate the solution space of the problem.
In the Bubnov-Galerkin method, we seek an approximate solution that satisfies the integral form of the PDEs over the domain of the problem. This is different from directly solving the differential equations. By doing so, the method transforms the problem into finding the coefficients that best fit this integral equation in the chosen function space. While the Bubnov-Galerkin method is prevalent, other variational formulations are also used in PGD, depending on the specific requirements and characteristics of the problem, such as: Petrov-Galerkin Method: This method is similar to the Bubnov-Galerkin approach but differs in the choice of test functions. In the Petrov-Galerkin method, the test functions (used to project the residual of the differential equation) are different from the trial functions (used to approximate the solution). This can lead to improved stability and accuracy for certain types of problems. Collocation Method: In collocation methods, the differential equation is satisfied at a finite number of points in the domain, known as collocation points. This approach can be simpler and more direct than the integral-based methods like Galerkin's, but it may also be less stable for some problems. Least Squares Method: This approach involves minimizing the square of the residual of the differential equation over the domain. It is particularly useful when dealing with problems where traditional methods struggle with stability or convergence. Mixed Finite Element Method: In mixed methods, additional variables (such as fluxes or gradients) are introduced and approximated along with the primary variable of interest. This can lead to more accurate and stable solutions for certain problems, especially those involving incompressibility or conservation laws. Discontinuous Galerkin Method: This is a variant of the Galerkin method where the solution is allowed to be discontinuous across element boundaries. 
This method is particularly useful for problems with sharp gradients or discontinuities. === Domain discretization === The discretization of the domain is a well-defined set of procedures that cover (a) the creation of finite element meshes, (b) the definition of basis functions on reference elements (also called shape functions) and (c) the mapping of reference elements onto the elements of the mesh. === Separate representation === PGD assumes that the solution u of a (multidimensional) problem can be approximated as a separate representation of the form $\mathbf{u} \approx \mathbf{u}^{N}(x_1, x_2, \ldots, x_d) = \sum_{i=1}^{N} \mathbf{X_1}_i(x_1) \cdot \mathbf{X_2}_i(x_2) \cdots \mathbf{X_d}_i(x_d),$ where the number of addends N and the functional products X1(x1), X2(x2), ..., Xd(xd), each depending on a variable (or variables), are unknown beforehand. === Greedy algorithm === The solution is sought by applying a greedy algorithm, usually the fixed point algorithm, to the weak formulation of the problem. For each iteration i of the algorithm, a mode of the solution is computed. Each mode consists of a set of numerical values of the functional products X1(x1), ..., Xd(xd), which enrich the approximation of the solution. Due to the greedy nature of the algorithm, the term 'enrich' is used rather than 'improve', since some modes may actually worsen the approach. The number of computed modes required to obtain an approximation of the solution below a certain error threshold depends on the stopping criterion of the iterative algorithm. == Features == PGD is suitable for solving high-dimensional problems, since it overcomes the limitations of classical approaches. In particular, PGD avoids the curse of dimensionality, as solving decoupled problems is computationally much less expensive than solving multidimensional problems.
Therefore, PGD makes it possible to recast parametric problems in a multidimensional framework by setting the parameters of the problem as extra coordinates: $\mathbf{u} \approx \mathbf{u}^{N}(x_1, \ldots, x_d; k_1, \ldots, k_p) = \sum_{i=1}^{N} \mathbf{X_1}_i(x_1) \cdots \mathbf{X_d}_i(x_d) \cdot \mathbf{K_1}_i(k_1) \cdots \mathbf{K_p}_i(k_p),$ where a series of functional products K1(k1), K2(k2), ..., Kp(kp), each depending on a parameter (or parameters), has been incorporated into the equation. In this case, the obtained approximation of the solution is called a computational vademecum: a general meta-model containing all the particular solutions for every possible value of the involved parameters. == Sparse Subspace Learning == The Sparse Subspace Learning (SSL) method uses hierarchical collocation to approximate the numerical solution of parametric models. Compared with traditional projection-based reduced order modeling, the use of collocation enables a non-intrusive approach based on sparse adaptive sampling of the parametric space. This makes it possible to recover the low-dimensional structure of the parametric solution subspace while also learning the functional dependency on the parameters in explicit form. A sparse low-rank approximate tensor representation of the parametric solution can be built through an incremental strategy that only needs access to the output of a deterministic solver. Non-intrusiveness makes this approach straightforwardly applicable to challenging problems characterized by nonlinearity or non-affine weak forms.
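The separated representation and the greedy enrichment described in the PGD sections above can be sketched on a deliberately simplified problem. The assumption here is important: instead of solving a boundary value problem through its weak form, the sketch approximates an already known two-dimensional field `F[i, j] = f(x_i, y_j)` by a sum of products X_i(x)·Y_i(y). Each mode is found by an alternating fixed-point iteration on the current residual and is then added to the approximation. The function name `greedy_separated_approx` and its parameters are hypothetical, chosen for this illustration only.

```python
import numpy as np


def greedy_separated_approx(F, max_modes=10, fp_iters=50, tol=1e-10, seed=0):
    """Greedy rank-one enrichment of a sampled 2D field.

    Returns the list of modes (X_i, Y_i) and the final residual, so that
    F is approximately sum_i outer(X_i, Y_i).
    """
    rng = np.random.default_rng(seed)
    residual = np.array(F, dtype=float)
    norm_F = np.linalg.norm(residual)
    modes = []
    for _ in range(max_modes):
        # Stopping criterion: relative residual norm below the tolerance.
        if np.linalg.norm(residual) <= tol * norm_F:
            break
        Y = rng.standard_normal(residual.shape[1])
        for _ in range(fp_iters):
            # Fixed-point (alternating) updates: solve for X with Y frozen,
            # then for Y with X frozen, in the least-squares sense.
            X = residual @ Y / (Y @ Y)
            Y = residual.T @ X / (X @ X)
        modes.append((X, Y))
        residual = residual - np.outer(X, Y)  # enrichment step
    return modes, residual


# Example field: a sum of two separable terms sampled on a grid.
x = np.linspace(0.0, 1.0, 50)
y = np.linspace(0.0, 1.0, 40)
F = np.outer(np.sin(x), np.cos(y)) + 0.5 * np.outer(x**2, y)

modes, residual = greedy_separated_approx(F)
approximation = sum(np.outer(X, Y) for X, Y in modes)

if __name__ == "__main__":
    print("modes computed:", len(modes))
    print("relative error:", np.linalg.norm(F - approximation) / np.linalg.norm(F))
```

In an actual PGD solver the two least-squares updates would be replaced by one-dimensional problems derived from the weak formulation, but the enrich-then-iterate structure is the same.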

    Read more →
  • Clip Studio Paint

    Clip Studio Paint

    Clip Studio Paint (previously marketed as Manga Studio in North America), informally known in Japan as Kurisuta (クリスタ), is a family of software applications developed by Japanese graphics software company Celsys. It is used for the digital creation of comics, general illustration, and 2D animation. The software is available in versions for macOS, Windows, iOS, iPadOS, Android, and ChromeOS. The program is widely used by amateur and professional comics creators and by animation studios. The application is sold in editions with varying feature sets. The full-featured edition is a page-based, layered drawing program, with support for bitmap and vector art, text, imported 3D models, and frame-by-frame animation. It is designed for use with a stylus and a graphics tablet or tablet computer. It has drawing tools which emulate natural media such as pencils, ink pens, and brushes, as well as patterns and decorations. It is distinguished from similar programs by features designed for creating comics: tools for creating panel layouts, perspective rulers, sketching, inking, applying tones and textures, coloring, and creating word balloons and captions. == History == The application has its origins in a program for macOS and Windows, released in Japan in 2001 as "Comic Studio". It was sold as "Manga Studio" in the Western market by E Frontier America until 2007, then by Smith Micro Software. Early versions were designed for creating black and white art with only spot color (a typical format for Japanese manga), with version 4 adding support for full-color art. Celsys developed Clip Studio Paint as a replacement for this product, based on the company's Illust Studio application, and it was released on May 31, 2012. It was initially distributed in Western markets as "Manga Studio 5", but in 2016, the branding was unified worldwide as "Clip Studio Paint". At this time, version 1.5.4 introduced a new file format (extension .clip) and frame-by-frame animation.
In late 2017, Celsys took over direct support for the software worldwide, and ceased its relationship with Smith Micro. In July 2018, Celsys began a partnership with Graphixly for distribution in North America, South America, and Europe. Clip Studio Paint for the Apple iPad was introduced in November 2017, and for the iPhone in December 2019. Clip Studio Paint for Samsung Galaxy tablets and smartphones was released in August 2020 on the Galaxy Store, with versions for other Android devices and Chromebooks released in December. The Windows and macOS versions of the software have been sold and distributed either from the developer's web site or on DVD, and purchased either with a perpetual license or an ongoing subscription. The versions for iPhone, iPad, and Android-based devices are distributed through the corresponding app stores free of charge, but require a subscription – which includes cloud storage – for unrestricted use. Without a subscription, the tablet versions can be used only for a specified number of months, and the phone versions can be used only for 30 hours per month. From 2013 to 2023, regular updates for version 1 were distributed free of additional charge to both perpetual and subscription users. Since the release of version 2 in 2023, feature updates are included only in subscription plans and are available to perpetual licenses at an additional cost. Perpetual licenses can be upgraded permanently or with an annual "update pass". The "update pass" provides early access to features to be included in subsequent perpetual licenses for 12 months, after which the software reverts to the original license if not renewed. In March 2024, version 3 was released, and version 4 introduced additional features in March 2025. 
== Editions == Clip Studio Paint is available in three editions, with differing feature sets and prices: Debut (bundle-only grade), Pro (adding support for vector-based drawing, custom textures, and comics-focused features), and EX (adding support for multi-page documents, book exporting, and 2D animation). Companion programs include Clip Studio (for managing and sharing digital assets distributed through the Clip Studio web site, managing licenses, and getting updates and support) and Clip Studio Modeler (for setting up 3D materials to use in Clip Studio Paint).

    Read more →
  • Learning classifier system

    Learning classifier system

    Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component (typically a genetic algorithm from evolutionary computation) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning). Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions (e.g. behavior modeling, classification, data mining, regression, function approximation, or game strategy). This approach allows complex solution spaces to be broken up into smaller, simpler parts for reinforcement learning within artificial intelligence research. The founding concepts behind learning classifier systems came from attempts to model complex adaptive systems, using rule-based agents to form an artificial cognitive system (i.e. artificial intelligence). == Methodology == The architecture and components of a given learning classifier system can be quite variable. It is useful to think of an LCS as a machine consisting of several interacting components. Components may be added or removed, or existing components modified/exchanged to suit the demands of a given problem domain (like algorithmic building blocks) or to make the algorithm flexible enough to function in many different problem domains. As a result, the LCS paradigm can be flexibly applied to many problem domains that call for machine learning. The major divisions among LCS implementations are as follows: (1) Michigan-style architecture vs. Pittsburgh-style architecture, (2) reinforcement learning vs. supervised learning, (3) incremental learning vs. batch learning, (4) online learning vs. offline learning, (5) strength-based fitness vs. accuracy-based fitness, and (6) complete action mapping vs. best action mapping. These divisions are not necessarily mutually exclusive.
For example, XCS, the best known and best studied LCS algorithm, is Michigan-style, was designed for reinforcement learning but can also perform supervised learning, applies incremental learning that can be either online or offline, applies accuracy-based fitness, and seeks to generate a complete action mapping. === Elements of a generic LCS algorithm === Keeping in mind that LCS is a paradigm for genetic-based machine learning rather than a specific method, the following outlines key elements of a generic, modern (i.e. post-XCS) LCS algorithm. For simplicity let us focus on Michigan-style architecture with supervised learning. See the illustrations on the right laying out the sequential steps involved in this type of generic LCS. ==== Environment ==== The environment is the source of data upon which an LCS learns. It can be an offline, finite training dataset (characteristic of a data mining, classification, or regression problem), or an online sequential stream of live training instances. Each training instance is assumed to include some number of features (also referred to as attributes, or independent variables), and a single endpoint of interest (also referred to as the class, action, phenotype, prediction, or dependent variable). Part of LCS learning can involve feature selection, therefore not all of the features in the training data need to be informative. The set of feature values of an instance is commonly referred to as the state. For simplicity let's assume an example problem domain with Boolean/binary features and a Boolean/binary class. For Michigan-style systems, one instance from the environment is trained on each learning cycle (i.e. incremental learning). Pittsburgh-style systems perform batch learning, where rule sets are evaluated in each iteration over much or all of the training data. ==== Rule/classifier/population ==== A rule is a context dependent relationship between state values and some prediction. 
Rules typically take the form of an {IF:THEN} expression, (e.g. {IF 'condition' THEN 'action'}, or as a more specific example, {IF 'red' AND 'octagon' THEN 'stop-sign'}). A critical concept in LCS and rule-based machine learning alike, is that an individual rule is not in itself a model, since the rule is only applicable when its condition is satisfied. Think of a rule as a "local-model" of the solution space. Rules can be represented in many different ways to handle different data types (e.g. binary, discrete-valued, ordinal, continuous-valued). Given binary data LCS traditionally applies a ternary rule representation (i.e. rules can include either a 0, 1, or '#' for each feature in the data). The 'don't care' symbol (i.e. '#') serves as a wild card within a rule's condition allowing rules, and the system as a whole to generalize relationships between features and the target endpoint to be predicted. Consider the following rule (#1###0 ~ 1) (i.e. condition ~ action). This rule can be interpreted as: IF the second feature = 1 AND the sixth feature = 0 THEN the class prediction = 1. We would say that the second and sixth features were specified in this rule, while the others were generalized. This rule, and the corresponding prediction are only applicable to an instance when the condition of the rule is satisfied by the instance. This is more commonly referred to as matching. In Michigan-style LCS, each rule has its own fitness, as well as a number of other rule-parameters associated with it that can describe the number of copies of that rule that exist (i.e. the numerosity), the age of the rule, its accuracy, or the accuracy of its reward predictions, and other descriptive or experiential statistics. A rule along with its parameters is often referred to as a classifier. In Michigan-style systems, classifiers are contained within a population [P] that has a user defined maximum number of classifiers. Unlike most stochastic search algorithms (e.g. 
evolutionary algorithms), LCS populations start out empty (i.e. there is no need to randomly initialize a rule population). Classifiers will instead be initially introduced to the population with a covering mechanism. In any LCS, the trained model is a set of rules/classifiers, rather than any single rule/classifier. In Michigan-style LCS, the entire trained (and optionally, compacted) classifier population forms the prediction model. ==== Matching ==== One of the most critical and often time-consuming elements of an LCS is the matching process. The first step in an LCS learning cycle takes a single training instance from the environment and passes it to [P] where matching takes place. In step two, every rule in [P] is now compared to the training instance to see which rules match (i.e. are contextually relevant to the current instance). In step three, any matching rules are moved to a match set [M]. A rule matches a training instance if all feature values specified in the rule condition are equivalent to the corresponding feature value in the training instance. For example, assuming the training instance is (001001 ~ 0), these rules would match: (###0## ~ 0), (00###1 ~ 0), (#01001 ~ 1), but these rules would not (1##### ~ 0), (000##1 ~ 0), (#0#1#0 ~ 1). Notice that in matching, the endpoint/action specified by the rule is not taken into consideration. As a result, the match set may contain classifiers that propose conflicting actions. In the fourth step, since we are performing supervised learning, [M] is divided into a correct set [C] and an incorrect set [I]. A matching rule goes into the correct set if it proposes the correct action (based on the known action of the training instance), otherwise it goes into [I]. In reinforcement learning LCS, an action set [A] would be formed here instead, since the correct action is not known. 
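The matching step and the split into a correct and an incorrect set can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of any particular LCS implementation: rules are stored as hypothetical `(condition, action)` tuples, conditions are ternary strings over `0`, `1` and `#`, and the function names `matches` and `form_sets` are invented for this example. The rules and the training instance are the ones used in the text above.

```python
def matches(condition: str, state: str) -> bool:
    """A rule matches when every specified (non-'#') position equals the state."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


def form_sets(population, state, true_action):
    """Build the match set [M], then split it into [C] and [I] (supervised case)."""
    match_set = [rule for rule in population if matches(rule[0], state)]
    correct_set = [rule for rule in match_set if rule[1] == true_action]
    incorrect_set = [rule for rule in match_set if rule[1] != true_action]
    return match_set, correct_set, incorrect_set


# Rules from the text: the first three match (001001 ~ 0), the last three do not.
population = [
    ("###0##", 0), ("00###1", 0), ("#01001", 1),
    ("1#####", 0), ("000##1", 0), ("#0#1#0", 1),
]
M, C, I = form_sets(population, state="001001", true_action=0)
print("M:", M)
print("C:", C)
print("I:", I)
```

The action is deliberately ignored in `matches`, which is why `(#01001 ~ 1)` ends up in [M] and then in [I].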
==== Covering ==== At this point in the learning cycle, if no classifiers made it into either [M] or [C] (as would be the case when the population starts off empty), the covering mechanism is applied (fifth step). Covering is a form of online smart population initialization. Covering randomly generates a rule that matches the current training instance (and in the case of supervised learning, that rule is also generated with the correct action). Assuming the training instance is (001001 ~ 0), covering might generate any of the following rules: (#0#0## ~ 0), (001001 ~ 0), (#010## ~ 0). Covering ensures not only that in each learning cycle there is at least one correct, matching rule in [C], but also that any rule initialized into the population will match at least one training instance. This prevents LCS from exploring the search space of rules that do not match any training instances. ==== Parameter updates/credit assignment/learning ==== In the sixth step, the rule parameters of any rule in [M] are updated to reflect the new experience gained from the current training instance. Depending on the LCS algorithm, a number of updates can take place at this step. For supervised learning, we can simply update the accuracy/error of a

    Read more →
  • Random indexing

    Random indexing

    Random indexing is a dimensionality reduction method and computational framework for distributional semantics, based on the insight that very-high-dimensional vector space model implementations are impractical, that models need not grow in dimensionality when new items (e.g. new terminology) are encountered, and that a high-dimensional model can be projected into a space of lower dimensionality without compromising L2 distance metrics if the resulting dimensions are chosen appropriately. This is the original point of the random projection approach to dimension reduction, first formulated as the Johnson–Lindenstrauss lemma, and locality-sensitive hashing has some of the same starting points. Random indexing, as used in representation of language, originates from the work of Pentti Kanerva on sparse distributed memory, and can be described as an incremental formulation of a random projection. It can also be verified that random indexing is a random projection technique for the construction of Euclidean spaces, i.e. L2-normed vector spaces. In Euclidean spaces, random projections are elucidated using the Johnson–Lindenstrauss lemma. The TopSig technique extends the random indexing model to produce bit vectors for comparison with the Hamming distance similarity function. It is used for improving the performance of information retrieval and document clustering. In a similar line of research, Random Manhattan Integer Indexing (RMII) is proposed for improving the performance of methods that employ the Manhattan distance between text units. Many random indexing methods primarily generate similarity from co-occurrence of items in a corpus. Reflexive Random Indexing (RRI) generates similarity from co-occurrence and from shared occurrence with other items.
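A minimal sketch of the incremental idea may help. It assumes the common formulation in which each item receives a fixed, sparse random "index vector" with a few +1/-1 entries, and each item's "context vector" is the running sum of the index vectors of its neighbours. The function names, the window size and the dimensions below are illustrative choices, not values taken from the text.

```python
import random


def make_index_vector(rng, dim, nonzeros):
    """Sparse ternary vector: a few randomly placed +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_indexing(sentences, dim=300, nonzeros=6, window=2, seed=0):
    """Accumulate context vectors in one pass; dim stays fixed as new tokens appear."""
    rng = random.Random(seed)
    index_vectors, context_vectors = {}, {}
    for tokens in sentences:
        for tok in tokens:
            if tok not in index_vectors:  # a new item never enlarges the space
                index_vectors[tok] = make_index_vector(rng, dim, nonzeros)
                context_vectors[tok] = [0] * dim
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                ctx = context_vectors[tok]
                for k, val in enumerate(index_vectors[tokens[j]]):
                    if val:
                        ctx[k] += val
    return index_vectors, context_vectors


index_vectors, context_vectors = random_indexing([["a", "b"]])
print(context_vectors["a"] == index_vectors["b"])
```

Similarity between two items would then be computed between their context vectors, for example with the cosine measure.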

    Read more →
  • FastICA

    FastICA

    FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-Gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be derived as an approximative Newton iteration. == Algorithm == === Prewhitening the data === Let $\mathbf{X} := (x_{ij}) \in \mathbb{R}^{N \times M}$ denote the input data matrix, $M$ the number of columns, corresponding to the number of samples of mixed signals, and $N$ the number of rows, corresponding to the number of independent source signals. The input data matrix $\mathbf{X}$ must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it. Centering the data entails demeaning each component of the input data $\mathbf{X}$, that is, $x_{ij} \leftarrow x_{ij} - \frac{1}{M} \sum_{j'=1}^{M} x_{ij'}$ for each $i = 1, \ldots, N$ and $j = 1, \ldots, M$. After centering, each row of $\mathbf{X}$ has an expected value of $0$. Whitening the data requires a linear transformation $\mathbf{L} : \mathbb{R}^{N \times M} \to \mathbb{R}^{N \times M}$ of the centered data so that the components of $\mathbf{L}(\mathbf{X})$ are uncorrelated and have variance one. More precisely, if $\mathbf{X}$ is a centered data matrix, the covariance of $\mathbf{L}_{\mathbf{x}} := \mathbf{L}(\mathbf{X})$ is the $(N \times N)$-dimensional identity matrix, that is, $E\{\mathbf{L}_{\mathbf{x}} \mathbf{L}_{\mathbf{x}}^{T}\} = \mathbf{I}_{N}$. A common method for whitening is to perform an eigenvalue decomposition on the covariance matrix of the centered data $\mathbf{X}$, $E\{\mathbf{X} \mathbf{X}^{T}\} = \mathbf{E} \mathbf{D} \mathbf{E}^{T}$, where $\mathbf{E}$ is the matrix of eigenvectors and $\mathbf{D}$ is the diagonal matrix of eigenvalues. The whitened data matrix is then defined by $\mathbf{X} \leftarrow \mathbf{D}^{-1/2} \mathbf{E}^{T} \mathbf{X}$. === Single component extraction === The iterative algorithm finds the direction for the weight vector $\mathbf{w} \in \mathbb{R}^{N}$ that maximizes a measure of non-Gaussianity of the projection $\mathbf{w}^{T} \mathbf{X}$, with $\mathbf{X} \in \mathbb{R}^{N \times M}$ denoting a prewhitened data matrix as described above. Note that $\mathbf{w}$ is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function $f(u)$, its first derivative $g(u)$, and its second derivative $g'(u)$. Hyvärinen states that the functions $f(u) = \log \cosh(u)$, $g(u) = \tanh(u)$, $g'(u) = 1 - \tanh^{2}(u)$ are useful for general purposes, while $f(u) = -e^{-u^{2}/2}$, $g(u) = u e^{-u^{2}/2}$, $g'(u) = (1 - u^{2}) e^{-u^{2}/2}$ may be highly robust. The steps for extracting the weight vector $\mathbf{w}$ for a single component in FastICA are the following:

    1. Randomize the initial weight vector $\mathbf{w}$.
    2. Let $\mathbf{w}^{+} \leftarrow E\{\mathbf{X} g(\mathbf{w}^{T} \mathbf{X})^{T}\} - E\{g'(\mathbf{w}^{T} \mathbf{X})\} \mathbf{w}$, where $E\{\ldots\}$ means averaging over all column vectors of matrix $\mathbf{X}$.
    3. Let $\mathbf{w} \leftarrow \mathbf{w}^{+} / \|\mathbf{w}^{+}\|$.
    4. If not converged, go back to 2.

    === Multiple component extraction === The single-unit iterative algorithm estimates only one weight vector, which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors. Note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components, the simplest being the following. Here, $\mathbf{1}_{M}$ is a column vector of 1's of dimension $M$.

    Algorithm FastICA
    Input: $C$, the number of desired components
    Input: $\mathbf{X} \in \mathbb{R}^{N \times M}$, the prewhitened matrix, where each column represents an $N$-dimensional sample and $C \leq N$
    Output: $\mathbf{W} \in \mathbb{R}^{N \times C}$, the un-mixing matrix, where each column projects $\mathbf{X}$ onto an independent component
    Output: $\mathbf{S} \in \mathbb{R}^{C \times M}$, the independent components matrix, with $M$ columns each representing a sample with $C$ dimensions

    for p in 1 to C:
        $\mathbf{w}_{p} \leftarrow$ random vector of length N
        while $\mathbf{w}_{p}$ changes:
            $\mathbf{w}_{p} \leftarrow \frac{1}{M} \mathbf{X} g(\mathbf{w}_{p}^{T} \mathbf{X})^{T} - \frac{1}{M} g'(\mathbf{w}_{p}^{T} \mathbf{X}) \mathbf{1}_{M} \mathbf{w}_{p}$
            $\mathbf{w}_{p} \leftarrow \mathbf{w}_{p} - \sum_{j=1}^{p-1} (\mathbf{w}_{p}^{T} \mathbf{w}_{j}) \mathbf{w}_{j}$
            $\mathbf{w}_{p} \leftarrow \mathbf{w}_{p} / \|\mathbf{w}_{p}\|$
    output $\mathbf{W} \leftarrow [\mathbf{w}_{1}, \dots, \mathbf{w}_{C}]$
    output $\mathbf{S} \leftarrow \mathbf{W}^{T} \mathbf{X}$
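The prewhitening and deflation steps above can be sketched in NumPy as follows. This is an illustrative sketch rather than a reference implementation: it assumes $g(u) = \tanh(u)$ as the nonlinearity, and the convergence test (direction of $\mathbf{w}_p$ unchanged up to sign), the tolerance, the iteration cap and the function names `whiten` and `fastica` are choices made for this example.

```python
import numpy as np


def whiten(X):
    """Center each row, then whiten via eigendecomposition of the covariance."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])  # covariance = E diag(d) E^T
    return np.diag(d ** -0.5) @ E.T @ X


def fastica(X, C, max_iter=200, tol=1e-8, seed=0):
    """Deflation FastICA on prewhitened X (N x M); returns W (N x C) and S (C x M)."""
    N, M = X.shape
    rng = np.random.default_rng(seed)
    W = np.zeros((N, C))
    for p in range(C):
        w = rng.standard_normal(N)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(w @ X)                 # g(w^T X), shape (M,)
            g_prime = 1.0 - g ** 2             # g'(w^T X)
            w_new = X @ g / M - g_prime.mean() * w
            # Remove projections onto the components already found.
            w_new -= W[:, :p] @ (W[:, :p].T @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[:, p] = w
    return W, W.T @ X
```

A typical call would be `W, S = fastica(whiten(X_mixed), C=2)`, where `X_mixed` is a hypothetical array holding one mixed signal per row. The recovered components are only determined up to sign and ordering.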

    Read more →
  • Ray tracing (graphics)

    Ray tracing (graphics)

    In 3D computer graphics, ray tracing is a technique for modeling light transport for use in a wide variety of rendering algorithms for generating digital images. On a spectrum of computational cost and visual fidelity, ray tracing-based rendering techniques, such as ray casting, recursive ray tracing, distribution ray tracing, photon mapping and path tracing, are generally slower and higher fidelity than scanline rendering methods. Thus, ray tracing was first deployed in applications where taking a relatively long time to render could be tolerated, such as still CGI images, and film and television visual effects (VFX), but was less suited to real-time applications such as video games, where speed is critical in rendering each frame. Since 2018, however, hardware acceleration for real-time ray tracing has become standard on new commercial graphics cards, and graphics APIs have followed suit, allowing developers to use hybrid ray tracing and rasterization-based rendering in games and other real-time applications with a lesser hit to frame render times. Ray tracing is capable of simulating a variety of optical effects, such as reflection, refraction, soft shadows, scattering, depth of field, motion blur, caustics, ambient occlusion and dispersion phenomena (such as chromatic aberration). It can also be used to trace the path of sound waves in a similar fashion to light waves, making it a viable option for more immersive sound design in video games by rendering realistic reverberation and echoes. In fact, any physical wave or particle phenomenon with approximately linear motion can be simulated with ray tracing. Ray tracing–based rendering techniques that sample light over a domain typically generate multiple rays and often rely on denoising to reduce the resulting noise. == History == The idea of ray tracing comes from as early as the 16th century, when it was described by Albrecht Dürer, who is credited for its invention. 
Dürer described multiple techniques for projecting 3-D scenes onto an image plane. Some of these project chosen geometry onto the image plane, as is done with rasterization today. Others determine what geometry is visible along a given ray, as is done with ray tracing. Using a computer for ray tracing to generate shaded pictures was first accomplished by Arthur Appel in 1968. Appel used ray tracing for primary visibility (determining the closest surface to the camera at each image point) by tracing a ray through each point to be shaded into the scene to identify the visible surface. The closest surface intersected by the ray was the visible one. This non-recursive ray tracing-based rendering algorithm is today called "ray casting". His algorithm then traced secondary rays to the light source from each point being shaded to determine whether the point was in shadow or not. Later, in 1971, Goldstein and Nagel of MAGI (Mathematical Applications Group, Inc.) published "3-D Visual Simulation", wherein ray tracing was used to make shaded pictures of solids. At the ray-surface intersection point found, they computed the surface normal and, knowing the position of the light source, computed the brightness of the pixel on the screen. Their publication describes a short (30-second) film "made using the University of Maryland's display hardware outfitted with a 16mm camera. The film showed the helicopter and a simple ground-level gun emplacement. The helicopter was programmed to undergo a series of maneuvers including turns, take-offs, and landings, etc., until it eventually is shot down and crashed." A CDC 6600 computer was used. MAGI produced an animation video called MAGI/SynthaVision Sampler in 1974. Another early instance of ray casting came in 1976, when Scott Roth created a flip book animation in Bob Sproull's computer graphics course at Caltech. The scanned pages are shown as a video in the accompanying image. 
Roth's computer program noted an edge point at a pixel location if the ray intersected a bounded plane different from that of its neighbors. Of course, a ray could intersect multiple planes in space, but only the surface point closest to the camera was noted as visible. The platform was a DEC PDP-10, a Tektronix storage-tube display, and a printer which would create an image of the display on rolling thermal paper. Roth extended the framework, introduced the term ray casting in the context of computer graphics and solid modeling, and in 1982 published his work while at GM Research Labs. Turner Whitted was the first to show recursive ray tracing for mirror reflection and for refraction through translucent objects, with an angle determined by the solid's index of refraction, and to use ray tracing for anti-aliasing. Whitted also showed ray traced shadows. He produced a recursive ray traced film called The Compleat Angler in 1979 while an engineer at Bell Labs. Whitted's deeply recursive ray tracing algorithm reframed rendering from being primarily a matter of surface visibility determination to being a matter of light transport. His paper inspired a series of subsequent work by others that included distribution ray tracing and finally unbiased path tracing, which provides the rendering equation framework that has allowed computer-generated imagery to be faithful to reality. For decades, global illumination in major films using computer-generated imagery was approximated with additional lights. Ray tracing-based rendering eventually changed that by enabling physically based light transport. Early feature films rendered entirely using path tracing include Monster House (2006), Cloudy with a Chance of Meatballs (2009), and Monsters University (2013). == Algorithm overview == Optical ray tracing describes a method for producing visual images constructed in 3D computer graphics environments, with more photorealism than either ray casting or scanline rendering techniques. 
It works by tracing a path from an imaginary eye through each pixel in a virtual screen, and calculating the color of the object visible through it. Scenes in ray tracing are described mathematically by a programmer or by a visual artist (normally using intermediary tools). Scenes may also incorporate data from images and models captured by means such as digital photography. Typically, each ray must be tested for intersection with some subset of all the objects in the scene. Once the nearest object has been identified, the algorithm will estimate the incoming light at the point of intersection, examine the material properties of the object, and combine this information to calculate the final color of the pixel. Certain illumination algorithms and reflective or translucent materials may require more rays to be re-cast into the scene. It may at first seem counterintuitive or "backward" to send rays away from the camera, rather than into it (as actual light does in reality), but doing so is many orders of magnitude more efficient. Since the overwhelming majority of light rays from a given light source do not make it directly into the viewer's eye, a "forward" simulation could potentially waste a tremendous amount of computation on light paths that are never recorded. Therefore, the shortcut taken in ray tracing is to presuppose that a given ray intersects the view frame. After either a maximum number of reflections or a ray traveling a certain distance without intersection, the ray ceases to travel and the pixel's value is updated. 
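The core of the loop described above, testing one ray against the scene and keeping the nearest intersection, can be sketched as follows. The sketch assumes a scene made only of spheres and a unit-length ray direction. The names `ray_sphere` and `nearest_hit` and the `(name, center, radius)` tuples are hypothetical choices for this example, and shading, reflection and refraction are left out.

```python
import math


def ray_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest sphere hit, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c                      # discriminant of t^2 + 2bt + c = 0
    if disc < 0:
        return None                       # the ray misses the sphere
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):      # try the nearer root first
        if t > 1e-9:                      # ignore hits behind the ray origin
            return t
    return None


def nearest_hit(origin, direction, spheres):
    """Test the ray against every sphere and keep the closest intersection."""
    best = None
    for name, center, radius in spheres:
        t = ray_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return best


scene = [("far", (0.0, 0.0, -10.0), 2.0), ("near", (0.0, 0.0, -5.0), 1.0)]
eye = (0.0, 0.0, 0.0)
print(nearest_hit(eye, (0.0, 0.0, -1.0), scene))  # ray straight down the -z axis
print(nearest_hit(eye, (0.0, 1.0, 0.0), scene))   # ray pointing away from both
```

A full renderer would call `nearest_hit` once per pixel, then estimate the incoming light at the hit point and possibly spawn further rays.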
=== Calculate rays for rectangular viewport === On input we have (in calculation we use vector normalization and cross product):

      • $E \in \mathbb{R}^{3}$: eye position
      • $T \in \mathbb{R}^{3}$: target position
      • $\theta \in [0, \pi]$: field of view; for humans, we can assume $\approx \pi/2 \text{ rad} = 90^{\circ}$
      • $m, k \in \mathbb{N}$: numbers of square pixels in the viewport's vertical and horizontal directions
      • $i, j \in \mathbb{N}, 1 \leq i \leq k \land 1 \leq j \leq m$: indices of the current pixel
      • $\vec{v} \in \mathbb{R}^{3}$: vertical vector indicating where up and down are, usually $\vec{v} = [0, 1, 0]$; this is the roll component, which determines the viewport rotation around point C (where the axis of rotation is the ET section)

The idea is to find the position of each viewport pixel center $P_{ij}$, which allows us to find the line going from the eye $E$ through that pixel and finally get the ray described by point $E$ and vector $\vec{R}_{ij} = P_{ij} - E$ (or its normalization $\vec{r}_{ij}$). First we need to find the coordinates of the bottom-left viewport pixel $P_{1m}$ and find the next pixel by making a shift along directions parallel to the viewport (vectors $\vec{b}_{n}$

    Read more →
  • Quadratic unconstrained binary optimization

    Quadratic unconstrained binary optimization

    Quadratic unconstrained binary optimization (QUBO), also known as unconstrained binary quadratic programming (UBQP), is a combinatorial optimization problem with a wide range of applications from finance and economics to machine learning. QUBO is an NP-hard problem, and for many classical problems from theoretical computer science, like maximum cut, graph coloring and the partition problem, embeddings into QUBO have been formulated. Embeddings for machine learning models include support-vector machines, clustering and probabilistic graphical models. Moreover, due to its close connection to Ising models, QUBO constitutes a central problem class for adiabatic quantum computation, where it is solved through a physical process called quantum annealing. == Definition == Let $\mathbb{B} = \lbrace 0, 1 \rbrace$ be the set of binary digits (or bits); then $\mathbb{B}^{n}$ is the set of binary vectors of fixed length $n \in \mathbb{N}$. Given a symmetric or upper triangular matrix $\boldsymbol{Q} \in \mathbb{R}^{n \times n}$, whose entries $Q_{ij}$ define a weight for each pair of indices $i, j \in \lbrace 1, \dots, n \rbrace$, we can define the function $f_{\boldsymbol{Q}} : \mathbb{B}^{n} \rightarrow \mathbb{R}$ that assigns a value to each binary vector $\boldsymbol{x}$ through $f_{\boldsymbol{Q}}(\boldsymbol{x}) = \boldsymbol{x}^{\intercal} \boldsymbol{Q} \boldsymbol{x} = \sum_{i=1}^{n} \sum_{j=1}^{n} Q_{ij} x_{i} x_{j}.$ Alternatively, the linear and quadratic parts can be separated as $f_{\boldsymbol{Q}', \boldsymbol{q}}(\boldsymbol{x}) = \boldsymbol{x}^{\intercal} \boldsymbol{Q}' \boldsymbol{x} + \boldsymbol{q}^{\intercal} \boldsymbol{x},$ where $\boldsymbol{Q}' \in \mathbb{R}^{n \times n}$ and $\boldsymbol{q} \in \mathbb{R}^{n}$. This is equivalent to the previous definition through $\boldsymbol{Q} = \boldsymbol{Q}' + \operatorname{diag}[\boldsymbol{q}]$ using the diag operator, exploiting that $x = x \cdot x$ for all binary values $x$. Intuitively, the weight $Q_{ij}$ is added if both $x_{i} = 1$ and $x_{j} = 1$. The QUBO problem consists of finding a binary vector $\boldsymbol{x}^{*}$ that minimizes $f_{\boldsymbol{Q}}$, i.e., $\forall \boldsymbol{x} \in \mathbb{B}^{n}:~ f_{\boldsymbol{Q}}(\boldsymbol{x}^{*}) \leq f_{\boldsymbol{Q}}(\boldsymbol{x})$. In general, $\boldsymbol{x}^{*}$ is not unique, meaning there may be a set of minimizing vectors with equal value w.r.t. $f_{\boldsymbol{Q}}$. The complexity of QUBO arises from the number of candidate binary vectors to be evaluated, as $\left| \mathbb{B}^{n} \right| = 2^{n}$ grows exponentially in $n$. Sometimes, QUBO is defined as the problem of maximizing $f_{\boldsymbol{Q}}$, which is equivalent to minimizing $f_{-\boldsymbol{Q}} = -f_{\boldsymbol{Q}}$.
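To make the definition concrete, the following sketch evaluates the QUBO objective directly and finds a minimizer by enumerating all $2^{n}$ binary vectors. It is only meant for tiny instances; the upper triangular matrix `Q` and the function names are made up for this illustration.

```python
from itertools import product


def qubo_value(Q, x):
    """f_Q(x) = sum over i, j of Q[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Enumerate every binary vector and return a minimizer with its value."""
    best_x, best_val = None, float("inf")
    for x in product((0, 1), repeat=len(Q)):
        val = qubo_value(Q, x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Illustrative instance: setting a bit is rewarded (diagonal -1),
# setting two adjacent bits together is penalized (off-diagonal +2).
Q = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]
x_star, value = brute_force_qubo(Q)
print(x_star, value)
```

The loop visits $2^{n}$ candidates, which is exactly the exponential growth the definition refers to, so this approach stops being practical beyond a few dozen variables.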
== Properties == QUBO is scale invariant for positive factors $\alpha > 0$, which leave the optimum $\boldsymbol{x}^{*}$ unchanged: $f_{\alpha \boldsymbol{Q}}(\boldsymbol{x}) = \boldsymbol{x}^{\intercal} (\alpha \boldsymbol{Q}) \boldsymbol{x} = \alpha (\boldsymbol{x}^{\intercal} \boldsymbol{Q} \boldsymbol{x}) = \alpha f_{\boldsymbol{Q}}(\boldsymbol{x})$. In its general form, QUBO is NP-hard, and no polynomial-time algorithm for it is known. However, there are polynomially-solvable special cases, where $\boldsymbol{Q}$ has certain properties, for example:

      • If all coefficients are positive, the optimum is trivially $\boldsymbol{x}^{*} = (0, \dots, 0)^{\intercal}$. Similarly, if all coefficients are negative, the optimum is $\boldsymbol{x}^{*} = (1, \dots, 1)^{\intercal}$.
      • If $\boldsymbol{Q}$ is diagonal, the bits can be optimized independently, and the problem is solvable in $\mathcal{O}(n)$. The optimal variable assignments are simply $x_{i}^{*} = 1$ if $Q_{ii} < 0$, and $x_{i}^{*} = 0$ otherwise.
      • If all off-diagonal elements of $\boldsymbol{Q}$ are non-positive, the corresponding QUBO problem is solvable in polynomial time.

QUBO can be solved using integer linear programming solvers like CPLEX or Gurobi Optimizer. This is possible since QUBO can be reformulated as a linearly constrained binary optimization problem. To achieve this, substitute the product $x_{i} x_{j}$ by an additional binary variable $z_{ij} \in \mathbb{B}$ and add the constraints $x_{i} \geq z_{ij}$, $x_{j} \geq z_{ij}$ and $x_{i} + x_{j} - 1 \leq z_{ij}$. Note that $z_{ij}$ can also be relaxed to continuous variables within the bounds zero and one. == Applications == QUBO is a structurally simple, yet computationally hard optimization problem. It can be used to encode a wide range of optimization problems from various scientific areas. === Maximum Cut === Given a graph $G = (V, E)$ with vertex set $V = \lbrace 1, \dots, n \rbrace$ and edges $E \subseteq V \times V$, the maximum cut (max-cut) problem consists of finding two subsets $S, T \subseteq V$ with $T = V \setminus S$, such that the number of edges between $S$ and $T$ is maximized. The more general weighted max-cut problem assumes edge weights $w_{ij} \geq 0 ~\forall i, j \in V$, with $(i, j) \notin E \Rightarrow w_{ij} = 0$, and asks for a partition $S, T \subseteq V$ that maximizes the sum of edge weights between $S$ and $T$, i.e., $\max_{S \subseteq V} \sum_{i \in S, j \notin S} w_{ij}.$ By setting $w_{ij} = 1$ for all $(i, j) \in E$ this becomes equivalent to the original max-cut problem above, which is why we focus on this more general form in the following. For every vertex $i \in V$ we introduce a binary variable $x_{i}$ with the interpretation $x_{i} = 0$ if $i \in S$ and $x_{i} = 1$ if $i \in T$. As $T = V \setminus S$, every $i$ is in exactly one set, meaning there is a 1:1 correspondence between binary vectors $\boldsymbol{x} \in \mathbb{B}^{n}$ and partitions of $V$ into two subsets. We observe that, for any $i, j \in V$, the expression $x_{i}(1 - x_{j}) + (1 - x_{i}) x_{j}$ evaluates to 1 if and only if $i$ and $j$ are in different subsets, equivalent to logical XOR. Let $\boldsymbol{W} \in \mathbb{R}_{+}^{n \times n}$ with $W_{ij} = w_{ij} ~\forall i, j \in V$. By extending the above expression to matrix-vector form we find that $\boldsymbol{x}^{\intercal} \boldsymbol{W} (\boldsymbol{1} - \boldsymbol{x}) + (\boldsymbol{1} - \boldsymbol{x})^{\intercal} \boldsymbol{W} \boldsymbol{x} = -2 \boldsymbol{x}^{\intercal} \boldsymbol{W} \boldsymbol{x} + (\boldsymbol{W} \boldsymbol{1} + \boldsymbol{W}^{\intercal} \boldsymbol{1})^{\intercal} \boldsymbol{x}$ is the sum of weights of all edges between $S$ and $T$, where $\boldsymbol{1} = (1, 1, \dots, 1)^{\intercal} \in \mathbb{R}^{n}$. As this is a quadratic function over $\boldsymbol{x}$, it is a QUBO problem whose parameter matrix we can read from the above expression as $\boldsymbol{Q} = 2 \boldsymbol{W} - \operatorname{diag}[\boldsymbol{W} \boldsymbol{1} + \boldsymbol{W}^{\intercal} \boldsymbol{1}],$

  • Linear genetic programming

    Linear genetic programming

    "Linear genetic programming" is unrelated to "linear programming". Linear genetic programming (LGP) is a particular method of genetic programming wherein computer programs in a population are represented as a sequence of register-based instructions from an imperative programming language or machine language. The adjective "linear" stems from the fact that each LGP program is a sequence of instructions that is normally executed sequentially. As in other programs, the data flow in LGP can be modeled as a graph, which visualizes the potential multiple usage of register contents and the existence of structurally noneffective code (introns). These are the two main differences between this genetic representation and the more common tree-based genetic programming (TGP) variant. Like other genetic programming methods, LGP requires input data on which to run the program population. The output of each program (its behaviour) is then judged against some target behaviour using a fitness function. However, LGP is generally more efficient than tree genetic programming because of the two differences mentioned above: intermediate results (stored in registers) can be reused, and a simple intron removal algorithm can be executed to remove all non-effective code before programs are run on the intended data. These two differences often result in compact solutions and substantial computational savings compared to the highly constrained data flow in trees and the common method of executing all tree nodes in TGP. Furthermore, LGP naturally supports multiple outputs by defining multiple output registers and cooperates easily with control flow operations. Linear genetic programming has been applied in many domains, including system modeling and system control with considerable success.
Linear genetic programming should not be confused with linear tree programs in tree genetic programming, which are programs composed of a variable number of unary functions and a single terminal. Note that linear tree GP differs from bit string genetic algorithms since a population may contain programs of different lengths and there may be more than two types of functions or more than two types of terminals. == Examples of LGP programs == Because LGP programs are represented by a linear sequence of instructions, they are simpler to read and to operate on than their tree-based counterparts. For example, a simple program written to solve a Boolean function problem with 3 inputs (in R1, R2, R3) and one output (in R0), could read like this: R1, R2, R3 have to be declared as input (read-only) registers, while R0 and R4 are declared as calculation (read-write) registers. This program is very simple, having just 5 instructions. But mutation and crossover operators could work to increase the length of the program, as well as change the content of each of its instructions. Note that one instruction is non-effective, or an intron (marked), since it does not impact the output register R0. Recognition of those instructions is the basis for the intron removal algorithm, which is used to analyze code prior to execution. Technically, this is done by copying an individual and then running the intron removal once on the copy. The copy with removed introns is then executed as many times as dictated by the number of training cases. Notably, the original individual is left intact, so that it continues to participate in the evolutionary process. Only the executed copy is compressed by removing these "structural" introns. Another simple program, this one written in the LGP language Slash/A, looks like a series of instructions separated by a slash: By representing such code in bytecode format, i.e. as an array of bytes each representing a different instruction, one can perform mutation operations simply by changing an element of such an array.
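The intron removal idea can be sketched as a single backward pass over the instruction list, assuming straight-line code without branches. The five-instruction program below is a hypothetical stand-in (it is not the listing the text refers to), and the tuple encoding, the `OPS` table and the function names are assumptions made for illustration. It uses the register roles described above: R1 to R3 as read-only inputs, R0 and R4 as calculation registers, and R0 as the output.

```python
from itertools import product

# Hypothetical Boolean instruction set (an assumption, not a fixed LGP standard).
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# Hypothetical program: (destination, operation, (source1, source2)).
program = [
    ("R4", "AND", ("R1", "R2")),
    ("R0", "OR", ("R3", "R4")),
    ("R4", "AND", ("R2", "R3")),  # structural intron: R4 is never read afterwards
    ("R0", "XOR", ("R0", "R1")),
    ("R0", "OR", ("R0", "R2")),
]


def remove_structural_introns(program, output_registers):
    """Return a copy of the program without instructions that cannot affect the outputs."""
    needed = set(output_registers)  # registers whose current value still matters
    kept = []
    for dest, op, sources in reversed(program):
        if dest in needed:
            # This write is used later: its sources become needed instead.
            needed.discard(dest)
            needed.update(sources)
            kept.append((dest, op, sources))
    kept.reverse()
    return kept


def run(program, r1, r2, r3):
    """Execute a program on one training case and return the output register R0."""
    regs = {"R0": 0, "R1": r1, "R2": r2, "R3": r3, "R4": 0}
    for dest, op, (a, b) in program:
        regs[dest] = OPS[op](regs[a], regs[b])
    return regs["R0"]


# The original individual stays intact; only the stripped copy is executed.
effective = remove_structural_introns(program, ["R0"])
outputs = [run(effective, *bits) for bits in product((0, 1), repeat=3)]
```

Here the stripped copy keeps four of the five instructions and returns the same value in R0 as the original program on all eight input combinations.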
