AI Headshot Generator For Linkedin

AI Headshot Generator For Linkedin — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • The Triple Revolution

    The Triple Revolution

    "The Triple Revolution" was an open memorandum sent to U.S. President Lyndon B. Johnson and other government figures on March 22, 1964. It concerned three megatrends of the time: increasing use of automation, the nuclear arms race, and advancements in human rights. Drafted under the auspices of the Center for the Study of Democratic Institutions, it was signed by an array of noted social activists, professors, and technologists who identified themselves as the Ad Hoc Committee on the Triple Revolution. The chief initiator of the proposal was W. H. "Ping" Ferry, at that time a vice-president of CSDI, basing it in large part on the ideas of the futurist Robert Theobald. == Overview == The statement identified three revolutions underway in the world: the cybernation revolution of increasing automation; the weaponry revolution of mutually assured destruction; and the human rights revolution. It discussed primarily the cybernation revolution. The committee claimed that machines would usher in "a system of almost unlimited productive capacity" while continually reducing the number of manual laborers needed, and increasing the skill needed to work, thereby producing increasing levels of unemployment. It proposed that the government should ease this transformation through large-scale public works, low-cost housing, public transit, electrical power development, income redistribution, union representation for the unemployed, and government restraint on technology deployment. == Legacy == Martin Luther King Jr.'s final Sunday sermon, delivered six days before his April 1968 assassination, explicitly references the thesis of "The Triple Revolution": There can be no gainsaying of the fact that a great revolution is taking place in the world today. In a sense it is a triple revolution: that is, a technological revolution, with the impact of automation and cybernation; then there is a revolution in weaponry, with the emergence of atomic and nuclear weapons of warfare; then there is a human rights revolution, with the freedom explosion that is taking place all over the world. Yes, we do live in a period where changes are taking place. And there is still the voice crying through the vista of time saying, "Behold, I make all things new; former things are passed away." In Harlan Ellison's 1967 anthology Dangerous Visions, Philip José Farmer's story "Riders of the Purple Wage" uses the Triple Revolution document as the premise of a future society, in which the "purple wage" of the title is a guaranteed income dole on which most of the population lives. At the 1968 World Science Fiction Convention in San Francisco, Farmer delivered a lengthy Guest of Honor speech in which he called for the founding of a grassroots activist organization called REAP which would work for implementation of the Ad Hoc Committee's recommendations. Looking back on the proposal in his 2008 book, Daniel Bell wrote: "the cybernetic revolution quickly proved to be illusory. There were no spectacular jumps in productivity. ... Cybernation had proved to be one more instance of the penchant for overdramatizing a momentary innovation and blowing it up far out of proportion to its actuality. ... The image of a completely automated production economy—with an endless capacity to turn out goods—was simply a social-science fiction of the early 1960s. Paradoxically, the vision of Utopia was suddenly replaced by the spectre of Doomsday. In place of the early-sixties theme of endless plenty, the picture by the end of the decade was one of a fragile planet of limited resources whose finite stocks were being rapidly depleted, and whose wastes from soaring industrial production were polluting the air and waters." In his 2015 book Rise of the Robots, Martin Ford claims The Triple Revolution's predictions of steady decline in future employment were not wrong, but rather premature. He cites "Seven Deadly Trends" that began in the 1970s-1980s and by the mid-2010s appeared set to continue: Stagnation in real wages Decline in labor's share of national income in many countries (breakdown of Bowley's law), while corporate profits increased Declining labor force participation Diminishing job creation, lengthening jobless recoveries, and soaring long-term unemployment Rising inequality Declining incomes, and underemployment for recent college graduates Polarization and part-time jobs (middle-class jobs are disappearing, to be replaced by a small number of high-paying jobs and large number of low-paying jobs) According to Ford, the 1960s were part of what in retrospect seems like a golden age for labor in the United States, when productivity and wages rose together in near lockstep, and unemployment was low. But after about 1980, wages began stagnating while productivity continued to rise. Labor's share of the economic output began to decline. Ford describes the role that automation and information technology play in these trends, and how new technologies including narrow AI threaten to destroy jobs faster than displaced workers can be retrained for new jobs, before automation takes the new jobs as well. This includes many job categories, such as in transportation, that were never threatened by automation before. According to a 2013 study, about 47% of US jobs are susceptible to automation. == Signatories ==

    Read more →
  • Manifold hypothesis

    Manifold hypothesis

    The manifold hypothesis posits that many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space. As a consequence of the manifold hypothesis, many data sets that appear to initially require many variables to describe, can actually be described by a comparatively small number of variables, linked to the local coordinate system of the underlying manifold. It is suggested that this principle underpins the effectiveness of machine learning algorithms in describing high-dimensional data sets by considering a few common features. The manifold hypothesis is related to the effectiveness of nonlinear dimensionality reduction techniques in machine learning. Many techniques of dimensional reduction make the assumption that data lies along a low-dimensional submanifold, such as manifold sculpting, manifold alignment, and manifold regularization. The major implications of this hypothesis is that Machine learning models only have to fit relatively simple, low-dimensional, highly structured subspaces within their potential input space (latent manifolds). Within one of these manifolds, it's always possible to interpolate between two inputs, that is to say, morph one into another via a continuous path along which all points fall on the manifold. The ability to interpolate between samples is the key to generalization in deep learning. == The information geometry of statistical manifolds == An empirically-motivated approach to the manifold hypothesis focuses on its correspondence with an effective theory for manifold learning under the assumption that robust machine learning requires encoding the dataset of interest using methods for data compression. This perspective gradually emerged using the tools of information geometry thanks to the coordinated effort of scientists working on the efficient coding hypothesis, predictive coding and variational Bayesian methods. The argument for reasoning about the information geometry on the latent space of distributions rests upon the existence and uniqueness of the Fisher information metric. In this general setting, we are trying to find a stochastic embedding of a statistical manifold. From the perspective of dynamical systems, in the big data regime this manifold generally exhibits certain properties such as homeostasis: We can sample large amounts of data from the underlying generative process. Machine Learning experiments are reproducible, so the statistics of the generating process exhibit stationarity. In a sense made precise by theoretical neuroscientists working on the free energy principle, the statistical manifold in question possesses a Markov blanket.

    Read more →
  • Commonsense knowledge (artificial intelligence)

    Commonsense knowledge (artificial intelligence)

    In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address common sense knowledge was Advice Taker in 1959 by John McCarthy. Commonsense knowledge can underpin a commonsense reasoning process, to attempt inferences such as "You might bake a cake because you want people to eat the cake." A natural language processing process can be attached to the commonsense knowledge base to allow the knowledge base to attempt to answer questions about the world. Common sense knowledge also helps to solve problems in the face of incomplete information. Using widely held beliefs about everyday objects, or common sense knowledge, AI systems make common sense assumptions or default assumptions about the unknown similar to the way people do. In an AI system or in English, this is expressed as "Normally P holds", "Usually P" or "Typically P so Assume P". For example, if we know the fact "Tweety is a bird", because we know the commonly held belief about birds, "typically birds fly," without knowing anything else about Tweety, we may reasonably assume the fact that "Tweety can fly." As more knowledge of the world is discovered or learned over time, the AI system can revise its assumptions about Tweety using a truth maintenance process. If we later learn that "Tweety is a penguin" then truth maintenance revises this assumption because we also know "penguins do not fly". == Commonsense reasoning == Commonsense reasoning simulates the human ability to use commonsense knowledge to make presumptions about the type and essence of ordinary situations they encounter every day, and to change their "minds" should new information come to light. This includes time, missing or incomplete information and cause and effect. The ability to explain cause and effect is an important aspect of explainable AI. Truth maintenance algorithms automatically provide an explanation facility because they create elaborate records of presumptions. Compared with humans, all existing computer programs that attempt human-level AI perform extremely poorly on modern "commonsense reasoning" benchmark tests such as the Winograd Schema Challenge. The problem of attaining human-level competency at "commonsense knowledge" tasks is considered to probably be "AI complete" (that is, solving it would require the ability to synthesize a fully human-level intelligence), although some oppose this notion and believe compassionate intelligence is also required for human-level AI. Common sense reasoning has been applied successfully in more limited domains such as natural language processing and automated diagnosis or analysis. == Commonsense knowledge base construction == Compiling comprehensive knowledge bases of commonsense assertions (CSKBs) is a long-standing challenge in AI research. From early expert-driven efforts like CYC and WordNet, significant advances were achieved via the crowdsourced OpenMind Commonsense project, which led to the crowdsourced ConceptNet KB. Several approaches have attempted to automate CSKB construction, most notably, via text mining (WebChild, Quasimodo, TransOMCS, Ascent), as well as harvesting these directly from pre-trained language models (AutoTOMIC). These resources are significantly larger than ConceptNet, though the automated construction mostly makes them of moderately lower quality. Challenges also remain on the representation of commonsense knowledge: Most CSKB projects follow a triple data model, which is not necessarily best suited for breaking more complex natural language assertions. A notable exception here is GenericsKB, which applies no further normalization to sentences, but retains them in full. == Applications == Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl. ConceptNet has also been used by chatbots and by computers that compose original fiction. At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty. == Data == As an example, as of 2012 ConceptNet includes these 21 language-independent relations: IsA (An "RV" is a "vehicle" | X is an instance of a Y) UsedFor (a "cake tin" is used for "making cakes" | X is used for the purpose Y) HasA (A "rabbit" has a "tail" | X possesses Y element or feature) CapableOf (a "cook" is capable of "making baked goods" | X is capable of doing Y) Desires (a "child" desires "the aroma of baking" | X has a desire for Y) CreatedBy ("cake" is created by a "baker" | X is created by Y) PartOf (a "knife" is be part of a "knife set" | X is a part of Y) Causes ("Heat" causes "cooking"| X is what causes Y) LocatedNear (the "oven" is located near the "refrigerator" | X is located near Y) AtLocation (Somewhere a "Cook" can be at a "restaurant" | X is at the location of Y) DefinedAs (a "Cupcake" is defined as a "cake" that also has the qualities of being "small", "baked within a wrapper", and "containing only one area of frosting or icing" | X is defined as Y that also has the properties A, B & C) SymbolOf (a "heart" is a symbol of "affection" | X is a symbolic representation of Y) ReceivesAction ("cake" can receive the action of being "eaten" | X is capable of receiving action Y) HasPrerequisite ("baking" has the prerequisite of obtaining the "ingredients" | X cannot do Y unless A does B) MotivatedByGoal ("baking" is motivated by the goal of "consumption"/"eating" | X has the motivation of Y goal) CausesDesire ("baking" makesYou want to "follow recipe" | X causes the desire to do Y) MadeOf ("Cake" is made of "flour"/"eggs"/"sugar"/"oil"/etc | X is made of Y) HasFirstSubevent ("baking" has first subevent "make batter" | To do X the first thing that needs to be done is Y) HasSubevent ("eat" has subevent "swallow" | Doing X will lead to Y event following) HasLastSubevent ("sleeping" has last subevent of "waking" | Doing X ends with the event Y) == Commonsense knowledge bases == Cyc Open Mind Common Sense (data source) and ConceptNet (datastore and NLP engine) Evi Graphiq

    Read more →
  • Case-based reasoning

    Case-based reasoning

    Case-based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions of similar past problems. In everyday life, an auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms is using case-based reasoning. A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case law is using case-based reasoning. So, too, an engineer copying working elements of nature (practicing biomimicry) is treating nature as a database of solutions to problems. Case-based reasoning is a prominent type of analogy solution making. It has been argued that case-based reasoning is not only a powerful method for computer reasoning, but also a pervasive behavior in everyday human problem solving; or, more radically, that all reasoning is based on past cases personally experienced. This view is related to prototype theory, which is most deeply explored in cognitive science. == Process == Case-based reasoning has been formalized for purposes of computer reasoning as a four-step process: Retrieve: Given a target problem, retrieve cases relevant to solving it from memory. A case consists of a problem, its solution, and, typically, annotations about how the solution was derived. For example, suppose Fred wants to prepare blueberry pancakes. Being a novice cook, the most relevant experience he can recall is one in which he successfully made plain pancakes. The procedure he followed for making the plain pancakes, together with justifications for decisions made along the way, constitutes Fred's retrieved case. Reuse: Map the solution from the previous case to the target problem. This may involve adapting the solution as needed to fit the new situation. In the pancake example, Fred must adapt his retrieved solution to include the addition of blueberries. Revise: Having mapped the previous solution to the target situation, test the new solution in the real world (or a simulation) and, if necessary, revise. Suppose Fred adapted his pancake solution by adding blueberries to the batter. After mixing, he discovers that the batter has turned blue – an undesired effect. This suggests the following revision: delay the addition of blueberries until after the batter has been ladled into the pan. Retain: After the solution has been successfully adapted to the target problem, store the resulting experience as a new case in memory. Fred, accordingly, records his new-found procedure for making blueberry pancakes, thereby enriching his set of stored experiences, and better preparing him for future pancake-making demands. == Comparison to other methods == At first glance, CBR may seem similar to the rule induction algorithms of machine learning. Like a rule-induction algorithm, CBR starts with a set of cases or training examples; it forms generalizations of these examples, albeit implicit ones, by identifying commonalities between a retrieved case and the target problem. If for instance a procedure for plain pancakes is mapped to blueberry pancakes, a decision is made to use the same basic batter and frying method, thus implicitly generalizing the set of situations under which the batter and frying method can be used. The key difference, however, between the implicit generalization in CBR and the generalization in rule induction lies in when the generalization is made. A rule-induction algorithm draws its generalizations from a set of training examples before the target problem is even known; that is, it performs eager generalization. For instance, if a rule-induction algorithm were given recipes for plain pancakes, Dutch apple pancakes, and banana pancakes as its training examples, it would have to derive, at training time, a set of general rules for making all types of pancakes. It would not be until testing time that it would be given, say, the task of cooking blueberry pancakes. The difficulty for the rule-induction algorithm is in anticipating the different directions in which it should attempt to generalize its training examples. This is in contrast to CBR, which delays (implicit) generalization of its cases until testing time – a strategy of lazy generalization. In the pancake example, CBR has already been given the target problem of cooking blueberry pancakes; thus it can generalize its cases exactly as needed to cover this situation. CBR therefore tends to be a good approach for rich, complex domains in which there are myriad ways to generalize a case. In law, there is often explicit delegation of CBR to courts, recognizing the limits of rule based reasons: limiting delay, limited knowledge of future context, limit of negotiated agreement, etc. While CBR in law and cognitively inspired CBR have long been associated, the former is more clearly an interpolation of rule based reasoning, and judgment, while the latter is more closely tied to recall and process adaptation. The difference is clear in their attitude toward error and appellate review. Another name for case-based reasoning in problem solving is symptomatic strategies. It does require à priori domain knowledge that is gleaned from past experience which established connections between symptoms and causes. This knowledge is referred to as shallow, compiled, evidential, history-based as well as case-based knowledge. This is the strategy most associated with diagnosis by experts. Diagnosis of a problem transpires as a rapid recognition process in which symptoms evoke appropriate situation categories. An expert knows the cause by virtue of having previously encountered similar cases. Case-based reasoning is the most powerful strategy, and that used most commonly. However, the strategy won't work independently with truly novel problems, or where deeper understanding of whatever is taking place is sought. An alternative approach to problem solving is the topographic strategy which falls into the category of deep reasoning. With deep reasoning, in-depth knowledge of a system is used. Topography in this context means a description or an analysis of a structured entity, showing the relations among its elements. Also known as reasoning from first principles, deep reasoning is applied to novel faults when experience-based approaches aren't viable. The topographic strategy is therefore linked to à priori domain knowledge that is developed from a more a fundamental understanding of a system, possibly using first-principles knowledge. Such knowledge is referred to as deep, causal or model-based knowledge. Hoc and Carlier noted that symptomatic approaches may need to be supported by topographic approaches because symptoms can be defined in diverse terms. The converse is also true – shallow reasoning can be used abductively to generate causal hypotheses, and deductively to evaluate those hypotheses, in a topographical search. == Criticism == Critics of CBR argue that it is an approach that accepts anecdotal evidence as its main operating principle. Without statistically relevant data for backing and implicit generalization, there is no guarantee that the generalization is correct. However, all inductive reasoning where data is too scarce for statistical relevance is inherently based on anecdotal evidence. == History == CBR traces its roots to the work of Roger Schank and his students at Yale University in the early 1980s. Schank's model of dynamic memory was the basis for the earliest CBR systems: Janet Kolodner's CYRUS and Michael Lebowitz's IPP. Other schools of CBR and closely allied fields emerged in the 1980s, which directed at topics such as legal reasoning, memory-based reasoning (a way of reasoning from examples on massively parallel machines), and combinations of CBR with other reasoning methods. In the 1990s, interest in CBR grew internationally, as evidenced by the establishment of an International Conference on Case-Based Reasoning in 1995, as well as European, German, British, Italian, and other CBR workshops. CBR technology has resulted in the deployment of a number of successful systems, the earliest being Lockheed's CLAVIER, a system for laying out composite parts to be baked in an industrial convection oven. CBR has been used extensively in applications such as the Compaq SMART system and has found a major application area in the health sciences, as well as in structural safety management. There is recent work that develops CBR within a statistical framework and formalizes case-based inference as a specific type of probabilistic inference. Thus, it becomes possible to produce case-based predictions equipped with a certain level of confidence. One description of the difference between CBR and induction from instances is that statistical inference aims to find what tends to make cases similar while CBR aims to encode what suffices to claim similarly.

    Read more →
  • Plotly

    Plotly

    Plotly is a technical computing company headquartered in Montreal, Quebec, that develops online data analytics and visualization tools. Plotly provides online graphing, analytics, and statistics tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, JavaScript and REST. == History == Plotly was founded by Alex Johnson, Jack Parmer, Chris Parmer, and Matthew Sundquist. The founders' backgrounds are in science, energy, and data analysis and visualization. Early employees include Christophe Viau, a Canadian software engineer and Ben Postlethwaite, a Canadian geophysicist. Plotly was named one of the Top 20 Hottest Innovative Companies in Canada by the Canadian Innovation Exchange. Plotly was featured in "startup row" at PyCon 2013, and sponsored the SciPy 2018 conference. Plotly raised $5.5 million during its Series A funding, led by MHS Capital, Siemens Venture Capital, Rho Ventures, Real Ventures, and Silicon Valley Bank. The Boston Globe and Washington Post newsrooms have produced data journalism using Plotly. In 2020, Plotly was named a Best Place to Work by the Canadian SME National Business Awards, and nominated as Business of the Year. == Products == Plotly offers open-source and enterprise products. Dash is an open-source Python, R, and Julia framework for building web-based analytic applications. Many specialized open-source Dash libraries exist that are tailored for building domain-specific Dash components and applications. Some examples are Dash DAQ, for building data acquisition GUIs to use with scientific instruments, and Dash Bio, which enables users to build custom chart types, sequence analysis tools, and 3D rendering tools for bioinformatics applications. Dash Enterprise is Plotly's paid product for building, testing, deploying, managing and scaling Dash applications organization-wide. Chart Studio Cloud is a free, online tool for creating interactive graphs. It has a point-and-click graphical user interface for importing and analyzing data into a grid and using stats tools. Graphs can be embedded or downloaded. Chart Studio Enterprise is a paid product that allows teams to create, style, and share interactive graphs on a single platform. It offers expanded authentication and file export options, and does not limit sharing and viewing. Data visualization libraries Plotly.js is an open-source JavaScript library for creating graphs and powers Plotly.py for Python, as well as Plotly.R for R, MATLAB, Node.js, Julia, and Arduino and a REST API. Plotly can also be used to style interactive graphs with Jupyter notebook. Figure converters which convert matplotlib, ggplot2, and IGOR Pro graphs into interactive, online graphs. == Data visualization libraries == Plotly provides a collection of supported chart types across several programming languages: == Dash == Dash is a Python framework built on top of React, a JavaScript library. Dash also works for R, and most recently supports Julia. While still described as a Python framework, Python isn't used for the other languages: "... describing Dash as a Python framework misses a key feature of its design: the Python side (the back end/server) of Dash was built to be lightweight and stateless [allowing] multiple back-end languages to coexist on an equal footing". It is possible to integrate D3.js charts as Dash components. Dash provides the default CSS (plus HTML and JavaScript), but for custom styling Dash applications, CSS can be added, or Dash Enterprise used. === Dash Enterprise === Dash Enterprise is Plotly's paid product for building, testing, deploying, managing and scaling Dash applications organization-wide. The product integrates with enterprise IT systems to enable organizations to build, deploy and scale low-code Dash applications. With open-source Dash, analytic applications can be run from a local machine, but cannot be easily accessed by others in the organization. ==== Enterprise IT integration ==== Dash Enterprise installs on cloud environments and on-premises. Amazon Web Services, Google Cloud Platform, and Microsoft Azure are supported, as are multiple Linux on-premises servers. Authentication integrations include LDAP, AD, PKI, Okta, SAML, OAuth2, SSO, and email authentication, and Dash application access is managed through a GUI rather than code. Dash Enterprise connects to major big data backends, including Salesforce, PostgreSQL, Databricks via PySpark, Snowflake, Dask, Datashader, and Vaex. In 2020, Plotly partnered with NVIDIA to integrate Dash with RAPIDS, and NVIDIA participated in Plotly's Series C funding round. ==== Low-code capabilities ==== Dash Enterprise enables low-code development of Dash applications, which is not possible with open-source Dash. Enterprise users can write applications in multiple development environments, including Jupyter Notebook. Dash Enterprise ships with several “development engines” for drag-and-drop application editing, application design, and automated reporting, as well as dozens of artificial intelligence and machine learning application templates. ==== Deployment and scaling ==== Dash application code is deployed to Dash Enterprise using the git-push command. Dash application deployments are containerized to avoid dependency conflicts, and can be embedded in existing web platforms without iframes. Deployed applications can be managed and accessed in a single portal called App Manager, where administrators can control user authentication and view usage analytics. Dash Enterprise scales horizontally with Kubernetes. Jobs queuing, GPU acceleration, and CPU parallelization support high performance computing requirements. Plotly also offers professional services for application development and workshop training.

    Read more →
  • And–or tree

    And–or tree

    An and–or tree is a graphical representation of the reduction of problems (or goals) to conjunctions and disjunctions of subproblems (or subgoals). == Example == The and–or tree: represents the search space for solving the problem P, using the goal-reduction methods: P if Q and R P if S Q if T Q if U == Definitions == Given an initial problem P0 and set of problem solving methods of the form: P if P1 and … and Pn the associated and–or tree is a set of labelled nodes such that: The root of the tree is a node labelled by P0. For every node N labelled by a problem or sub-problem P and for every method of the form P if P1 and ... and Pn, there exists a set of children nodes N1, ..., Nn of the node N, such that each node Ni is labelled by Pi. The nodes are conjoined by an arc, to distinguish them from children of N that might be associated with other methods. A node N, labelled by a problem P, is a success node if there is a method of the form P if nothing (i.e., P is a "fact"). The node is a failure node if there is no method for solving P. If all of the children of a node N, conjoined by the same arc, are success nodes, then the node N is also a success node. Otherwise the node is a failure node. == Search strategies == An and–or tree specifies only the search space for solving a problem. Different search strategies for searching the space are possible. These include searching the tree depth-first, breadth-first, or best-first using some measure of desirability of solutions. The search strategy can be sequential, searching or generating one node at a time, or parallel, searching or generating several nodes in parallel. == Relationship with logic programming == The methods used for generating and–or trees are propositional logic programs (without variables). In the case of logic programs containing variables, the solutions of conjoint sub-problems must be compatible. Subject to this complication, sequential and parallel search strategies for and–or trees provide a computational model for executing logic programs. == Relationship with two-player games == And–or trees can also be used to represent the search spaces for two-person games. The root node of such a tree represents the problem of one of the players winning the game, starting from the initial state of the game. Given a node N, labelled by the problem P of the player winning the game from a particular state of play, there exists a single set of conjoint children nodes, corresponding to all of the opponents responding moves. For each of these children nodes, there exists a set of non-conjoint children nodes, corresponding to all of the player's defending moves. For solving game trees with proof-number search family of algorithms, game trees are to be mapped to and–or trees. MAX-nodes (i.e. maximizing player to move) are represented as OR nodes, MIN-nodes map to AND nodes. The mapping is possible, when the search is done with only a binary goal, which usually is "player to move wins the game".

    Read more →
  • Machine-learned interatomic potential

    Machine-learned interatomic potential

    Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.

    Read more →
  • Cognitive robotics

    Cognitive robotics

    Cognitive robotics or cognitive technology is a subfield of robotics concerned with endowing a robot with intelligent behavior by providing it with a processing architecture that will allow it to learn and reason about how to behave in response to complex goals in a complex world. Cognitive robotics may be considered the engineering branch of embodied cognitive science and embodied embedded cognition, consisting of robotic process automation, artificial intelligence, machine learning, deep learning, optical character recognition, image processing, process mining, analytics, software development and system integration. == Core issues == While traditional cognitive modeling approaches have assumed symbolic coding schemes as a means for depicting the world, translating the world into these kinds of symbolic representations has proven to be problematic if not untenable. Perception and action and the notion of symbolic representation are therefore core issues to be addressed in cognitive robotics. == Starting point == Cognitive robotics views human or animal cognition as a starting point for the development of robotic information processing, as opposed to more traditional artificial intelligence techniques. Target robotic cognitive capabilities include perception processing, attention allocation, anticipation, planning, complex motor coordination, reasoning about other agents and perhaps even about their own mental states. Robotic cognition embodies the behavior of intelligent agents in the physical world (or a virtual world, in the case of simulated cognitive robotics). Ultimately, the robot must be able to act in the real world. == Learning techniques == === Motor Babble === A preliminary robot learning technique called motor babbling involves correlating pseudo-random complex motor movements by the robot with resulting visual and/or auditory feedback such that the robot may begin to expect a pattern of sensory feedback given a pattern of motor output. Desired sensory feedback may then be used to inform a motor control signal. This is thought to be analogous to how a baby learns to reach for objects or learns to produce speech sounds. For simpler robot systems, where, for instance, inverse kinematics may feasibly be used to transform anticipated feedback (desired motor result) into motor output, this step may be skipped. === Imitation === Once a robot can coordinate its motors to produce a desired result, the technique of learning by imitation may be used. The robot monitors the performance of another agent and then the robot tries to imitate that agent. It is often a challenge to transform imitation information from a complex scene into a desired motor result for the robot. Note that imitation is a high-level form of cognitive behavior and imitation is not necessarily required in a basic model of embodied animal cognition. === Knowledge acquisition === A more complex learning approach is "autonomous knowledge acquisition": the robot is left to explore the environment on its own. A system of goals and beliefs is typically assumed. A somewhat more directed mode of exploration can be achieved by "curiosity" algorithms, such as Intelligent Adaptive Curiosity or Category-Based Intrinsic Motivation. These algorithms generally involve breaking sensory input into a finite number of categories and assigning some sort of prediction system (such as an artificial neural network) to each. The prediction system keeps track of the error in its predictions over time. Reduction in prediction error is considered learning. The robot then preferentially explores categories in which it is learning (or reducing prediction error) the fastest. == Other architectures == Some researchers in cognitive robotics have tried using architectures such as (ACT-R and Soar (cognitive architecture)) as a basis of their cognitive robotics programs. These highly modular symbol-processing architectures have been used to simulate operator performance and human performance when modeling simplistic and symbolized laboratory data. The idea is to extend these architectures to handle real-world sensory input as that input continuously unfolds through time. What is needed is a way to somehow translate the world into a set of symbols and their relationships. == Questions == Some of the fundamental questions to be answered in cognitive robotics are: How much human programming should or can be involved to support the learning processes? How can one quantify progress? Some of the adopted ways are reward and punishment. But what kind of reward and what kind of punishment? In humans, when teaching a child, for example, the reward would be candy or some encouragement, and the punishment can take many forms. But what is an effective way with robots?

    Read more →
  • Photo-consistency

    Photo-consistency

    In computer vision, photo-consistency determines whether a given voxel is occupied. A voxel is considered to be photo consistent when its color appears to be similar to all the cameras that can see it. Most voxel coloring or space carving techniques require using photo consistency as a check condition in Image-based modeling and rendering applications. == Usage == 3D Volumetric Reconstruction. Image registration. Multi-view reconstruction.

    Read more →
  • Representation collapse

    Representation collapse

    Representation collapse is a phenomenon in machine learning and representation learning where a model maps different inputs to the same or very similar embeddings, which means it loses important information about how the data is spread out. It is frequently encountered in self-supervised learning, especially within contrastive and non-contrastive frameworks, when training objectives or model architectures do not maintain variance across representations. Collapse results in degenerate solutions characterized by uninformative learned features, significantly impairing downstream task performance. Various techniques have been proposed to mitigate representation collapse, including the use of negative samples, architectural asymmetry, stop-gradient operations, variance regularization, and redundancy reduction objectives, as seen in methods such as SimCLR, BYOL, and VICReg. Comprehending and averting representation collapse is regarded as a fundamental challenge in the advancement of stable and efficient self-supervised learning systems.

    Read more →
  • GOLOG

    GOLOG

    GOLOG is a high-level logic programming language for the specification and execution of complex actions in dynamical domains. It is based on the situation calculus. It is a first-order logical language for reasoning about action and change. GOLOG was developed at the University of Toronto. == History == The concept of situation calculus on which the GOLOG programming language is based was first proposed by John McCarthy in 1963. == Description == A GOLOG interpreter automatically maintains a direct characterization of the dynamic world being modeled, on the basis of user supplied axioms about preconditions, effects of actions and the initial state of the world. This allows the application to reason about the condition of the world and consider the impacts of different potential actions before focusing on a specific action. Golog is a logic programming language and is very different from conventional programming languages. A procedural programming language like C defines the execution of statements in advance. The programmer creates a subroutine which consists of statements, and the computer executes each statement in a linear order. In contrast, fifth-generation programming languages like Golog work with an abstract model with which the interpreter can generate the sequence of actions. The source code defines the problem and it is up to the solver to find the next action. This approach can facilitate the management of complex problems from the domain of robotics. A Golog program defines the state space in which the agent is allowed to operate. A path in the symbolic domain is found with state space search. To speed up the process, Golog programs are realized as hierarchical task networks. Apart from the original Golog language, there are some extensions available. The ConGolog language provides concurrency and interrupts. Other dialects like IndiGolog and Readylog were created for real time applications in which sensor readings are updated on the fly. == Uses == Golog has been used to model the behavior of autonomous agents. In addition to a logic-based action formalism for describing the environment and the effects of basic actions, they enable the construction of complex actions using typical programming language constructs. It is also used for applications in high level control of robots and industrial processes, virtual agents, discrete event simulation etc. It can be also used to develop Belief Desire Intention-style agent systems. == Planning and scripting == In contrast to the Planning Domain Definition Language, Golog supports planning and scripting as well. Planning means that a goal state in the world model is defined, and the solver brings a logical system into this state. Behavior scripting implements reactive procedures, which are running as a computer program. For example, suppose the idea is to authoring a story. The user defines what should be true at the end of the plot. A solver gets started and applies possible actions to the current situation until the goal state is reached. The specification of a goal state and the possible actions are realized in the logical world model. In contrast, a hardwired reactive behavior doesn't need a solver but the action sequence is provided in a scripting language. The Golog interpreter, which is written in Prolog, executes the script and this will bring the story into the goal state.

    Read more →
  • Decision list

    Decision list

    Decision lists are a representation for Boolean functions which can be easily learned from examples. Single term decision lists are more expressive than disjunctions and conjunctions; however, 1-term decision lists are less expressive than the general disjunctive normal form and the conjunctive normal form. The language specified by a k-length decision list includes as a subset the language specified by a k-depth decision tree. Learning decision lists can be used for attribute efficient learning, a type of machine learning. == Definition == A decision list (DL) of length r is of the form: if f1 then output b1 else if f2 then output b2 ... else if fr then output br where fi is the ith formula and bi is the ith boolean for i ∈ { 1... r } {\displaystyle i\in \{1...r\}} . The last if-then-else is the default case, which means formula fr is always equal to true. A k-DL is a decision list where all of formulas have at most k terms. Sometimes "decision list" is used to refer to a 1-DL, where all of the formulas are either a variable or its negation.

    Read more →
  • List & Label

    List & Label

    List & Label is a professional reporting tool for software developers. It provides comprehensive design, print and export functions. The software component runs on Microsoft Windows and can be implemented in desktop, cloud and web applications. List & Label can be used to create user-defined dashboards, lists, invoices, forms and labels. It supports many development environments, frameworks and programming languages such as Microsoft Visual Studio, Embarcadero RAD Studio, .NET Framework, .NET Core, ASP.NET, C++, Delphi, Java, C Sharp and some more. List & Label either retrieves data from various sources via data binding, or works database independent. Reports are designed and created in the so-called List & Label Designer and then exported into a multitude of formats like PDF, Excel, XHTML and RTF. Since version 27 a web report designer for ASP.NET MVC is available. == History == The product was first released in 1992 by combit. The current version is 30. A new major version of List & Label is released every fall, usually in October. Updates are available several times a year via Service Pack. == Features == === Report Designer === The Designer enables users to graphically layout the report. It offers report objects such as tables, charts, crosstabs, gauges, HTML, conditionally formatted text, barcodes, matrix codes, and graphics, and is extensible using third-party add-ons. User applications can interact with the report via the programmable object model of the report. The real-time preview functionality allows users to view changes instantly. Usability features include layer and appearance management, enabling conditional logic to dynamically control the visibility of objects in reports. The Designer also supports the inclusion of multiple report containers in a single project, accommodating complex layouts such as parallel tables and charts. A formula wizard and support for scripting languages such as C# facilitate advanced calculations and logic. The Designer's object model (DOM) provides developers with the ability to modify layouts and behaviors programmatically. === Web Report Designer === The web report designer works browser-based and independent from printer drivers and spoolers - that makes deployments to the cloud easier. Just like the use of the Visual Studio deployment pipeline. === Data Sources === Depending on the programming language, the product offers automatic support for data sources: Databases such as Microsoft SQL Server, Oracle, MySQL, PostgreSQL, IBM Db2, SQLite, MariaDB, MongoDB, Cosmos DB XML data, CSV Business objects Data sources that can be accessed via OLE DB, ODBC or ADO.NET LINQ data and data from web services GraphQL Additionally, the product offers support for unbound data and can be extended to support other data sources via interfaces. === Output Options === Printer Image Formats (JPEG, BMP, EMF, TIFF, PNG, SVG, HEIF, WebP) Document Formats: PDF, PDF/A, Word (DOCX), Excel (XLS), PowerPoint (PPTX) HTML, XHTML, MHTML Barcodes Plain Text, RTF, CSV, JSON XML, ZIP, Email, JSON List & Label preview file === Target Audience === List & Label can be used in Windows development environments. While it competes most notably on the Microsoft .NET platform with other products such as Crystal Reports, SQL Server Reporting Services, ActiveReports, there are few competing products for other programming languages (e.g. Progress, Alaska Xbase++, Visual DataFlex). == Awards == Reader's Choice Award 2005–2008 Stevie Awards 2021: Best Technology for Data Visualization Top 100 Publisher Award Component Source 2013-2014, 2014-2015,2016, 2018, 2019, 2020, 2021, 2022

    Read more →
  • ELMo

    ELMo

    ELMo (embeddings from language model) is a word embedding method for representing a sequence of words as a corresponding sequence of vectors. It was created by researchers at the Allen Institute for Artificial Intelligence, and University of Washington and first released in February 2018. It is a bidirectional LSTM which takes character-level as inputs and produces word-level embeddings, trained on a corpus of about 30 million sentences and 1 billion words. The architecture of ELMo accomplishes a contextual understanding of tokens. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. ELMo was historically important as a pioneer of self-supervised generative pretraining followed by fine-tuning, where a large model is trained to reproduce a large corpus, then the large model is augmented with additional task-specific weights and fine-tuned on supervised task data. It was an instrumental step in the evolution towards transformer-based language modelling. == Architecture == ELMo is a multilayered bidirectional LSTM on top of a token embedding layer. The output of all LSTMs concatenated together consists of the token embedding. The input text sequence is first mapped by an embedding layer into a sequence of vectors. Then two parts are run in parallel over it. The forward part is a 2-layered LSTM with 4096 units and 512 dimension projections, and a residual connection from the first to second layer. The backward part has the same architecture, but processes the sequence back-to-front. The outputs from all 5 components (embedding layer, two forward LSTM layers, and two backward LSTM layers) are concatenated and multiplied by a linear matrix ("projection matrix") to produce a 512-dimensional representation per input token. ELMo was pretrained on a text corpus of 1 billion words. The forward part is trained by repeatedly predicting the next token, and the backward part is trained by repeatedly predicting the previous token. After the ELMo model is pretrained, its parameters are frozen, except for the projection matrix, which can be fine-tuned to minimize loss on specific language tasks. This is an early example of the pretraining-fine-tune paradigm. The original paper demonstrated this by improving state of the art on six benchmark NLP tasks. === Contextual word representation === The architecture of ELMo accomplishes a contextual understanding of tokens. For example, the first forward LSTM of ELMo would process each input token in the context of all previous tokens, and the first backward LSTM would process each token in the context of all subsequent tokens. The second forward LSTM would then incorporate those to further contextualize each token. Deep contextualized word representation is useful for many natural language processing tasks, such as coreference resolution and polysemy resolution. For example, consider the sentenceShe went to the bank to withdraw money.In order to represent the token "bank", the model must resolve its polysemy in context. The first forward LSTM would process "bank" in the context of "She went to the", which would allow it to represent the word to be a location that the subject is going towards. The first backward LSTM would process "bank" in the context of "to withdraw money", which would allow it to disambiguate the word as referring to a financial institution. The second forward LSTM can then process "bank" using the representation vector provided by the first backward LSTM, thus allowing it to represent it to be a financial institution that the subject is going towards. == Historical context == ELMo is one link in a historical evolution of language modelling. Consider a simple problem of document classification, where we want to assign a label (e.g., "spam", "not spam", "politics", "sports") to a given piece of text. The simplest approach is the "bag of words" approach, where each word in the document is treated independently, and its frequency is used as a feature for classification. This was computationally cheap but ignored the order of words and their context within the sentence. GloVe and Word2Vec built upon this by learning fixed vector representations (embeddings) for words based on their co-occurrence patterns in large text corpora. Like BERT (but unlike "bag of words" such as Word2Vec and GloVe), ELMo word embeddings are context-sensitive, producing different representations for words that share the same spelling. It was trained on a corpus of about 30 million sentences and 1 billion words. Previously, bidirectional LSTM was used for contextualized word representation. ELMo applied the idea to a large scale, achieving state of the art performance. After the 2017 publication of Transformer architecture, the architecture of ELMo was changed from a multilayered bidirectional LSTM to a Transformer encoder, giving rise to BERT. BERT has a similar pretrain-fine-tune workflow, but uses a Transformer with implications for more parallelizable training.

    Read more →
  • Curse of dimensionality

    Curse of dimensionality

    The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. The expression was coined by Richard E. Bellman when considering problems in dynamic programming. The curse generally refers to issues that arise when the number of datapoints is small (in a suitably defined sense) relative to the intrinsic dimension of the data. Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics, machine learning, data mining and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data becomes sparse. In order to obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient. == Domains == === Combinatorics === In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is also known as the combinatorial explosion. Even in the simplest case of d {\displaystyle d} binary variables, the number of possible combinations already is 2 d {\displaystyle 2^{d}} , exponential in the dimensionality. Naively, each additional dimension doubles the effort needed to try all combinations. === Sampling === There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 102 = 100 evenly spaced sample points suffice to sample a unit interval (try to visualize a "1-dimensional" cube, i.e. a line) with no more than 10−2 = 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice that has a spacing of 10−2 = 0.01 between adjacent points would require 1020 = [(102)10] sample points. In general, with a spacing distance of 10−n the 10-dimensional hypercube appears to be a factor of 10n(10−1) = [(10n)10/(10n)] "larger" than the 1-dimensional hypercube, which is the unit interval. In the above example n = 2: when using a sampling distance of 0.01 the 10-dimensional hypercube appears to be 1018 "larger" than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below. === Optimization === When solving dynamic optimization problems by numerical backward induction, the objective function must be computed for each combination of values. This is a significant obstacle when the dimension of the "state variable" is large. === Machine learning === In machine learning problems that involve learning a "state-of-nature" from a finite number of data samples in a high-dimensional feature space with each feature having a range of possible values, typically an enormous amount of training data is required to ensure that there are several samples with each combination of values. In an abstract sense, as the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially. A typical rule of thumb is that there should be at least 5 training examples for each dimension in the representation. In machine learning and insofar as predictive performance is concerned, the curse of dimensionality is used interchangeably with the peaking phenomenon, which is also known as Hughes phenomenon. This phenomenon states that with a fixed number of training samples, the average (expected) predictive power of a classifier or regressor first increases as the number of dimensions or features used is increased but beyond a certain dimensionality it starts deteriorating instead of improving steadily. Nevertheless, in the context of a simple classifier (e.g., linear discriminant analysis in the multivariate Gaussian model under the assumption of a common known covariance matrix), Zollanvari et al. showed both analytically and empirically that as long as the relative cumulative efficacy of an additional feature set (with respect to features that are already part of the classifier) is greater (or less) than the size of this additional feature set, the expected error of the classifier constructed using these additional features will be less (or greater) than the expected error of the classifier constructed without them. In other words, both the size of additional features and their (relative) cumulative discriminatory effect are important in observing a decrease or increase in the average predictive power. In metric learning, higher dimensions can sometimes allow a model to achieve better performance. After normalizing embeddings to the surface of a hypersphere, FaceNet achieves the best performance using 128 dimensions as opposed to 64, 256, or 512 dimensions in one ablation study. A loss function for unitary-invariant dissimilarity between word embeddings was found to be minimized in high dimensions. === Data mining === In data mining, the curse of dimensionality refers to a data set with too many features. Consider the first table, which depicts 200 individuals and 2000 genes (features) with a 1 or 0 denoting whether or not they have a genetic mutation in that gene. A data mining application to this data set may be finding the correlation between specific genetic mutations and creating a classification algorithm such as a decision tree to determine whether an individual has cancer or not. A common practice of data mining in this domain would be to create association rules between genetic mutations that lead to the development of cancers. To do this, one would have to loop through each genetic mutation of each individual and find other genetic mutations that occur over a desired threshold and create pairs. They would start with pairs of two, then three, then four until they result in an empty set of pairs. The complexity of this algorithm can lead to calculating all permutations of gene pairs for each individual or row. Given the formula for calculating the permutations of n items with a group size of r is: n ! ( n − r ) ! {\displaystyle {\frac {n!}{(n-r)!}}} , calculating the number of three pair permutations of any given individual would be 7988004000 different pairs of genes to evaluate for each individual. The number of pairs created will grow by an order of factorial as the size of the pairs increase. The growth is depicted in the permutation table (see right). As we can see from the permutation table above, one of the major problems data miners face regarding the curse of dimensionality is that the space of possible parameter values grows exponentially or factorially as the number of features in the data set grows. This problem critically affects both computational time and space when searching for associations or optimal features to consider. Another problem data miners may face when dealing with too many features is that the number of false predictions or classifications tends to increase as the number of features grows in the data set. In terms of the classification problem discussed above, keeping every data point could lead to a higher number of false positives and false negatives in the model. This may seem counterintuitive, but consider the genetic mutation table from above, depicting all genetic mutations for each individual. Each genetic mutation, whether they correlate with cancer or not, will have some input or weight in the model that guides the decision-making process of the algorithm. There may be mutations that are outliers or ones that dominate the overall distribution of genetic mutations when in fact they do not correlate with cancer. These features may be working against one's model, making it more difficult to obtain optimal results. This problem is up to the data miner to solve, and there is no universal solution. The first step any data miner should take is to explore the data, in an attempt to gain an understanding of how it can be used to solve the problem. One must first understand what the data means, and what they are trying to discover before they can decide if anything must be removed from the data set. Then they can create or use a feature selection or dimensionality reduction algorithm to remove samples or features from the data set if they deem it necessary. One example of such methods is the interquartile range method, used to remove outliers in a data set by calculating the standard deviation of a feature or occurrence. === Distance function === When a measure such as a Euclidean distance is defined using many coordinat

    Read more →