AI Content Generation Tools

AI Content Generation Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Clubhouse (app)

    Clubhouse (app)

    Clubhouse is an American social audio app for iOS and Android developed by Alpha Exploration Co. that enables users to participate in real-time, audio-only communication within virtual "rooms". Launched in March 2020 by Paul Davison and Rohan Seth, the platform is characterized by its "drop-in" nature, where users can join live discussions on a wide range of topics as either listeners or speakers. The application gained attention in early 2021, operating on an invite-only model and featuring appearances from public figures such as Elon Musk, Oprah Winfrey, and Mark Zuckerberg. During this period, Clubhouse reached a reported valuation of approximately $4 billion and contributed to the expansion of similar social audio features like Twitter Spaces and Spotify Greenroom. The app later expanded to Android in May 2021 and removed its waitlist in July 2021, opening access to the general public. == History == Clubhouse began as an invite only social media startup by Paul Davison and Rohan Seth in Fall 2019. Originally designed for podcasts with the name Talkshow, the app was rebranded as "Clubhouse" and officially released for the iOS operating system in March 2020 and as of May 2021 the Android systems as well. Clubhouse was valued at $100 million after receiving funding from notable angel investors. These investors included Ryan Hoover (Founder, Product Hunt), Balaji Srinivasan (Former CTO, Coinbase), James Beshara (Co-Founder, Tilt.com), and several venture capitalists, including a $12 million Series A investment from the venture capital firm, Andreessen Horowitz, in May 2020. The app gained popularity in the early months of the COVID-19 pandemic. It had 600,000 registered users by December 2020. In January 2021, CEO Paul Davison announced that the active weekly user base on the app consisted of approximately 2 million individuals. The company announced that it would start working on an Android version of the app. In that month, the app became widely used in Germany when German podcast hosts Philipp Klöckner and Philipp Gloeckler began an invite-chain over a Telegram group. It brought German influencers, journalists, and politicians to the platform. Clubhouse raised their Series B at a $1 billion valuation. On February 1, 2021, Clubhouse had an estimated 3.5 million downloads on a global level which grew rapidly to 8.1 million downloads by February 15. This significant growth in popularity was because celebrities such as Elon Musk and Mark Zuckerberg made appearances on the app. In the same month, Clubhouse hired an Android Software Developer. A year after the app's release, the number of weekly active users was greater than 10 million, but the user base declined 21% during three weeks from late February to early March. This decline was reportedly caused by a decrease in the number of Clubhouse users after its initial release. During its initial roll out, the app was accessible only by invitation, and invitation codes on eBay were selling at up to $400. On April 5, 2021, Clubhouse partnered with Stripe to launch its first monetizing feature called Clubhouse Payments. Although testing began with only 1,000 users, after a week, the company rolled out the functionality to another 60,000 or more users in the US. In the same month, Twitter entered in discussions to purchase Clubhouse for $4 billion. The talks ended with no acquisition. Later, the company raised their Series C round of funding at a $4 billion valuation. The app also received interest in a partnership, with the National Football League announcing a content deal that month; Twitter Spaces later poached Clubhouse's exclusive NFL deal with 20 official NFL Spaces scheduled for the 2021-22 season. Finally, On May 9, 2021, Clubhouse launched a beta version of the Android app for users in the US, and on May 21, 2021, Clubhouse became available worldwide for Android users. In July 2021, Clubhouse announced a partnership with TED to offer exclusive talks. and on July 21, 2021, the company discarded its invitation system and made the application available to all, though a wait list for registration was still applied in order to manage new traffic. As of the time of the announcement, the company stated it had 10 million users on the wait list. On September 23, 2021, the company announced a new feature named "Wave". In October 2021, Clubhouse rolled out new features called "Replays and Clips". In April 2023, the company announced it was reducing its staff by half amid a "resetting" due to post-pandemic market shifts. == Features == === Rooms === The primary feature of Clubhouse is real-time virtual "rooms" in which users can communicate with each other via audio. Rooms are divided into different categories based on levels of privacy. Moderator roles are denoted by a green star that appears next to the user's name. When a user joins a room, they are initially assigned to the role of a "listener" and cannot unmute themselves. Listeners can notify the moderators of their intent to join the stage and speak by clicking on the "raise hand" icon. Users who are invited to the stage become "speakers" and can unmute themselves. Users can exit a room by tapping the "leave quietly" button or with the help of peace sign emoji. === Houses === In August 2022, Clubhouse announced a feature called Houses, an invite-based version of the rooms. === Events === A lot of conversations in Clubhouse are of spontaneous nature. However, users can schedule conversations by creating events. While scheduling an event, users can first name the event and then set the date and time at which the conversation will begin. Users can also add co-hosts to help moderate the event. Once the event has been created, it is added to the Clubhouse "bulletin". The bulletin shows upcoming scheduled events and allows users to set notifications for events by clicking the bell icon corresponding to the event. Users can access the bulletin by clicking on the calendar icon at the top of the home page. === Clubs === At the Clubhouse, clubs are user communities that regularly discuss a common interest. Many clubs are present in Clubhouse which represents a wide array of topics. Users can find clubs by name under the search tab. A club consists of three categories of users: "Admin", "Leader", and "Member". Members can create private rooms and invite more users into the club. Leaders have all the privileges of a member. Apart from that, they are authorized to create/schedule club-branded open rooms. An admin can modify club settings, add/delete users, change user privileges and create/schedule any type of room. There are three types of clubs: "Open", "By Approval", and "Closed" for membership. Any user can join an open club by pressing the "Join The Club" button on the club profile. In case of approval, users need to apply and wait for membership by clicking the "Apply To Join" button on the club profile. The admins of the respective club are privileged to accept or reject the user's request. In a closed club, membership is limited to users selected by the club admin. All users of a club will be notified when a public room within the club is created. The club creation is restricted to active users and whoever creates the club will become the club admin. Eligible users can create a club by going to their profile, press the "+" sign present in the "Member of" section. Clubs in which a user is a member are shown on their profile page. The first club to half a million members was the Human Behavior Club founded by The Digital Doctor (Dr. Sohaib Imtiaz). === Backchannel === Backchannel is the messaging function which allows users to interact individually or within a group via text. The Backchannel feature was initially leaked on June 18, 2021, in response to the launch of Spotify Greenroom. This is notable step because, until this point, Clubhouse was voice only with no way to hyperlink or message. It was entirely dependent on Instagram and Twitter for text messaging. The feature was initially leaked in the App Store, which the company says was an accident on Twitter. A month later, after multiple failed attempts, the Clubhouse Backchannel finally launched on July 14, 2021. === Explore === The homepage of Clubhouse provides access to ongoing chat rooms, which are recommended based on the people and clubs that are followed by the user. As the users tap on the magnifying glass icon, they will be redirected to the explore page. On that page, users can search for people and clubs to follow and also find conversations categorized by topics. === Clubhouse Payments === This is the direct payment service provided by the app, which allows users to send money to content creators. It includes those users who had enabled this functionality in their profile. Money can be sent from users to the creator by clicking on their profile. Press "Send Money" then enter the amount you want to send. When a user does this for the first time, they'll be prompted to reg

    Read more →
  • Decision list

    Decision list

    Decision lists are a representation for Boolean functions which can be easily learned from examples. Single term decision lists are more expressive than disjunctions and conjunctions; however, 1-term decision lists are less expressive than the general disjunctive normal form and the conjunctive normal form. The language specified by a k-length decision list includes as a subset the language specified by a k-depth decision tree. Learning decision lists can be used for attribute efficient learning, a type of machine learning. == Definition == A decision list (DL) of length r is of the form: if f1 then output b1 else if f2 then output b2 ... else if fr then output br where fi is the ith formula and bi is the ith boolean for i ∈ { 1... r } {\displaystyle i\in \{1...r\}} . The last if-then-else is the default case, which means formula fr is always equal to true. A k-DL is a decision list where all of formulas have at most k terms. Sometimes "decision list" is used to refer to a 1-DL, where all of the formulas are either a variable or its negation.

    Read more →
  • Wadhwani Institute for Artificial Intelligence

    Wadhwani Institute for Artificial Intelligence

    Wadhwani AI, based in Mumbai, Maharashtra, is an independent, non-profit institute. Founded in 2018, it is dedicated to developing Artificial intelligence solutions for social good. Their mission is to build AI-based innovations and solutions for underserved communities in developing countries, for a wide range of domains including agriculture, education, financial inclusion, healthcare, and infrastructure. == History and funding == The institute was founded with a $30 million philanthropic effort by the Wadhwani brothers, Romesh Wadhwani and Sunil Wadhwani. The institute was inaugurated and dedicated to the nation by Narendra Modi, the 14th Prime Minister of India. In 2019, the institute received a $2 million grant from Google.org to create technologies to help reduce crop losses in cotton farming, through integrated pest management. The United States Agency for International Development awarded $2 million to the institute in 2020 to develop tools, using mathematical modeling techniques and digital technologies such as artificial intelligence and machine learning, to forecast COVID-19 disease patterns, estimate resources needed, and plan interventions. == Collaboration == With assistance from Google, the Ministry of Agriculture and Farmers' Welfare and the Wadhwani AI developed Krishi 24/7, the first AI-powered automated agricultural news monitoring and analysis tool. Through better decision-making, Krishi 24/7 will support the identification of valuable news, provide timely notifications, and respond quickly to safeguard farmers' interests and advance sustainable agricultural growth. The application converts news articles into English after scanning them in several languages. It ensures that the ministry is informed in a timely manner about pertinent occurrences that are published online by extracting key information from news items, including the headline, crop name, event type, date, location, severity, summary, and source link. The National Center for Disease Control has effectively implemented a comparable automated surveillance and analysis tool for disease outbreaks.

    Read more →
  • Wadhwani Institute for Artificial Intelligence

    Wadhwani Institute for Artificial Intelligence

    Wadhwani AI, based in Mumbai, Maharashtra, is an independent, non-profit institute. Founded in 2018, it is dedicated to developing Artificial intelligence solutions for social good. Their mission is to build AI-based innovations and solutions for underserved communities in developing countries, for a wide range of domains including agriculture, education, financial inclusion, healthcare, and infrastructure. == History and funding == The institute was founded with a $30 million philanthropic effort by the Wadhwani brothers, Romesh Wadhwani and Sunil Wadhwani. The institute was inaugurated and dedicated to the nation by Narendra Modi, the 14th Prime Minister of India. In 2019, the institute received a $2 million grant from Google.org to create technologies to help reduce crop losses in cotton farming, through integrated pest management. The United States Agency for International Development awarded $2 million to the institute in 2020 to develop tools, using mathematical modeling techniques and digital technologies such as artificial intelligence and machine learning, to forecast COVID-19 disease patterns, estimate resources needed, and plan interventions. == Collaboration == With assistance from Google, the Ministry of Agriculture and Farmers' Welfare and the Wadhwani AI developed Krishi 24/7, the first AI-powered automated agricultural news monitoring and analysis tool. Through better decision-making, Krishi 24/7 will support the identification of valuable news, provide timely notifications, and respond quickly to safeguard farmers' interests and advance sustainable agricultural growth. The application converts news articles into English after scanning them in several languages. It ensures that the ministry is informed in a timely manner about pertinent occurrences that are published online by extracting key information from news items, including the headline, crop name, event type, date, location, severity, summary, and source link. The National Center for Disease Control has effectively implemented a comparable automated surveillance and analysis tool for disease outbreaks.

    Read more →
  • Symbolic regression

    Symbolic regression

    Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity. No particular model is provided as a starting point for symbolic regression. Instead, initial expressions are formed by randomly combining mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables. Usually, a subset of these primitives will be specified by the person operating it, but that's not a requirement of the technique. The symbolic regression problem for mathematical functions has been tackled with a variety of methods, including recombining equations most commonly using genetic programming, as well as more recent methods utilizing Bayesian methods and neural networks. Another non-classical alternative method to SR is called Universal Functions Originator (UFO), which has a different mechanism, search-space, and building strategy. Further methods such as Exact Learning attempt to transform the fitting problem into a moments problem in a natural function space, usually built around generalizations of the Meijer-G function. By not requiring a priori specification of a model, symbolic regression isn't affected by human bias, or unknown gaps in domain knowledge. It attempts to uncover the intrinsic relationships of the dataset, by letting the patterns in the data itself reveal the appropriate models, rather than imposing a model structure that is deemed mathematically tractable from a human perspective. The fitness function that drives the evolution of the models takes into account not only error metrics (to ensure the models accurately predict the data), but also special complexity measures, thus ensuring that the resulting models reveal the data's underlying structure in a way that's understandable from a human perspective. This facilitates reasoning and favors the odds of getting insights about the data-generating system, as well as improving generalisability and extrapolation behaviour by preventing overfitting. Accuracy and simplicity may be left as two separate objectives of the regression—in which case the optimum solutions form a Pareto front—or they may be combined into a single objective by means of a model selection principle such as minimum description length. It has been proven that symbolic regression is an NP-hard problem. Nevertheless, if the sought-for equation is not too complex it is possible to solve the symbolic regression problem exactly by generating every possible function (built from some predefined set of operators) and evaluating them on the dataset in question. == Difference from classical regression == While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters. This approach has the disadvantage of having a much larger space to search, because not only the search space in symbolic regression is infinite, but there are an infinite number of models which will perfectly fit a finite data set (provided that the model complexity isn't artificially limited). This means that it will possibly take a symbolic regression algorithm longer to find an appropriate model and parametrization, than traditional regression techniques. This can be attenuated by limiting the set of building blocks provided to the algorithm, based on existing knowledge of the system that produced the data; but in the end, using symbolic regression is a decision that has to be balanced with how much is known about the underlying system. Nevertheless, this characteristic of symbolic regression also has advantages: because the evolutionary algorithm requires diversity in order to effectively explore the search space, the result is likely to be a selection of high-scoring models (and their corresponding set of parameters). Examining this collection could provide better insight into the underlying process, and allows the user to identify an approximation that better fits their needs in terms of accuracy and simplicity. == Benchmarking == === SRBench === In 2021, SRBench was proposed as a large benchmark for symbolic regression. In its inception, SRBench featured 14 symbolic regression methods, 7 other ML methods, and 252 datasets from PMLB. The benchmark intends to be a living project: it encourages the submission of improvements, new datasets, and new methods, to keep track of the state of the art in SR. === SRBench Competition 2022 === In 2022, SRBench announced the competition Interpretable Symbolic Regression for Data Science, which was held at the GECCO conference in Boston, MA. The competition pitted nine leading symbolic regression algorithms against each other on a novel set of data problems and considered different evaluation criteria. The competition was organized in two tracks, a synthetic track and a real-world data track. ==== Synthetic Track ==== In the synthetic track, methods were compared according to five properties: re-discovery of exact expressions; feature selection; resistance to local optima; extrapolation; and sensitivity to noise. Rankings of the methods were: QLattice PySR (Python Symbolic Regression) uDSR (Deep Symbolic Optimization) ==== Real-world Track ==== In the real-world track, methods were trained to build interpretable predictive models for 14-day forecast counts of COVID-19 cases, hospitalizations, and deaths in New York State. These models were reviewed by a subject expert and assigned trust ratings and evaluated for accuracy and simplicity. The ranking of the methods was: uDSR (Deep Symbolic Optimization) QLattice geneticengine (Genetic Engine) == Non-standard methods == Most symbolic regression algorithms prevent combinatorial explosion by implementing evolutionary algorithms that iteratively improve the best-fit expression over many generations. Recently, researchers have proposed algorithms utilizing other tactics in AI. Silviu-Marian Udrescu and Max Tegmark developed the "AI Feynman" algorithm, which attempts symbolic regression by training a neural network to represent the mystery function, then runs tests against the neural network to attempt to break up the problem into smaller parts. For example, if f ( x 1 , . . . , x i , x i + 1 , . . . , x n ) = g ( x 1 , . . . , x i ) + h ( x i + 1 , . . . , x n ) {\displaystyle f(x_{1},...,x_{i},x_{i+1},...,x_{n})=g(x_{1},...,x_{i})+h(x_{i+1},...,x_{n})} , tests against the neural network can recognize the separation and proceed to solve for g {\displaystyle g} and h {\displaystyle h} separately and with different variables as inputs. This is an example of divide and conquer, which reduces the size of the problem to be more manageable. AI Feynman also transforms the inputs and outputs of the mystery function in order to produce a new function which can be solved with other techniques, and performs dimensional analysis to reduce the number of independent variables involved. The algorithm was able to "discover" 100 equations from The Feynman Lectures on Physics, while a leading software using evolutionary algorithms, Eureqa, solved only 71. AI Feynman, in contrast to classic symbolic regression methods, requires a very large dataset in order to first train the neural network and is naturally biased towards equations that are common in elementary physics.

    Read more →
  • Statistical learning theory

    Statistical learning theory

    Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be R {\displaystyle R} , such that V = I R {\displaystyle V=IR} Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture. After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take X {\displaystyle X} to be the vector space of all possible inputs, and Y {\displaystyle Y} to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space Z = X × Y {\displaystyle Z=X\times Y} , i.e. there exists some unknown p ( z ) = p ( x , y ) {\displaystyle p(z)=p(\mathbf {x} ,y)} . The training set is made up of n {\displaystyle n} samples from this probability distribution, and is notated S = { ( x 1 , y 1 ) , … , ( x n , y n ) } = { z 1 , … , z n } {\displaystyle S=\{(\mathbf {x} _{1},y_{1}),\dots ,(\mathbf {x} _{n},y_{n})\}=\{\mathbf {z} _{1},\dots ,\mathbf {z} _{n}\}} Every x i {\displaystyle \mathbf {x} _{i}} is an input vector from the training data, and y i {\displaystyle y_{i}} is the output that corresponds to it. In this formalism, the inference problem consists of finding a function f : X → Y {\displaystyle f:X\to Y} such that f ( x ) ∼ y {\displaystyle f(\mathbf {x} )\sim y} . Let H {\displaystyle {\mathcal {H}}} be a space of functions f : X → Y {\displaystyle f:X\to Y} called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let V ( f ( x ) , y ) {\displaystyle V(f(\mathbf {x} ),y)} be the loss function, a metric for the difference between the predicted value f ( x ) {\displaystyle f(\mathbf {x} )} and the actual value y {\displaystyle y} . The expected risk is defined to be I [ f ] = ∫ X × Y V ( f ( x ) , y ) p ( x , y ) d x d y {\displaystyle I[f]=\int _{X\times Y}V(f(\mathbf {x} ),y)\,p(\mathbf {x} ,y)\,d\mathbf {x} \,dy} The target function, the best possible function f {\displaystyle f} that can be chosen, is given by the f {\displaystyle f} that satisfies f = argmin h ∈ H ⁡ I [ h ] {\displaystyle f=\mathop {\operatorname {argmin} } _{h\in {\mathcal {H}}}I[h]} Because the probability distribution p ( x , y ) {\displaystyle p(\mathbf {x} ,y)} is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk I S [ f ] = 1 n ∑ i = 1 n V ( f ( x i ) , y i ) {\displaystyle I_{S}[f]={\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})} A learning algorithm that chooses the function f S {\displaystyle f_{S}} that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function f S {\displaystyle f_{S}} that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: V ( f ( x ) , y ) = ( y − f ( x ) ) 2 {\displaystyle V(f(\mathbf {x} ),y)=(y-f(\mathbf {x} ))^{2}} The absolute value loss (also known as the L1-norm) is also sometimes used: V ( f ( x ) , y ) = | y − f ( x ) | {\displaystyle V(f(\mathbf {x} ),y)=|y-f(\mathbf {x} )|} === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification. It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with Y = { − 1 , 1 } {\displaystyle Y=\{-1,1\}} , this is: V ( f ( x ) , y ) = θ ( − y f ( x ) ) {\displaystyle V(f(\mathbf {x} ),y)=\theta (-yf(\mathbf {x} ))} where θ {\displaystyle \theta } is the Heaviside step function. == Regularization == In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability for the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space H {\displaystyle {\mathcal {H}}} . A common example would be restricting H {\displaystyle {\mathcal {H}}} to linear functions: this can be seen as a reduction to the standard problem of linear regression. H {\displaystyle {\mathcal {H}}} could also be restricted to polynomial of degree p {\displaystyle p} , exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions are limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization. This consists of minimizing 1 n ∑ i = 1 n V ( f ( x i ) , y i ) + γ ‖ f ‖ H 2 {\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})+\gamma \left\|f\right\|_{\mathcal {H}}^{2}} where γ {\displaystyle \gamma } is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier f : X → { 0 , 1 } {\displaystyle f:{\mathcal {X}}\to \{0,1\}} . We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk to be a Sub-Gaussian distribution. P ( | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 e − 2 n ϵ 2 {\displaystyle \mathbb {P} (|{\hat {R}}(f)-R(f)|\geq \epsilon )\leq 2e^{-2n\epsilon ^{2}}} But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class. P ( sup f ∈ F | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 S ( F , n ) e − n ϵ 2 / 8 ≈ n d e − n ϵ 2 / 8 {\displaystyle \mathbb {P} {\bigg (}\sup _{f\in {\mathcal {F}}}|{\hat {R}}(f)-R(f)|\geq \epsilon {\bigg )}\leq 2S({\mathcal {F}},n)e^{-n\epsilon ^{2}/8}\approx n^{d}e^{-n\epsilon ^{2}/8}} where S ( F , n ) {\displaystyle S({\mathcal {F}},n)} is the shattering number and n {\displaystyle n} is the number of samples in your dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla

    Read more →
  • Slopaganda

    Slopaganda

    Slopaganda is a portmanteau of "AI slop" and "propaganda", referring to AI-generated content designed to manipulate beliefs, emotions, and political decision-making at scale. The term is credited to Michał Klincewicz, an assistant professor in the Department of Computational Cognitive Science at Tilburg University, in 2025. == Definition == Slopaganda is distinguished from traditional propaganda by three features: scale, scope, and speed. Generative AI makes it possible to produce large volumes of content quickly and at low cost, allows for highly personalised and targeted messaging to specific sub-audiences, and leverages the hyper-connectivity of social networks to accelerate dissemination beyond what conventional media could achieve. Unlike traditional propaganda, which delivers a uniform message to all recipients, slopaganda can be micro-targeted — tailored to individuals based on estimated prior beliefs to reinforce political biases or emotional associations. The authors note that it need not aim at literal deception: much slopaganda is expressive rather than truth-apt, designed to create emotional associations rather than false factual beliefs. == Relation to AI slop == Slopaganda is a subset of AI slop — low-quality, mass-produced AI-generated content — distinguished by intent. Where AI slop may be produced indifferently for commercial or engagement-farming purposes, slopaganda is deployed with a deliberate political or ideological goal. == Notable examples == Examples discussed by the term's originators include Donald Trump's prolific use of AI in Truth Social posts and Iranian Lego-themed music videos. AI-generated videos posted by the White House mixing real military footage with clips from films and video games; and deepfake audio imitating political candidates during the 2024 US presidential campaign have also been given the label slopaganda.

    Read more →
  • Learning to rank

    Learning to rank

    Learning to rank (LTR) or machine-learned ranking (MLR) is the application of machine learning, often supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval and recommender systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data. == Applications == === In information retrieval === Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure. Training data consists of queries and documents matching them together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine relevance of each result. It is not feasible to check the relevance of all documents, and so typically a technique called pooling is used — only the top few documents, retrieved by some existing ranking models are checked. This technique may introduce selection bias. Alternatively, training data may be derived automatically by analyzing clickthrough logs (i.e. search results which got clicks from users), query chains, or such search engines' features as Google's (since-replaced) SearchWiki. Clickthrough logs can be biased by the tendency of users to click on the top search results on the assumption that they are already well-ranked. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries. Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, and so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, Boolean model, weighted AND, or BM25. This phase is called top- k {\displaystyle k} document retrieval and many heuristics were proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents. === In other areas === Learning to rank algorithms have been applied in areas other than information retrieval: In machine translation for ranking a set of hypothesized translations; In computational biology for ranking candidate 3-D structures in protein structure prediction problems; In recommender systems for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article. == Feature vectors == For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called feature vectors. Such an approach is sometimes called bag of features and is analogous to the bag of words model and vector space model used in information retrieval for representation of documents. Components of such vectors are called features, factors or ranking signals. They may be divided into three groups (features from document retrieval are shown as examples): Query-independent or static features — those features, which depend only on the document, but not on the query. For example, PageRank or document's length. Such features can be precomputed in off-line mode during indexing. They may be used to compute document's static quality score (or static rank), which is often used to speed up search query evaluation. Query-dependent or dynamic features — those features, which depend both on the contents of the document and the query, such as TF-IDF score or other non-machine-learned ranking functions. Query-level features or query features, which depend only on the query. For example, the number of words in a query. Some examples of features, which were used in the well-known LETOR dataset: TF, TF-IDF, BM25, and language modeling scores of document's zones (title, body, anchors text, URL) for a given query; Lengths and IDF sums of document's zones; Document's PageRank, HITS ranks and their variants. Selecting and designing good features is an important area in machine learning, which is called feature engineering. == Evaluation measures == There are several measures (metrics) which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures: Mean average precision (MAP); DCG and NDCG; Precision@n, NDCG@n, where "@n" denotes that the metrics are evaluated only on top n documents; Mean reciprocal rank; Kendall's tau; Spearman's rho. DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics such as MAP, MRR and precision, are defined only for binary judgments. Recently, there have been proposed several new evaluation metrics which claim to model user's satisfaction with search results better than the DCG metric: Expected reciprocal rank (ERR); Yandex's pfound. Both of these metrics are based on the assumption that the user is more likely to stop looking at search results after examining a more relevant document, than after a less relevant document. == Approaches == Learning to Rank approaches are often categorized using one of three approaches: pointwise (where individual documents are ranked), pairwise (where pairs of documents are ranked into a relative order), and listwise (where an entire list of documents are ordered). Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning to rank problems in his book Learning to Rank for Information Retrieval. He categorized them into three groups by their input spaces, output spaces, hypothesis spaces (the core function of the model) and loss functions: the pointwise, pairwise, and listwise approach. In practice, listwise approaches often outperform pairwise approaches and pointwise approaches. This statement was further supported by a large scale experiment on the performance of different learning-to-rank methods on a large collection of benchmark data sets. In this section, without further notice, x {\displaystyle x} denotes an object to be evaluated, for example, a document or an image, f ( x ) {\displaystyle f(x)} denotes a single-value hypothesis, h ( ⋅ ) {\displaystyle h(\cdot )} denotes a bi-variate or multi-variate function and L ( ⋅ ) {\displaystyle L(\cdot )} denotes the loss function. === Pointwise approach === In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem — given a single query-document pair, predict its score. Formally speaking, the pointwise approach aims at learning a function f ( x ) {\displaystyle f(x)} predicting the real-value or ordinal score of a document x {\displaystyle x} using the loss function L ( f ; x j , y j ) {\displaystyle L(f;x_{j},y_{j})} . A number of existing supervised machine learning algorithms can be readily used for this purpose. Ordinal regression and classification algorithms can also be used in pointwise approach when they are used to predict the score of a single query-document pair, and it takes a small, finite number of values. === Pairwise approach === In this case, the learning-to-rank problem is approximated by a classification problem — learning a binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} that can tell which document is better in a given pair of documents. The classifier shall take two documents as its input and the goal is to minimize a loss function L ( h ; x u , x v , y u , v ) {\displaystyle L(h;x_{u},x_{v},y_{u,v})} . The loss function typically reflects the number and magnitude of inversions in the induced ranking. In many cases, the binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} is implemented with a scoring function f ( x ) {\displaystyle f(x)} . As an example, RankNet adapts a probability model and defines h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} as the estimated probability of the document x u {\displaystyle x_{u}} has higher quality than x v {\displaystyle x_{v}} : P u , v ( f ) = CDF ( f ( x u ) − f ( x v ) ) , {\displaystyle P_{u,v}(f)={\text{CDF}

    Read more →
  • Random (software)

    Random (software)

    Random was an iOS mobile app that used algorithms and human-curation to create an adaptive interface to the Internet. The app served a remix of relevance and serendipity that allowed people to find diverse topics and interesting content that they might not have encountered otherwise. Random did not require a login or sign-up - the use of the app was anonymous. The app was powered by an artificial intelligence that learned from direct and indirect user interactions inside the app. While learning and adapting to a person, Random created a unique anonymous choice profile that was then used for recommending topics and content. The app didn't recommend the same content twice. == User interface == Random's user interface was made of ever-changing topic blocks that contained keywords and images. By choosing any of the blocks, the user would see related web content. By closing the web content, the user could access new related topics. The user interface allowed people to get more information about a specific topic area or then just leap freely from topic to topic. The content recommended by Random could be any type of web content, varying from news articles to long-form stories and from photographs to videos. Every user of the Random was curating content for other users by using the app. == History == Random was launched in March 2014. The startup was backed by Skype co-founder Janus Friis. The Random app received a strong reception from the likes of The New York Times, TechCrunch, New Scientist, Vice, and other leading publications. The app went on to gain traction with an active and loyal user community of several hundreds of thousands. This was not enough to support the free app model the team strongly believed in, and the service was terminated in December 2015. == Reception == Various reviews in media have emphasized that Random enables people to break their filter bubble and find diverse content they might not find elsewhere. Alan Henry of Lifehacker wrote: "Random... breaks you out by intentionally guiding you to new topics and interesting articles at sites you may not otherwise read." Vice Motherboard's Claire Evans says that: "Random never turns into a filter bubble, because it perpetually injects the irrational into my experience… in a cocktail of relevancy and serendipity." The app has been said to have a unique, minimalistic user experience. Kit Eaton of The New York Times commented that Random "let's you browse the news in a different way to all the other news sites you've probably ever used." Mashable reviewed Random by concluding that the "app may be one of the most simple content-discovery apps on the market."

    Read more →
  • Ameca (robot)

    Ameca (robot)

    Ameca is a robotic humanoid created in 2021 by Engineered Arts, headquarters in Falmouth, Cornwall, United Kingdom. The project commenced in February 2021, and the first public demonstration was at the CES 2022 show in Las Vegas. Ameca's appearance features grey rubber skin on the face and hands, and is specifically designed to appear genderless. In 2024, an Ameca unit was installed in Edinburgh in the UK to reside at the National Robotarium. Ameca generation 3 has been released and showcased at ICRA 2025 along with Ami. == History == The first generation of Ameca was developed at Engineered Arts headquarters in Falmouth, Cornwall, United Kingdom. The project started in February 2021, with the first video revealed publicly on 1 December 2021. Ameca gained widespread attention on Twitter and TikTok ahead of its first public demonstration at the Consumer Electronics Show 2022, where it was covered by CNET and other news outlets. In 2022, Ameca presented an Alternative Christmas message by British TV Channel 4 for Christmas Day. Ameca was associated with the Museum of the Future's robotic family, where it could interact with visitors. In 2024, an Ameca unit was installed in Edinburgh in the UK to reside at the National Robotarium. In January 2026, Ameca served as an ambassador for the European Space Agency (ESA) at the 18th European Space Conference. == Features == It is designed as a platform for further developing robotics technologies involving human-robot interaction. utilizes embedded microphones, binocular eye mounted cameras, a chest camera and facial recognition software to interact with the public. Interactions can be governed by either OpenAI's GPT-3 or human telepresence. It also features articulated motorized arms, fingers, neck and facial features. Ameca's appearance features grey rubber skin on the face and hands, and is specifically designed to appear genderless. == Public appearances == Computer History Museum, California Heinz Nixdorf MuseumsForum, Paderborn, Germany Copernicus Science Center, Warsaw, Poland Museum of the Future, Dubai Consumer Electronics Show 2022 Deutsches Museum Nuremberg OMR Festival 2022 Hosted by Vodafone GITEX 2022 International Conference on Robotics and Automation 2023 International Telecommunication Union AI for Good Global Summit 2023 Sphere (Not Ameca, Custom humanoid named Aura built on Ameca technology)

    Read more →
  • Semantic folding

    Semantic folding

    Semantic folding theory describes a procedure for encoding the semantics of natural language text in a semantically grounded binary representation. This approach provides a framework for modelling how language data is processed by the neocortex. == Theory == Semantic folding theory draws inspiration from Douglas R. Hofstadter's Analogy as the Core of Cognition which suggests that the brain makes sense of the world by identifying and applying analogies. The theory hypothesises that semantic data must therefore be introduced to the neocortex in such a form as to allow the application of a similarity measure and offers, as a solution, the sparse binary vector employing a two-dimensional topographic semantic space as a distributional reference frame. The theory builds on the computational theory of the human cortex known as hierarchical temporal memory (HTM), and positions itself as a complementary theory for the representation of language semantics. A particular strength claimed by this approach is that the resulting binary representation enables complex semantic operations to be performed simply and efficiently at the most basic computational level. == Two-dimensional semantic space == Analogous to the structure of the neocortex, Semantic Folding theory posits the implementation of a semantic space as a two-dimensional grid. This grid is populated by context-vectors in such a way as to place similar context-vectors closer to each other, for instance, by using competitive learning principles. This vector space model is presented in the theory as an equivalence to the well known word space model described in the information retrieval literature. Given a semantic space (implemented as described above) a word-vector can be obtained for any given word Y by employing the following algorithm: For each position X in the semantic map (where X represents cartesian coordinates) if the word Y is contained in the context-vector at position X then add 1 to the corresponding position in the word-vector for Y else add 0 to the corresponding position in the word-vector for Y The result of this process will be a word-vector containing all the contexts in which the word Y appears and will therefore be representative of the semantics of that word in the semantic space. It can be seen that the resulting word-vector is also in a sparse distributed representation (SDR) format [Schütze, 1993] & [Sahlgreen, 2006]. Some properties of word-SDRs that are of particular interest with respect to computational semantics are: high noise resistance: As a result of similar contexts being placed closer together in the underlying map, word-SDRs are highly tolerant of false or shifted "bits". boolean logic: It is possible to manipulate word-SDRs in a meaningful way using boolean (OR, AND, exclusive-OR) and/or arithmetical (SUBtract) functions . sub-sampling: Word-SDRs can be sub-sampled to a high degree without any appreciable loss of semantic information. topological two-dimensional representation: The SDR representation maintains the topological distribution of the underlying map therefore words with similar meanings will have similar word-vectors. This suggests that a variety of measures can be applied to the calculation of semantic similarity, from a simple overlap of vector elements, to a range of distance measures such as: Euclidean distance, Hamming distance, Jaccard distance, cosine similarity, Levenshtein distance, Sørensen-Dice index, etc. == Semantic spaces == Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning-based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis from Microsoft and Hyperspace Analogue to Language from the University of California. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural networking techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google and GloVe from Stanford University. Semantic folding represents a novel, biologically inspired approach to semantic spaces where each word is represented as a sparse binary vector with 16,000 dimensions (a semantic fingerprint) in a 2D semantic map (the semantic universe). Sparse binary representation are advantageous in terms of computational efficiency, and allow for the storage of very large numbers of possible patterns. == Visualization == The topological distribution over a two-dimensional grid (outlined above) lends itself to a bitmap type visualization of the semantics of any word or text, where each active semantic feature can be displayed as e.g. a pixel. As can be seen in the images shown here, this representation allows for a direct visual comparison of the semantics of two (or more) linguistic items. Image 1 clearly demonstrates that the two disparate terms "dog" and "car" have, as expected, very obviously different semantics. Image 2 shows that only one of the meaning contexts of "jaguar", that of "Jaguar" the car, overlaps with the meaning of Porsche (indicating partial similarity). Other meaning contexts of "jaguar" e.g. "jaguar" the animal clearly have different non-overlapping contexts. The visualization of semantic similarity using Semantic Folding bears a strong resemblance to the fMRI images produced in a research study conducted by A.G. Huth et al., where it is claimed that words are grouped in the brain by meaning. voxels, little volume segments of the brain, were found to follow a pattern were semantic information is represented along the boundary of the visual cortex with visual and linguistic categories represented on posterior and anterior side respectively.

    Read more →
  • Kernel embedding of distributions

    Kernel embedding of distributions

    In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ⁡ ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega

    Read more →
  • Cloud-to-cloud integration

    Cloud-to-cloud integration

    Cloud-to-Cloud Integration ( C2I ) allows users to connect disparate cloud computing platforms. While Paas (Platform as a service) and Saas (Software as a service) continue to gain momentum, different vendors have different implementations for cloud computing, e.g. Database, REST, SOAP API. Another name for Cloud-to-Cloud Integration is Cloud-Surfing. See also Cloud-based integration

    Read more →
  • Environmental impact of AI

    Environmental impact of AI

    The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl

    Read more →
  • Coupled pattern learner

    Coupled pattern learner

    Coupled Pattern Learner (CPL) is a machine learning algorithm which couples the semi-supervised learning of categories and relations to forestall the problem of semantic drift associated with boot-strap learning methods. == Coupled Pattern Learner == Semi-supervised learning approaches using a small number of labeled examples with many unlabeled examples are usually unreliable as they produce an internally consistent, but incorrect set of extractions. CPL solves this problem by simultaneously learning classifiers for many different categories and relations in the presence of an ontology defining constraints that couple the training of these classifiers. It was introduced by Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr. and Tom M. Mitchell in 2009. == CPL overview == CPL is an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors. Basic idea behind CPL is that semi-supervised training of a single type of extractor such as ‘coach’ is much more difficult than simultaneously training many extractors that cover a variety of inter-related entity and relation types. Using prior knowledge about the relationships between these different entities and relations CPL makes unlabeled data as a useful constraint during training. For e.g., ‘coach(x)’ implies ‘person(x)’ and ‘not sport(x)’. == CPL description == === Coupling of predicates === CPL primarily relies on the notion of coupling the learning of multiple functions so as to constrain the semi-supervised learning problem. CPL constrains the learned function in two ways. Sharing among same-arity predicates according to logical relations Relation argument type-checking === Sharing among same-arity predicates === Each predicate P in the ontology has a list of other same-arity predicates with which P is mutually exclusive. If A is mutually exclusive with predicate B, A’s positive instances and patterns become negative instances and negative patterns for B. For example, if ‘city’, having an instance ‘Boston’ and a pattern ‘mayor of arg1’, is mutually exclusive with ‘scientist’, then ‘Boston’ and ‘mayor of arg1’ will become a negative instance and a negative pattern respectively for ‘scientist.’ Further, Some categories are declared to be a subset of another category. For e.g., ‘athlete’ is a subset of ‘person’. === Relation argument type-checking === This is a type checking information used to couple the learning of relations and categories. For example, the arguments of the ‘ceoOf’ relation are declared to be of the categories ‘person’ and ‘company’. CPL does not promote a pair of noun phrases as an instance of a relation unless the two noun phrases are classified as belonging to the correct argument types. === Algorithm description === Following is a quick summary of the CPL algorithm. Input: An ontology O, and a text corpus C Output: Trusted instances/patterns for each predicate for i=1,2,...,∞ do foreach predicate p in O do EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances; FILTER candidates that violate coupling; RANK candidate instances/patterns; PROMOTE top candidates; end end ==== Inputs ==== A large corpus of Part-Of-Speech tagged sentences and an initial ontology with predefined categories, relations, mutually exclusive relationships between same-arity predicates, subset relationships between some categories, seed instances for all predicates, and seed patterns for the categories. ==== Candidate extraction ==== CPL finds new candidate instances by using newly promoted patterns to extract the noun phrases that co-occur with those patterns in the text corpus. CPL extracts, Category Instances Category Patterns Relation Instances Relation Patterns ==== Candidate filtering ==== Candidate instances and patterns are filtered to maintain high precision, and to avoid extremely specific patterns. An instance is only considered for assessment if it co-occurs with at least two promoted patterns in the text corpus, and if its co-occurrence count with all promoted patterns is at least three times greater than its co-occurrence count with negative patterns. ==== Candidate ranking ==== CPL ranks candidate instances using the number of promoted patterns that they co-occur with so that candidates that occur with more patterns are ranked higher. Patterns are ranked using an estimate of the precision of each pattern. ==== Candidate promotion ==== CPL ranks the candidates according to their assessment scores and promotes at most 100 instances and 5 patterns for each predicate. Instances and patterns are only promoted if they co-occur with at least two promoted patterns or instances, respectively. == Meta-Bootstrap Learner == Meta-Bootstrap Learner (MBL) was also proposed by the authors of CPL. Meta-Bootstrap learner couples the training of multiple extraction techniques with a multi-view constraint, which requires the extractors to agree. It makes addition of coupling constraints on top of existing extraction algorithms, while treating them as black boxes, feasible. MBL assumes that the errors made by different extraction techniques are independent. Following is a quick summary of MBL. Input: An ontology O, a set of extractors ε Output: Trusted instances for each predicate for i=1,2,...,∞ do foreach predicate p in O do foreach extractor e in ε do Extract new candidates for p using e with recently promoted instances; end FILTER candidates that violate mutual-exclusion or type-checking constraints; PROMOTE candidates that were extracted by all extractors; end end Subordinate algorithms used with MBL do not promote any instance on their own, they report the evidence about each candidate to MBL and MBL is responsible for promoting instances. == Applications == In their paper authors have presented results showing the potential of CPL to contribute new facts to existing repository of semantic knowledge, Freebase

    Read more →