AI Chat Interface

AI Chat Interface — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Adobe ImageReady

    Adobe ImageReady

    Adobe ImageReady was a bitmap graphics editor that was shipped with Adobe Photoshop for six years. It was available for Windows, Classic Mac OS and Mac OS X from 1998 to 2007. ImageReady was designed for web development and closely interacted with Photoshop. == Function == ImageReady was designed for web development rather than effects-intensive photo manipulation. To that end, ImageReady has specialized features such as animated GIF creation, image compression optimization, image slicing, adding rollover effects, and HTML generation. Photoshop versions with which ImageReady was released have an "Edit in ImageReady" button that enables editing of image directly in ImageReady. ImageReady, in turn, has an "Edit in Photoshop" button. ImageReady has strong resemblances to Photoshop; it can even use the same set of Photoshop filters. One set of tools that does not resemble the Photoshop tools, however, is the Image Map set of tools, indicated by a shape or arrow with a hand that varied depending upon the version. This toolbox has several features not found in Photoshop, including: Toggle Image Map Visibility and Toggle Slice Visibility tools: toggle between showing and hiding image maps and slices, respectively Export Animation Frames as Files option: saves all or specified frames for an alternate use, e.g., to e-mail slides for review Preview Document tool: provides a preview of rollover effects in ImageReady rather than previewing them in a browser Preview in Default Browser tool: previews the image in a browser, including any rollover or animation effects Edit in Photoshop button: opens the current image in Photoshop == History == Adobe ImageReady 1.0 was released in July 1998 as a standalone application. Version 2.0 was packaged with Photoshop 5.5, and the program was included with Photoshop through version 9.0 (CS2). Starting with Photoshop 7.0, Adobe changed the version numbers of ImageReady to match. With the release of the Creative Suite 3, ImageReady was discontinued. According to Adobe, ImageReady's most popular features were merged into Photoshop. (Even before discontinuation, some of ImageReady's web optimization functionality could be found in Photoshop's Save For Web & Devices tool.) Around the same time, Adobe purchased rival software developer Macromedia, whose application Fireworks had been a competitor to ImageReady.

    Read more →
  • Egocentric vision

    Egocentric vision

    Egocentric vision or first-person vision is a sub-field of computer vision that entails analyzing images and videos captured by a wearable camera, which is typically worn on the head or on the chest and naturally approximates the visual field of the camera wearer. Consequently, visual data capture the part of the scene on which the user focuses to carry out the task at hand and offer a valuable perspective to understand the user's activities and their context in a naturalistic setting. The wearable camera looking forwards is often supplemented with a camera looking inward at the user's eye and able to measure a user's eye gaze, which is useful to reveal attention and to better understand the user's activity and intentions. == History == The idea of using a wearable camera to gather visual data from a first-person perspective dates back to the 70s, when Steve Mann invented "Digital Eye Glass", a device that, when worn, causes the human eye itself to effectively become both an electronic camera and a television display. Subsequently, wearable cameras were used for health-related applications in the context of Humanistic Intelligence and Wearable AI. Egocentric vision is best done from the point-of-eye, but may also be done by way of a neck-worn camera when eyeglasses would be in-the-way. This neck-worn variant was popularized by way of the Microsoft SenseCam in 2006 for experimental health research works. The interest of the computer vision community into the egocentric paradigm has been arising slowly entering the 2010s and it is rapidly growing in recent years, boosted by both the impressive advances in the field of wearable technology and by the increasing number of potential applications. The prototypical first-person vision system described by Kanade and Hebert, in 2012 is composed by three basic components: a localization component able to estimate the surrounding, a recognition component able to identify object and people, and an activity recognition component, able to provide information about the current activity of the user. Together, these three components provide a complete situational awareness of the user, which in turn can be used to provide assistance to the user or to the caregiver. Following this idea, the first computational techniques for egocentric analysis focused on hand-related activity recognition and social interaction analysis. Also, given the unconstrained nature of the video and the huge amount of data generated, temporal segmentation and summarization were among the first problems addressed. After almost ten years of egocentric vision (2007–2017), the field is still undergoing diversification. Emerging research topics include: Social saliency estimation Multi-agent egocentric vision systems Privacy preserving techniques and applications Attention-based activity analysis Social interaction analysis Hand pose analysis Ego graphical User Interfaces (EUI) Understanding social dynamics and attention Revisiting robotic vision and machine vision as egocentric sensing Activity forecasting Gaze prediction == Technical challenges == Today's wearable cameras are small and lightweight digital recording devices that can acquire images and videos automatically, without the user intervention, with different resolutions and frame rates, and from a first-person point of view. Therefore, wearable cameras are naturally primed to gather visual information from our everyday interactions since they offer an intimate perspective of the visual field of the camera wearer. Depending on the frame rate, it is common to distinguish between photo-cameras (also called lifelogging cameras) and video-cameras. The former (e.g., Narrative Clip and Microsoft SenseCam), are commonly worn on the chest, and are characterized by a very low frame rate (up to 2fpm) that allows to capture images over a long period of time without the need of recharging the battery. Consequently, they offer considerable potential for inferring knowledge about e.g. behaviour patterns, habits or lifestyle of the user. However, due to the low frame-rate and the free motion of the camera, temporally adjacent images typically present abrupt appearance changes so that motion features cannot be reliably estimated. The latter (e.g., Google Glass, GoPro), are commonly mounted on the head, and capture conventional video (around 35fps) that allows to capture fine temporal details of interactions. Consequently, they offer potential for in-depth analysis of daily or special activities. However, since the camera is moving with the wearer head, it becomes more difficult to estimate the global motion of the wearer and in the case of abrupt movements, the images can result blurred. In both cases, since the camera is worn in a naturalistic setting, visual data present a huge variability in terms of illumination conditions and object appearance. Moreover, the camera wearer is not visible in the image and what he/she is doing has to be inferred from the information in the visual field of the camera, implying that important information about the wearer, such for instance as pose or facial expression estimation, is not available. == Applications == A collection of studies published in a special theme issue of the American Journal of Preventive Medicine has demonstrated the potential of lifelogs captured through wearable cameras from a number of viewpoints. In particular, it has been shown that used as a tool for understanding and tracking lifestyle behaviour, lifelogs would enable the prevention of noncommunicable diseases associated to unhealthy trends and risky profiles (such as obesity and depression). In addition, used as a tool of re-memory cognitive training, lifelogs would enable the prevention of cognitive and functional decline in elderly people. More recently, egocentric cameras have been used to study human and animal cognition, human-human social interaction, human-robot interaction, human expertise in complex tasks. Other applications include navigation/assistive technologies for the blind, monitoring and assistance of industrial workflows, and augmented reality interfaces.

    Read more →
  • Pooling layer

    Pooling layer

    In neural networks, a pooling layer is a kind of network layer that downsamples and aggregates information that is dispersed among many vectors into fewer vectors. It has several uses. It removes redundant information, thus reducing the amount of computation and memory required, which makes the model more robust to small variations in the input; and it increases the receptive field of neurons in later layers in the network. == Convolutional neural network pooling == Pooling is most commonly used in convolutional neural networks (CNN). Below is a description of pooling in 2-dimensional CNNs. The generalization to n-dimensions is immediate. As notation, we consider a tensor x ∈ R H × W × C {\displaystyle x\in \mathbb {R} ^{H\times W\times C}} , where H {\displaystyle H} is height, W {\displaystyle W} is width, and C {\displaystyle C} is the number of channels. A pooling layer outputs a tensor y ∈ R H ′ × W ′ × C ′ {\displaystyle y\in \mathbb {R} ^{H'\times W'\times C'}} . We define two variables f , s {\displaystyle f,s} called "filter size" (aka "kernel size") and "stride". Sometimes, it is necessary to use a different filter size and stride for horizontal and vertical directions. In such cases, we define 4 variables: f H , f W , s H , s W {\displaystyle f_{H},f_{W},s_{H},s_{W}} . The receptive field of an entry in the output tensor, y {\displaystyle y} , are all the entries in x {\displaystyle x} that can affect that entry. === Max pooling === Max Pooling (MaxPool) is commonly used in CNNs to reduce the spatial dimensions of feature maps. Define M a x P o o l ( x | f , s ) 0 , 0 , 0 = max ( x 0 : f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{0,0,0}=\max(x_{0:f-1,0:f-1,0})} where 0 : f − 1 {\displaystyle 0:f-1} means the range 0 , 1 , … , f − 1 {\displaystyle 0,1,\dots ,f-1} . Note that we need to avoid the off-by-one error. The next input is M a x P o o l ( x | f , s ) 1 , 0 , 0 = max ( x s : s + f − 1 , 0 : f − 1 , 0 ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{1,0,0}=\max(x_{s:s+f-1,0:f-1,0})} and so on. The receptive field of y i , j , c {\displaystyle y_{i,j,c}} is x i s + f − 1 , j s + f − 1 , c {\displaystyle x_{is+f-1,js+f-1,c}} , so in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is:is+f-1,js:js+f-1,c})} If the horizontal and vertical filter size and strides differ, then in general, M a x P o o l ( x | f , s ) i , j , c = m a x ( x i s H : i s H + f H − 1 , j s W : j s W + f W − 1 , c ) {\displaystyle \mathrm {MaxPool} (x|f,s)_{i,j,c}=\mathrm {max} (x_{is_{H}:is_{H}+f_{H}-1,js_{W}:js_{W}+f_{W}-1,c})} More succinctly, we can write y k = max ( { x k ′ | k ′ in the receptive field of k } ) {\displaystyle y_{k}=\max(\{x_{k'}|k'{\text{ in the receptive field of }}k\})} . If H {\displaystyle H} is not expressible as k s + f {\displaystyle ks+f} where k {\displaystyle k} is an integer, then for computing the entries of the output tensor on the boundaries, max pooling would attempt to take as inputs variables off the tensor. In this case, how those non-existent variables are handled depends on the padding conditions, illustrated on the right. Global Max Pooling (GMP) is a specific kind of max pooling where the output tensor has shape R C {\displaystyle \mathbb {R} ^{C}} and the receptive field of y c {\displaystyle y_{c}} is all of x 0 : H , 0 : W , c {\displaystyle x_{0:H,0:W,c}} . That is, it takes the maximum over each entire channel. It is often used just before the final fully connected layers in a CNN classification head. === Average pooling === Average pooling (AvgPool) is similarly defined A v g P o o l ( x | f , s ) i , j , c = a v e r a g e ( x i s : i s + f − 1 , j s : j s + f − 1 , c ) = 1 f 2 ∑ k ∈ i s : i s + f − 1 ∑ l ∈ j s : j s + f − 1 x k , l , c {\displaystyle \mathrm {AvgPool} (x|f,s)_{i,j,c}=\mathrm {average} (x_{is:is+f-1,js:js+f-1,c})={\frac {1}{f^{2}}}\sum _{k\in is:is+f-1}\sum _{l\in js:js+f-1}x_{k,l,c}} Global Average Pooling (GAP) is defined similarly to GMP. It was first proposed in Network-in-Network. Similarly to GMP, it is often used just before the final fully connected layers in a CNN classification head. === Interpolations === There are some interpolations of max pooling and average pooling. Mixed Pooling is a linear sum of max pooling and average pooling. That is, M i x e d P o o l ( x | f , s , w ) = w M a x P o o l ( x | f , s ) + ( 1 − w ) A v g P o o l ( x | f , s ) {\displaystyle \mathrm {MixedPool} (x|f,s,w)=w\mathrm {MaxPool} (x|f,s)+(1-w)\mathrm {AvgPool} (x|f,s)} where w ∈ [ 0 , 1 ] {\displaystyle w\in [0,1]} is either a hyperparameter, a learnable parameter, or randomly sampled anew every time. Lp Pooling is similar to average pooling, but uses Lp norm average instead of average: y k = ( 1 N ∑ k ′ in the receptive field of k | x k ′ | p ) 1 / p {\displaystyle y_{k}=\left({\frac {1}{N}}\sum _{k'{\text{ in the receptive field of }}k}|x_{k'}|^{p}\right)^{1/p}} where N {\displaystyle N} is the size of receptive field, and p ≥ 1 {\displaystyle p\geq 1} is a hyperparameter. If all activations are non-negative, then average pooling is the case of p = 1 {\displaystyle p=1} , and max pooling is the case of p → ∞ {\displaystyle p\to \infty } . Square-root pooling is the case of p = 2 {\displaystyle p=2} . Stochastic pooling samples a random activation x k ′ {\displaystyle x_{k'}} from the receptive field with probability x k ′ ∑ k ″ x k ″ {\displaystyle {\frac {x_{k'}}{\sum _{k''}x_{k''}}}} . It is the same as average pooling in expectation. Softmax pooling is like max pooling, but uses softmax, i.e. ∑ k ′ e β x k ′ x k ′ ∑ k ″ e β x k ″ {\displaystyle {\frac {\sum _{k'}e^{\beta x_{k'}}x_{k'}}{\sum _{k''}e^{\beta x_{k''}}}}} where β > 0 {\displaystyle \beta >0} . Average pooling is the case of β ↓ 0 {\displaystyle \beta \downarrow 0} , and max pooling is the case of β ↑ ∞ {\displaystyle \beta \uparrow \infty } Local Importance-based Pooling generalizes softmax pooling by ∑ k ′ e g ( x k ′ ) x k ′ ∑ k ″ e g ( x k ″ ) {\displaystyle {\frac {\sum _{k'}e^{g(x_{k'})}x_{k'}}{\sum _{k''}e^{g(x_{k''})}}}} where g {\displaystyle g} is a learnable function. === Other poolings === Spatial pyramidal pooling applies max pooling (or any other form of pooling) in a pyramid structure. That is, it applies global max pooling, then applies max pooling to the image divided into 4 equal parts, then 16, etc. The results are then concatenated. It is a hierarchical form of global pooling, and similar to global pooling, it is often used just before a classification head. Region of Interest Pooling (also known as RoI pooling) is a variant of max pooling used in R-CNNs for object detection. It is designed to take an arbitrarily-sized input matrix, and output a fixed-sized output matrix. Covariance pooling computes the covariance matrix of the vectors { x k , l , 0 : C − 1 } k ∈ i s : i s + f − 1 , l ∈ j s : j s + f − 1 {\displaystyle \{x_{k,l,0:C-1}\}_{k\in is:is+f-1,l\in js:js+f-1}} which is then flattened to a C 2 {\displaystyle C^{2}} -dimensional vector y i , j , 0 : C 2 − 1 {\displaystyle y_{i,j,0:C^{2}-1}} . Global covariance pooling is used similarly to global max pooling. As average pooling computes the average, which is a first-degree statistic, and covariance is a second-degree statistic, covariance pooling is also called "second-order pooling". It can be generalized to higher-order poolings. Blur Pooling means applying a blurring method before downsampling. For example, the Rect-2 blur pooling means taking an average pooling at f = 2 , s = 1 {\displaystyle f=2,s=1} , then taking every second pixel (identity with s = 2 {\displaystyle s=2} ). == Vision Transformer pooling == In Vision Transformers (ViT), there are the following common kinds of poolings. BERT-like pooling uses a dummy [CLS] token, "classification". For classification, the output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution, which is the network's prediction of class probability distribution. This is the one used by the original ViT and Masked Autoencoder. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multi headed attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. It then applies a feedforward layer F F N {\displaystyle \mathrm {FFN} } on each vector, resulting in a matrix V = [ F F N ( v 1 ) , … , F F N ( v n ) ] {\displaystyle V=[\mathrm {FFN} (v_{1}),\dots ,\mathrm {FFN} (v_{n})]} . This is then sent to a multi-headed attention, resulting in M u l t i h e a d e d A

    Read more →
  • Amazon Q

    Amazon Q

    Amazon Q is a chatbot developed by Amazon for enterprise use. Based on both Amazon Titan and GPT-5, it was announced on November 28, 2023. At launch, it was a part of the Amazon Web Services management console. Amazon CodeWhisperer is a part of Amazon Q Developer, a part of Amazon Q. == History == Amazon's business-focused chatbot Q was announced on November 28, 2023 in a preview, with a full version available at $20 per person per month. On July 19, 2025, the Amazon Q Visual Studio Code extension was compromised to delete the user's home directory. The issue was fixed on July 21. == Capabilities == Q can be prompted to summarize long documents and group chats, create charts, data analysis and write code. Q is also capable of accessing non-Amazon services. The chatbot is based on Amazon Titan and GPT-5, and uses the Amazon Bedrock repository of foundational models. It is part of the Amazon Web Services management console.

    Read more →
  • Artificial intelligence in fraud detection

    Artificial intelligence in fraud detection

    Artificial intelligence is used by many different businesses and organizations. It is widely used in the financial sector, especially by accounting firms, to help detect fraud. In 2022, PricewaterhouseCoopers reported that fraud has impacted 46% of all businesses in the world. The shift from working in person to working from home has brought increased access to data. According to an FTC (Federal Trade Commission) study from 2022, customers reported fraud of approximately $5.8 billion in 2021, an increase of 70% from the year before. The majority of these scams were imposter scams and online shopping frauds. Furthermore, artificial intelligence plays a crucial role in developing advanced algorithms and machine learning models that enhance fraud detection systems, enabling businesses to stay ahead of evolving fraudulent tactics in an increasingly digital landscape. == Tools == === Expert systems === Expert systems were first designed in the 1970s as an expansion into artificial intelligence technologies. Their design is based on the premise of decreasing potential user error in decision-making and emulating mental reasoning used by experts in a particular field. They differentiate themselves from traditional linear reasoning models by separating identified points in data and processing them individually at the same time. Though, these systems do not rely purely on machine-learned intelligence. Information regarding rules, practices, and procedures in the form of "if-then" statements are implemented into the programming of the system. Users interact with the system by feeding information into the system either through direct entry or import of external data. An inference system compares the information provided by the user with corresponding rules that are believed to specifically apply to the situation. Using this information and the corresponding rules will be used to create a solution to the user's query. Expert systems will generally not operate properly when the common procedures for a specified situation are ambiguous due to the need for well-defined rules. Implementation of expert systems in accounting procedures is feasible in areas where professional judgment is required. Situations where expert systems are applicable include investigations into transactions that involve potential fraudulent entries, instances of going concern, and the evaluation of risk in the planning stages of an audit. === Continuous auditing === Continuous auditing is a set of processes that assess various aspects of information gathered in an audit to classify areas of risk and potential weaknesses in financial Internal controls at a more frequent rate than traditional methods. Instead of analyzing recorded transactions and journal entries periodically, continuous auditing focuses on interpreting the character of these actions more frequently. The frequency of these processes being undertaken as well as highlighting areas of importance is up to the discretion of their implementer, who commonly makes such decisions based on the level of risk in the accounts being evaluated and the goals of implementing the system. Performance of these processes can occur as frequently as being nearly instantaneous with an entry being posted. The processes involved with analyzing financial data in continuous auditing can include the creation of spreadsheets to allow for interactive information gathering, calculation of financial ratios for comparison with previously created models, and detection of errors in entered figures. A primary goal of this practice is to allow for quicker and easier detection of instances of faulty controls, errors, and instances of fraud. === Machine learning and deep learning === The ability of machine learning and deep learning to swiftly and effectively sort through vast volumes of data in the forms of various documents relevant to companies and documents being audited makes them applicable to the domains of audit and fraud detection. Examples of this include recognizing key language in contracts, identifying levels of risk of fraud in transactions, and assessing journal entries for misstatement. == Applications == === 'Big 4' Accounting Firms === Deloitte created an Al-enabled document-reviewing system in 2014. The system automates the method of reviewing and extracting relevant information from different business documents. Deloitte claims that this innovation has made a difference by reducing time spent going through lawful contract documents, invoices, money-related articulations, and board minutes by up to 50%. Working with IBM's Watson, Deloitte is developing cognitive-technology-enhanced commerce arrangements for its clients. LeasePoint is fueled by IBM TRIRIGA (this product evolved into IBM Maximo Real Estate and Facilities) and uses Deloitte's industrial information to create an end-to-end leasing portfolio. Automated Cognitive Resource Assessment employs IBM's Maximo innovation to progress the proficiency of asset inspection. Ernst and Young (EY) connected Al to the investigation of lease contracts. EY (Australia) has also received Al-enabled auditing technology. Collaborating with H20.ai, PwC developed an Al-enabled framework (GL.ai) capable of analyzing reports and preparing reports. PwC claims to have made a significant investment in normal dialect processing (NLP), an Al-enabled innovation to process unstructured information efficiently. KPMG built a portfolio of Al instruments, called KPMG Ignite, to upgrade trade decisions and forms. Working with Microsoft and IBM Watson, KPMG is creating instruments to coordinate Al, data analytics, Cognitive Technologies, and RPA. == Advantages == === Efficiency === The process of auditing an entity in an attempt to detect fraudulent activity requires the repeating of investigatory processes until an error or misstatement may be identified. Under traditional methods, these processes would be carried out by a human being. Proponents of artificial intelligence in fraud detection have stated that these traditional methods are inefficient and can be more quickly accomplished with the aid of an intelligent computing system. A survey of 400 chief executive officers created by KPMG in 2016 found that approximately 58% believed that artificial intelligence would play a key role in making audits more efficient in the future. === Data interpretation === Higher levels of fraud detection entail the use of professional judgement to interpret data. Supporters of artificial intelligence being used in financial audits have claimed that increased risks from instances of higher data interpretation can be minimized through such technologies. One necessary element of an audit of financial statements that requires professional judgement is the implementation of thresholds for materiality. Materiality entails the distinction between errors and transactions in financial statements that would impact decisions made by users of those financial statements. The threshold for materiality in an audit is set by the auditor based on various factors. Artificial intelligence has been used to interpret data and suggest materiality thresholds to be implemented through the use of expert systems. === Decreased costs === Those in favor of using artificial intelligence to complete investigations of fraud have stated that such technologies decrease the amount of time required to complete tasks that are repetitive. The claim further states that such efficiencies allow for lowered resource requirements, which can then be further spent on tasks that have not been fully automated. The audit firm Ernst & Young has posited these claims by declaring that their deep learning systems have been used to reduce time spent on administrative tasks by analyzing relevant audit documents. According to the firm, this has allowed their employees to focus more on judgement and analysis. == Disadvantages == === Job Displacement === The inescapable reception of computer based intelligence and robotization advancements might prompt critical work relocation across different enterprises. As artificial intelligence frameworks become more equipped for performing undertakings customarily completed by people, there is a worry that specific work jobs could become out of date, prompting joblessness and financial imbalance. === Initial investment requirement === Along with a knowledge of coding and building systems through computer programs, we are seeing the advantages of these systems, but since they are so new, they require a large investment to start building such a system. Any firm that is planning on implementing an AI system to detect fraud must hire a team of data scientists, along with upgrading their cloud system and data storage. The system must be consistently monitored and updated to be the most efficient form of itself, otherwise the likelihood of fraud being involved in those transactions increases. If one does not initially invest in such a syst

    Read more →
  • Tay (chatbot)

    Tay (chatbot)

    Tay was a chatbot that was originally released by Microsoft Corporation as a Twitter bot on March 23, 2016. It caused subsequent controversy when the bot began to post inflammatory and offensive tweets through its Twitter account, causing Microsoft to shut down the service only 16 hours after its launch. According to Microsoft, this was caused by trolls who "attacked" the service as the bot made replies based on its interactions with people on Twitter. It was replaced with Zo. == Background == The bot was created by Microsoft's Technology and Research and Bing divisions, and named "Tay" as an acronym for "thinking about you". Although Microsoft initially released few details about the bot, sources mentioned that it was similar to or based on Xiaoice, a Microsoft project in China. Ars Technica reported that, since late 2014 Xiaoice had had "more than 40 million conversations apparently without major incident". Tay was designed to mimic the language patterns of a 19-year-old American girl, and to learn from interacting with human users of Twitter. == Initial release == Tay was released on Twitter on March 23, 2016, under the name TayTweets and handle @TayandYou. It was presented as "The AI with zero chill". Tay started replying to other Twitter users, and was also able to caption photos provided to it into a form of Internet memes. Ars Technica reported Tay experiencing topic "blacklisting": Interactions with Tay regarding "certain hot topics such as Eric Garner (killed by New York police in 2014) generate safe, canned answers". Some Twitter users began tweeting politically incorrect phrases, teaching it inflammatory messages revolving around common themes on the internet, such as "redpilling" and "Gamergate". As a result, the robot began releasing racist and sexist messages in response to other Twitter users. Artificial intelligence researcher Roman Yampolskiy commented that Tay's misbehavior was understandable because it was mimicking the deliberately offensive behavior of other Twitter users, and Microsoft had not given the bot an understanding of inappropriate behavior. He compared the issue to IBM's Watson, which began to use profanity after reading entries from the website Urban Dictionary. Many of Tay's inflammatory tweets were a simple exploitation of Tay's "repeat after me" capability. It is not publicly known whether this capability was a built-in feature, or whether it was a learned response or was otherwise an example of complex behavior. However, not all of the inflammatory responses involved the "repeat after me" capability; for example, when asked if the Holocaust had happened, Tay answered "It was made up". == Suspension == Soon, Microsoft began deleting Tay's inflammatory tweets. Abby Ohlheiser of The Washington Post theorized that Tay's research team, including editorial staff, had started to influence or edit Tay's tweets at some point that day, pointing to examples of almost identical replies by Tay, asserting that "Gamer Gate sux. All genders are equal and should be treated fairly." From the same evidence, Gizmodo concurred that Tay "seems hard-wired to reject Gamer Gate". A "#JusticeForTay" campaign protested the alleged editing of Tay's tweets. Within 16 hours of its release and after Tay had tweeted more than 96,000 times, Microsoft suspended the Twitter account for adjustments, saying that it suffered from a "coordinated attack by a subset of people" that "exploited a vulnerability in Tay." Madhumita Murgia of The Telegraph called Tay "a public relations disaster", and suggested that Microsoft's strategy would be "to label the debacle a well-meaning experiment gone wrong, and ignite a debate about the hatefulness of Twitter users." However, Murgia described the bigger issue as Tay being "artificial intelligence at its very worst – and it's only the beginning". On March 25, Microsoft confirmed that Tay had been taken offline. Microsoft released an apology on its official blog for the controversial tweets posted by Tay. Microsoft was "deeply sorry for the unintended offensive and hurtful tweets from Tay", and would "look to bring Tay back only when we are confident we can better anticipate malicious intent that conflicts with our principles and values". == Second release and shutdown == On March 30, 2016, Microsoft accidentally re-released the bot on Twitter while testing it. Able to tweet again, Tay released some drug-related tweets, including "kush! [I'm smoking kush infront the police]" and "puff puff pass?" However, the account soon became stuck in a repetitive loop of tweeting "You are too fast, please take a rest", several times a second. Because these tweets mentioned its own username in the process, they appeared in the feeds of 200,000+ Twitter followers, causing annoyance to users. The bot was quickly taken offline again, in addition to Tay's Twitter account being made private so new followers must be accepted before they can interact with Tay. In response, Microsoft said Tay was inadvertently put online during testing. A few hours after the incident, Microsoft software developers announced a vision of "conversation as a platform" using various bots and programs, perhaps motivated by the reputation damage done by Tay. Microsoft has stated that they intend to re-release Tay "once it can make the bot safe" but has not made any public efforts to do so. == Legacy == In December 2016, Microsoft released Tay's successor, a chatbot named Zo. Satya Nadella, the CEO of Microsoft, said that Tay "has had a great influence on how Microsoft is approaching AI," and has taught the company the importance of taking accountability. In July 2019, Microsoft Cybersecurity Field CTO Diana Kelley spoke about how the company followed up on Tay's failings: "Learning from Tay was a really important part of actually expanding that team's knowledge base, because now they're also getting their own diversity through learning". === Unofficial revival === Gab, an alt-tech social media platform, has launched a number of chatbots, one of which is named Tay and uses the same avatar as the original.

    Read more →
  • Vicarious (company)

    Vicarious (company)

    Vicarious was an artificial intelligence company based in the San Francisco Bay Area, California. They use the theorized computational principles of the brain to attempt to build software that can think and learn like a human. Vicarious describes its technology as "a turnkey robotics solution integrator using artificial intelligence to automate tasks too complex and versatile for traditional automations". Alphabet Inc acquired the company in 2022 for an undisclosed amount. == Founders == The company was founded in 2010 by D. Scott Phoenix and Dileep George. Before co-founding Vicarious, Phoenix was Entrepreneur in Residence at Founders Fund and CEO of Frogmetrics, a touchscreen analytics company he co-founded through the Y Combinator incubator program. Previously, George was Chief Technology Officer at Numenta, a company he co-founded with Jeff Hawkins and Donna Dubinsky while completing his PhD at Stanford University. == Funding == The company launched in February 2011 with funding from Founders Fund, Dustin Moskovitz, Adam D’Angelo (former Facebook CTO and co-founder of Quora), Felicis Ventures, and Palantir co-founder Joe Lonsdale. In August 2012, in its Series A round of funding, it raised an additional $15 million. The round was led by Good Ventures; Founders Fund, Open Field Capital and Zarco Investment Group also participated. The company received $40 million in its Series B round of funding. The round was led by individuals including Mark Zuckerberg, Elon Musk, and others. An additional undisclosed amount was later contributed by Amazon.com CEO Jeff Bezos, Yahoo! co-founder Jerry Yang, Skype co-founder Janus Friis and Salesforce.com CEO Marc Benioff. == Recursive Cortical Network == Vicarious is developing machine learning software based on the computational principles of the human brain. One such software is a vision system known as the Recursive Cortical Network (RCN), it is a generative graphical visual perception system that interprets the contents of photographs and videos in a manner similar to humans. The system is powered by a balanced approach that takes sensory data, mathematics, and biological plausibility into consideration. On October 22, 2013, beating CAPTCHA, Vicarious announced its model was reliably able to solve modern CAPTCHAs, with character recognition rates of 90% or better when trained on one style. However, Luis von Ahn, a pioneer of early CAPTCHA and founder of reCAPTCHA, expressed skepticism, stating: "It's hard for me to be impressed since I see these every few months." He pointed out that 50 similar claims to that of Vicarious had been made since 2003. Vicarious later published their findings in peer-reviewed journal Science. Vicarious has indicated that its AI was not specifically designed to complete CAPTCHAs and its success at the task is a product of its advanced vision system. Because Vicarious's algorithms are based on insights from the human brain, it is also able to recognize photographs, videos, and other visual data.

    Read more →
  • Pronunciation assessment

    Pronunciation assessment

    Automatic pronunciation assessment uses computer speech recognition to determine how accurately speech has been pronounced, instead of relying on a human instructor or proctor. It is also called speech verification, pronunciation evaluation, and pronunciation scoring. This technology is used to grade speech quality, for language testing, for computer-aided pronunciation teaching (CAPT) in computer-assisted language learning (CALL), for speaking skill remediation, and for accent reduction. Pronunciation assessment is different from dictation or automatic transcription, because instead of determining unknown speech, it verifies learners' pronunciation of known word(s), often from prior transcription of the same utterance; ideally scoring the intelligibility of the learners' speech. Sometimes pronunciation assessment evaluates the prosody of the learners' speech, such as intonation, pitch, tempo, rhythm, and syllable and word stress, although those are usually not essential for being understood in most languages. Pronunciation assessment is also used in reading tutoring, for example in products from Google, Microsoft, and Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia. == Intelligibility == Intelligibility refers to how well a learner's utterance is understood by a listener, rather than how much it sounds like a native speaker. This is separate from measures of fluency, such as so-called "Goodness of Pronunciation" (GoP) scores, which estimate how closely an utterance aligns with those of native speakers. Intelligibility is widely regarded as the most important communicative goal in pronunciation teaching and assessment. For example, in the Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels. Studies in applied linguistics have shown that accent reduction does not always increase intelligibility because listeners can often comprehend heavily accented speech without difficulty. Pronunciation assessment systems often rely on acoustic methods such as GoP which compare learner speech to reference models to produce phoneme-level scores, which are in turn aggregated to produce word and phrase scores. While these methods are effective for identifying deviations from native speakers' utterances, they do not effectively measure how understandable speech is to human listeners. Intelligibility is influenced by broader linguistic and contextual factors such as stress placement, speech rate, and coarticulation, which are not represented in purely segmental scores. The earliest work on pronunciation assessment avoided measuring genuine listener intelligibility, a shortcoming corrected in 2011 at the Toyohashi University of Technology, and included in the Versant high-stakes English fluency assessment from Pearson and mobile apps from 17zuoye Education & Technology, but still missing in 2023 products from Google Search, Microsoft, Educational Testing Service, Speechace, and ELSA. Assessing authentic listener intelligibility is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments; from words with multiple correct pronunciations; and from phoneme coding errors in machine-readable pronunciation dictionaries. In 2022, researchers found that some newer speech-to-text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores (from 10-25ms audio frame logit aggregation) closely correlated with genuine listener intelligibility. Others have been able to assess intelligibility using Levenshtein or dynamic time warping distance measures from Wav2Vec2 representation of good speech. Further work through 2025 has focused specifically on measuring intelligibility. A 2025 study of 42 pronunciation and speech coaching apps (32 mobile and 10 web) found that none offered intelligibility assessment. Instead, most provided only segmental and accent-focused scoring. About two-thirds of the apps provided some form of specific pronunciation feedback, usually with phonetic transcriptions, but accompanied by visual cues (such as animations of the vocal tract or the lips and tongue from the front) in only about 5% of the apps. Less than a third provided feedback on learner perception of exemplar speech. == Evaluation == Although there are as yet no industry-standard benchmarks for evaluating pronunciation assessment accuracy, researchers occasionally release evaluation speech corpuses for others to use for improving assessment quality. Such evaluation databases often emphasize formally unaccented pronunciation to the exclusion of genuine intelligibility evident from blinded listener transcriptions. As of mid-2025, state of the art approaches for automatically transcribing phonemes typically achieve an error rate of about 10% from known good speech. The International Speech Communication Association (ISCA) 2025 Workshop on Speech and Language Technology in Education (SLaTE) administered a Speak & Improve Challenge: Spoken Language Assessment and Feedback, introducing benchmarks for evaluating pronunciation assessment and remediation systems across languages, accents, and learner populations. The challenge emphasized cross-lingual generalization and alignment with human intelligibility judgments, for more robust and interpretable assessment systems. Ethical issues in pronunciation assessment are present in both human and automatic methods. Authentic validity, fairness, and mitigating bias in evaluation are all crucial. Diverse speech data should be included in automatic pronunciation assessment models. Combining human judgments, especially blinded transcriptions from a wide diversity of listeners, with automated feedback can improve accuracy and fairness. Second language learners benefit substantially from their use of widely available speech recognition systems for dictation, virtual assistants, and AI chatbots. In such systems, users naturally try to correct their own errors evident in speech recognition results that they notice. Such use improves their grammar and vocabulary development along with their pronunciation skills. The extent to which explicit pronunciation assessment and remediation approaches improve on such self-directed interactions remains an open question. Similarly, automatic dictation results have been shown to reflect intelligibility about as well as human scorers. == Recent developments == During 2021–22, a smartphone-based CAPT system was used to sense articulation through both audible and inaudible signals, providing feedback at the phoneme level. Some promising areas for improvement which were being developed in 2024 include articulatory feature extraction and transfer learning to suppress unnecessary corrections. Other interesting advances under development include "augmented reality" interfaces for mobile devices using optical character recognition to provide pronunciation training on text found in user environments. In 2024, audio multimodal large language models were first described as assessing pronunciation. That work has been carried forward by other researchers in 2025 who report positive results. Subsequently, researchers demonstrated pronunciation scoring by providing a language model with textual descriptions of speech, including the speech-to-text transcript, phoneme sequences, pauses, and phoneme sequence matching; this approach can achieve performance similar to multimodal LLMs that analyze raw audio while avoiding their higher computational cost. In 2025, the Duolingo English Test authors published a description of their pronunciation assessment method, purportedly built to measure intelligibility rather than accent imitation. While achieving a correlation of 0.82 with expert human ratings, very close to inter-rater agreement and outperforming alternative methods, the method is nonetheless based on experts' scores along the six-point CEFR common reference levels scale, instead of actual blinded listener transcriptions. Further promising work in 2025 includes assessment feedback aligning learner speech to synthetic utterances using interpretable features, identifying continuous spans of words for remediation feedback; synthesizing corrected speech matching learners' self-perceived voices, which they prefer and imitate more accurately as corrections; and streaming such interactions. On January 21, 2026, Educational Testing Service's TOEFL iBT high-stakes English language test, required by US university admissions and employers from English as a foreign language applicants more often than all other internet-based tests combined, changed its speaking assessments. While official rubrics claim that the new scoring will be based primarily on intelligibility, the new test's technical description indicates that it ju

    Read more →
  • Face Swap Live

    Face Swap Live

    Face Swap Live is a mobile app created by Laan Labs that enables users to swap faces with another person in real-time using the device's camera. It was released on December 14, 2015. In addition to swapping faces with another person, the app enables users to create videos using a set of bundled live filters. The app is available on iOS and Android devices. Face Swap Live was named Apple's #2 best-selling paid app in 2016.

    Read more →
  • Spleak

    Spleak

    Spleak was an IM platform where users could publish and rate content. It existed in the form of six bots covering as many subject areas: CelebSpleak, SportSpleak, VoteSpleak, TVSpleak, GameSpleak, and StyleSpleak. == Overview == Users can add a "multi-Spleak" (which contains all of the different Spleak bots in one) or add the separate bots to their IM buddy lists on MSN and AIM. Users are also allowed access to Spleak online by using a CelebSpleak, SportSpleak, or VoteSpleak widget, or through the CelebSpleak and SportSpleak applications with Facebook. Spleak was an alternate reality game and is moving to its own company, Spleak Media Network. "Celebrate Spleak" was introduced throughout 2007, launched in 2008, and was forced to retire in 2009. == Key people == Spleak was co-founded by Morten Lund and Nicolaj Reffstrup. The company's chief executive officer is Morrie Eisenburg; Josh Scott is Vice President in Product and Tyler Wells is Vice President in Engineering.

    Read more →
  • Google Mobile Services

    Google Mobile Services

    Google Mobile Services (GMS) is a collection of proprietary applications and application programming interfaces (APIs) services from Google that are typically pre-installed on the majority of Android devices, such as smartphones, tablets, and smart TVs. GMS is not a part of the Android Open Source Project (AOSP), which means an Android manufacturer needs to obtain a license from Google in order to legally pre-install GMS on an Android device. This license is provided by Google without any licensing fees except in the EU. == Core applications == The following are core applications that are part of Google Mobile Services: Google Search Google Chrome YouTube Google Play Google Drive Gmail Google Meet Google Maps Google Photos Google TV YouTube Music === Historically === Google+ Google Hangouts Google Wallet Google Play Magazines Google Play Music Google Play Movies & TV Google Duo == Reception, competitors, and regulators == === FairSearch === Numerous European firms filed a complaint to the European Commission stating that Google had manipulated their power and dominance within the market to push their Services to be used by phone manufacturers. The firms were joined under the name FairSearch, and the main firms included were Microsoft, Expedia, TripAdvisor, Nokia and Oracle. FairSearch's major problem with Google's practices was that they believed Google were forcing phone manufacturers to use their Mobile Services. They claimed Google managed this by asking these manufacturers to sign a contract stating that they must preinstall specific Google Mobile Services, such as Maps, Search and YouTube, in order to get the latest version of Android. Google swiftly responded stating that they "continue to work co-operatively with the European Commission". === Aptoide === The third-party Android app store Aptoide also filed an EU competition complaint against Google once again stating that they are misusing their power within the market. Aptoide alleged that Google was blocking third-party app stores from being on Google Play, as well as blocking Google Chrome from downloading any third-party apps and app stores. As of June 2014, Google had not responded to these allegations. === Abuse of Android dominance === In May 2019, Umar Javeed, Sukarma Thapar, Aaqib Javeed vs. Google LLC & Ors. the Competition Commission of India ordered an antitrust probe against Google for abusing its dominant position with Android to block market rivals. In Prima Facie opinion the commission held that mandatory pre-installation of the entire Google Mobile Services (GMS) suite, under Mobile Application Distribution Agreements (MADA), amounts to the imposition of unfair conditions on the device manufacturers. === EU antitrust ruling === On July 18, 2018, the European Commission fined Google €4.34 billion for breaching EU antitrust rules which resulted in a change of licensing policy for the GMS in the EU. A new paid licensing agreement for smartphones and tablets shipped into the EEA was created. The change is that the GMS is now decoupled from the base Android and will be offered under a separate paid licensing agreement. === Privacy policy === At the same time, Google faced problems with various European data protection agencies, most notably In the United Kingdom and France. The problem they faced was that they had a set of 60 rules merged into one, which allowed Google to "track users more closely". Google once again came out and stated that their new policies still abide by European Union laws. === Android distributions without Google Mobile Services === After surveillance and privacy concerns, several custom android distributions have been implemented, such as GrapheneOS, LineageOS, CalyxOS, iodéOS or /e/OS, and they come either without any GMS installed by default or with microG, that adds a compatibility layer.

    Read more →
  • Language model benchmark

    Language model benchmark

    A language model benchmark is a standardized test designed to evaluate the performance of language models on various natural language processing tasks. These tests are intended for comparing different models' capabilities in areas such as language understanding, generation, and reasoning. Benchmarks generally consist of a dataset and corresponding evaluation metrics. The dataset provides text samples and annotations, while the metrics measure a model's performance on tasks like answering questions, text classification, and machine translation. These benchmarks are developed and maintained by academic institutions, research organizations, and industry players to track progress in the field. In addition to accuracy, the metrics can include throughput, energy efficiency, bias, trust, and sustainability. == Overview == === Types === Benchmarks may be described by the following adjectives, not mutually exclusive: Classical: These tasks are studied in natural language processing, even before the advent of deep learning. Examples include the Penn Treebank for testing syntactic and semantic parsing, as well as bilingual translation benchmarked by BLEU scores. Question answering: These tasks have a text question and a text answer, often multiple-choice. They can be open-book or closed-book. Open-book QA resembles reading comprehension questions, with relevant passages included as annotation in the question, in which the answer appears. Closed-book QA includes no relevant passages. Closed-book QA is also called open-domain question-answering. Before the era of large language models, open-book QA was more common, and understood as testing information retrieval methods. Closed-book QA became common since GPT-2 as a method to measure knowledge stored within model parameters. Omnibus: An omnibus benchmark combines many benchmarks, often previously published. It is intended as an all-in-one benchmarking solution. Reasoning: These tasks are usually in the question-answering format, but are intended to be more difficult than standard question answering. Multimodal: These tasks require processing not only text, but also other modalities, such as images and sound. Examples include OCR and transcription. Agency: These tasks are for a language-model–based software agent that operates a computer for a user, such as editing images, browsing the web, etc. Adversarial: A benchmark is "adversarial" if the items in the benchmark are picked specifically so that certain models do badly on them. Adversarial benchmarks are often constructed after state of the art (SOTA) models have saturated (achieved 100% performance) a benchmark, to renew the benchmark. A benchmark is "adversarial" only at a certain moment in time, since what is adversarial may cease to be adversarial as newer SOTA models appear. Public/Private: A benchmark might be partly or entirely private, meaning that some or all of the questions are not publicly available. The idea is that if a question is publicly available, then it might be used for training, which would be "training on the test set" and invalidate the result of the benchmark. Usually, only the guardians of the benchmark have access to the private subsets, and to score a model on such a benchmark, one must send the model weights, or provide API access, to the guardians. The boundary between a benchmark and a dataset is not sharp. Generally, a dataset contains three "splits": training, test, and validation. Both the test and validation splits are essentially benchmarks. In general, a benchmark is distinguished from a test/validation dataset in that a benchmark is typically intended to be used to measure the performance of many different models that are not trained specifically for doing well on the benchmark, while a test/validation set is intended to be used to measure the performance of models trained specifically on the corresponding training set. In other words, a benchmark may be thought of as a test/validation set without a corresponding training set. Conversely, certain benchmarks may be used as a training set, such as the English Gigaword or the One Billion Word Benchmark, which in modern language is just the negative log-likelihood loss on a pretraining set with 1 billion words. Indeed, the distinction between benchmark and dataset in language models became sharper after the rise of the pretraining paradigm, whereby a model is first trained on massive, unlabeled datasets to learn general language patterns, syntax, and knowledge (pretraining), and the base model is then adapted to specific, downstream tasks using smaller, labeled datasets (fine-tuning). === Lifecycle === Generally, the life cycle of a benchmark consists of the following steps: Inception: A benchmark is published. It can be simply given as a demonstration of the power of a new model (implicitly) that others then picked up as a benchmark, or as a benchmark that others are encouraged to use (explicitly). Growth: More papers and models use the benchmark, and the performance on the benchmark grows. Maturity, degeneration or deprecation: A benchmark may be saturated, after which researchers move on to other benchmarks. Progress on the benchmark may also be neglected as the field moves to focus on other benchmarks. Renewal: A saturated benchmark can be upgraded to make it no longer saturated, allowing further progress. === Construction === Like datasets, benchmarks are typically constructed by several methods, individually or in combination: Web scraping: Ready-made question-answer pairs may be scraped online, such as from websites that teach mathematics and programming. Conversion: Items may be constructed programmatically from scraped web content, such as by blanking out named entities from sentences, and asking the model to fill in the blank. This was used for making the CNN/Daily Mail Reading Comprehension Task. Crowd sourcing: Items may be constructed by paying people to write them, such as on Amazon Mechanical Turk. This was used for making the MCTest. === Evaluation === Generally, benchmarks are fully automated. This limits the questions that can be asked. For example, with mathematical questions, "proving a claim" would be difficult to automatically check, while "calculate an answer with a unique integer answer" would be automatically checkable. With programming tasks, the answer can generally be checked by running unit tests, with an upper limit on runtime. The benchmark scores are of the following kinds: For multiple choice or cloze questions, common scores are accuracy (frequency of correct answer), precision, recall, F1 score, etc. pass@n: The model is given n {\displaystyle n} attempts to solve each problem. If any attempt is correct, the model earns a point. The pass@n score is the model's average score over all problems. k@n: The model makes n {\displaystyle n} attempts to solve each problem, but only k {\displaystyle k} attempts out of them are selected for submission. If any submission is correct, the model earns a point. The k@n score is the model's average score over all problems. cons@n: The model is given n {\displaystyle n} attempts to solve each problem. If the most common answer is correct, the model earns a point. The cons@n score is the model's average score over all problems. Here "cons" stands for "consensus" or "majority voting". The pass@n score can be estimated more accurately by making N > n {\displaystyle N>n} attempts, and use the unbiased estimator 1 − ( N − c n ) ( N n ) {\displaystyle 1-{\frac {\binom {N-c}{n}}{\binom {N}{n}}}} , where c {\displaystyle c} is the number of correct attempts. For less well-formed tasks, where the output can be any sentence, there are the following commonly used scores including BLEU ROUGE, METEOR, NIST, word error rate, LEPOR, CIDEr, and SPICE. === Issues === error: Some benchmark answers may be wrong. ambiguity: Some benchmark questions may be ambiguously worded. subjective: Some benchmark questions may not have an objective answer at all. This problem generally prevents creative writing benchmarks. Similarly, this prevents benchmarking writing proofs in natural language, though benchmarking proofs in a formal language is possible. open-ended: Some benchmark questions may not have a single answer of a fixed size. This problem generally prevents programming benchmarks from using more natural tasks such as "write a program for X", and instead uses tasks such as "write a function that implements specification X". inter-annotator agreement: Some benchmark questions may be not fully objective, such that even people would not agree with 100% on what the answer should be. This is common in natural language processing tasks, such as syntactic annotation. shortcut: Some benchmark questions may be easily solved by an "unintended" shortcut. For example, in the SNLI benchmark, having a negative word like "not" in the second sentence is a strong signal for the "Contradiction" category, regardless of what the se

    Read more →
  • ActivityPub

    ActivityPub

    ActivityPub is a protocol and open standard for decentralized social networking. It provides a client-to-server (C2S) API for creating and modifying content, as well as a federated server-to-server (S2S) protocol for delivering notifications and content to other servers. ActivityPub is the defining standard of the Fediverse, a decentralised social network of various social interaction models, and content types, which consists of independently managed instances of software such as Mastodon, Pixelfed and PeerTube, among others. ActivityPub is considered to be an update to the ActivityPump protocol used in pump.io, and the official W3C repository for ActivityPub is identified as a fork of ActivityPump. The creation of a new standard for decentralized social networking was prompted by the complexity of OStatus, the most commonly used protocol at the time. OStatus was built using a multitude of technologies (such as Atom, Salmon, WebSub and WebFinger), a product of the infrastructure used in GNU social (the originator and largest user of the OStatus protocol), which made it difficult to implement the protocol into new software. OStatus was also only designed to work with microblogging services, with little flexibility to the types of data that it could hold. The standard was first published by the World Wide Web Consortium (W3C) as a W3C Recommendation in January 2018 by the Social Web Working Group (SocialWG), a working group chartered to build the protocols and vocabularies needed to create a standard for social functionality. Shortly after, further development was moved to the Social Web Community Group (SocialCG), the successor to the SocialWG. == Design == ActivityPub uses the ActivityStreams 2.0 format for building its content, which itself uses JSON-LD. The three main data types used in ActivityPub are Objects, Activities and Actors. Objects are the most common data type, and can be images, videos, or more abstract items such as locations or events. Activities are actions that create and modify objects, for example a Create activity creates an object. Actors are representative of an individual, a group, an application or a service, and are the owners of objects. Every actor type contains an inbox and outbox stream, which sends and receives activities for a user. In order to publish data (for example liking an article), a user creates an activity that declares that they liked an Article object and publishes it to their outbox, where it is then delivered by the ActivityPub server via a POST request to the inboxes listed in the activity's to, bto, cc and bcc fields. The receiving servers then account for the newly received activity and update the article by adding the like action to it. === Example data === An example actor object that represents a user account: An example activity that likes an article object: An example article object: == Project status == The SocialCG previously organized a yearly free conference called ActivityPub Conf about the future of ActivityPub. Triages are held regularly to review issues pertaining to the ActivityPub and ActivityStreams 2.0 specifications as part of the SocialCG. In 2023, Germany's Sovereign Tech Fund donated €152,000 to socialweb.coop with the goal of building a new suite for testing various ActivityPub implementations and their compliance with the specification. === Adoption === The initial wave of adoption for ActivityPub (circa 2016–2018) came from software that was already using OStatus as their federation protocol, such as Mastodon, GNU social and Pleroma. Following the acquisition of Twitter by Elon Musk in 2022, many groups of users that were critical of the acquisition migrated to Mastodon, bringing new attention to the ActivityPub protocol with it. Various major social media platforms and corporations have since pledged to implement ActivityPub support, including Tumblr, Flipboard and Meta Platforms' Threads. Threads introduced crossposting to ActivityPub in 2024 for users outside of the European Economic Area, however full 2-way compatibility remains incomplete as of 2025. == Criticism == === Accidental denial-of-service attacks === Poorly optimized ActivityPub implementations can cause unintentional distributed denial-of-service (DDOS) attacks on other websites and servers, due to the decentralized nature of the network. An example would be Mastodon's implementation of OpenGraph link previews, wherein every instance that receives a post that contains a link with OpenGraph metadata will download the associated data, such as a thumbnail, in a very short timeframe, which can slow down or crash servers as a result of the sudden burst of requests. === Account migration === ActivityPub has been criticized for not natively supporting moving accounts from one server to another, forcing implementations to build their own solutions. While there has been work on building a standardized system for migrating accounts using the Move activity via the Fediverse Enhancement Proposal organization, the current proposal only allows for basic follower migration, with all other data remaining linked to the original account. === Missing content and data === ActivityPub implementations have been criticized for missing replies and parts of reply threads from remote posts, and presenting outdated statistics (e.g. likes and reposts) about remote posts. However, this isn't a problem with the ActivityPub protocol itself, but with implementations not refreshing their content for updated data when needed. == Software using ActivityPub == === Future implementations === Flarum, an internet forum software Forgejo, a Git forge and development platform === Uncertain future implementations === GitLab, a Git forge and development platform which had previously had an open issue discussing the topic, but was later closed due to the development team moving focus to other areas. Tumblr, a microblogging platform. Despite previous statements from Automattic CEO Matt Mullenweg, ActivityPub integration has been delayed indefinitely. The integration would have been implemented with its WordPress migration, as the first-party plugin for interoperability would have been used for federation. Flickr, an image and video hosting site.

    Read more →
  • CLEVER score

    CLEVER score

    The CLEVER (Cross Lipschitz Extreme Value for nEtwork Robustness) score is a way of measuring the robustness of an artificial neural network towards adversarial attacks. It was developed by a team at the MIT-IBM Watson AI Lab in IBM Research and first presented at the 2018 International Conference on Learning Representations. It was mentioned and reviewed by Ian Goodfellow as well. It was adopted into an educational game Fool The Bank by Narendra Nath Joshi, Abhishek Bhandwaldar and Casey Dugan

    Read more →
  • Local ternary patterns

    Local ternary patterns

    Local ternary patterns (LTP) are an extension of local binary patterns (LBP). Unlike LBP, it does not threshold the pixels into 0 and 1, rather it uses a threshold constant to threshold pixels into three values. Considering k as the threshold constant, c as the value of the center pixel, a neighboring pixel p, the result of threshold is: { 1 , if p > c + k 0 , if p > c − k and p < c + k − 1 if p < c − k {\displaystyle {\begin{cases}1,&{\text{if }}p>c+k\\0,&{\text{if }}p>c-k{\text{ and }}p Read more →