AI For Students Good Or Bad

AI For Students Good Or Bad — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Weibo

    Weibo

    Weibo (Chinese: 微博; pinyin: Wēibó), or Sina Weibo (Chinese: 新浪微博; pinyin: Xīnlàng Wēibó), is a Chinese microblogging (weibo) website. Launched by Sina Corporation on 14 August 2009, it is one of the biggest social media platforms in China, with over 582 million monthly active users (252 million daily active users) as of Q1 2022. The platform has been highly successful but has faced criticism for heavy censorship. Sina had gone public on the Nasdaq in 2000. In March 2014, Sina announced a spinoff of Weibo and filed an IPO under the symbol WB. Sina carved out 11% of Weibo in the IPO, with Alibaba owning 32% post-IPO. The company began trading publicly on 17 April 2014. In March 2017, Sina launched Sina Weibo International Version. In November 2018, Sina Weibo suspended its registration function for minors under the age of 14. In July 2019, Sina Weibo announced that it would launch a two-month campaign to clean up pornographic and vulgar information, named "Project Deep Blue" (蔚蓝计划). On 29 September 2020, the company announced it would go private again due to rising tensions between the US and China. == Name == "Weibo" (微博) is the Chinese word for "microblog". Sina Weibo launched its new domain name weibo.com on 7 April 2011, deactivating and redirecting from the old domain, t.sina.com.cn, to the new one. Due to its popularity, the media sometimes refers to the platform simply as "Weibo", despite the numerous other Chinese microblogging services including Tencent Weibo, Sohu Weibo, and NetEase Weibo. However, the latter three have stopped providing services. == Background == Sina Weibo is a platform based on fostering user relationships to share, disseminate, and receive information. Through the website or the mobile app, users can upload pictures and videos publicly for instant sharing, with other users being able to comment with text, pictures and videos, or use a multimedia instant messaging service. The company initially invited a large number of celebrities to join the platform at the beginning and has since invited many media personalities, government departments, businesses and non-governmental organizations to open accounts for the purpose of publishing and communicating information. To avoid the impersonation of celebrities, Sina Weibo uses verification symbols; celebrity accounts have an orange letter "V" and organizations' accounts have a blue letter "V". Sina Weibo has more than 500 million registered users; out of these, 313 million are monthly active users, 85% use the Weibo mobile app, 70% are college-aged, 50.10% are male and 49.90% are female. There are over 100 million messages posted by users each day. With more than 100 million followers, actress Xie Na holds the record for the most followers on the platform. Despite fierce competition among Chinese social media platforms, Sina Weibo remains the most popular. == History == After the July 2009 Ürümqi riots, China shut down most domestic microblogging services, including Fanfou, the very first weibo service. Many popular non-China-based microblogging services like Twitter, Facebook, and Plurk have since been blocked. Sina Corporation CEO Charles Chao considered this to be an opportunity, and on 14 August 2009, Sina launched the tested version of Sina Weibo. Basic functions including message, private message, comment and reposting were made available that September. A Sina Weibo–compatible API platform for developing third-party applications was launched on 28 July 2010. On 1 December 2010, the website experienced an outage, which administrators later said was due to the ever-increasing numbers of users and posts. Registered users surpassed 100 million in February 2011. Since 23 March 2011, t.cn has been used as Sina Weibo's official shortened URL in lieu of sinaurl.cn. On 7 April 2011, weibo.com replaced t.sina.com.cn as the new main domain name used by the website. The official logo was also updated. In June 2011, Sina announced an English-language version of Sina Weibo would be developed and launched, though content would still be governed by Chinese law. On 11 January 2013, Sina Weibo and Alibaba China (a subsidiary of Alibaba Group) signed a strategic cooperation agreement. With more and more foreign celebrities using Sina Weibo, language translation has become an urgent need for Chinese users who wish to communicate with their idols online, especially Korean. In January 2013, Sina Weibo and NetEase.com announced that they had reached a strategic cooperation agreement. When users browse foreign language content, they can now directly obtain translation results through the YouDao Dictionary. The Sina Weibo financial report in February 2013 showed that its total revenue was approximately US$66 million and that the number of registered users had exceeded the 500 million mark. In April 2013, Sina officially announced that Sina Weibo had signed a strategic cooperation agreement with Alibaba. The two sides conducted in-depth cooperation in areas such as user account interoperability, data exchange, online payment, and internet marketing. At the same time, Sina announced that Alibaba, through its wholly owned subsidiary, had purchased the preferred shares and common shares issued by Sina Weibo Company for US$586 million, which accounted for approximately 18% of Weibo's fully diluted and diluted total shares. === Ownership === On 9 April 2013, Alibaba Group announced that it would acquire 18% of Sina Weibo for US$586 million, with the option to buy up to 30% in the future. Alibaba exercised this option when Weibo was listed on the NASDAQ in April 2014. == Users == According to iResearch's report on 30 March 2011, Sina Weibo had 56.5% of China's microblogging market based on active users and 86.6% based on browsing time over competitors such as Tencent Weibo and Baidu. According to research by Sina Corporation, the number of active users reached over 400 million by Q1 2018, making Sina Weibo the 7th platform with at least 400 million active users, and daily usage increased by 21%. As of 2017, approximately 80% of its users were in their 20s and 30s. The top 100 users had over 485 million followers combined. More than 5,000 companies and 2,700 media organizations in China use Sina Weibo. The site is maintained by a growing microblogging department of 200 employees responsible for technology, design, operations, and marketing. Sina executives invited and persuaded many Chinese celebrities to join the platform. Users now include Asian celebrities, movie stars, singers, famous business and media figures, athletes, scholars, artists, organizations, religious figures, government departments, and officials from Hong Kong, Mainland China, Malaysia, Singapore, Taiwan, and Macau, as well as some famous foreign individuals and organizations, including Kevin Rudd, Boris Johnson, David Cameron, Narendra Modi, Toshiba, and the Germany national football team. Sina Weibo has a verification program for known people and organizations. Once an account is verified, a verification badge is added beside the account name. == Features == Many of Sina Weibo's features resemble those of Twitter. A user may post with a 140-character limit (increased to 2,000 as of January 2016 with the exception of reposts and comments). An analysis of 29 million Weibo posts found the median length was 14 characters. Users may mention or talk to other people using "@UserName" formatting, add hashtags, follow other users to make their posts appear in one's own timeline, re-post with "//@UserName" similar to Twitter's retweet function "RT @UserName", select posts for one's favorites list, and verify the account if the user is a celebrity, brand, business or otherwise of public interest. URLs are automatically shortened using the domain name t.cn, akin to Twitter's t.co. Official and third-party applications can access Sina Weibo from other websites or platforms. Users may: Submit up to 18 images/video files in every post Send personal messages to followers Follow others and be followed Post "stories" like on Instagram React to posts using different emojis Receive monetary rewards that can be used in a digital store linked to Weibo View posts identified as "hot" or popular Display the location they post from Hashtags differ slightly between Sina Weibo and Twitter, using the double-hashtag "#HashName#" format (the lack of spacing between Chinese characters necessitates a closing tag). Users can own a hashtag by requesting hashtag monitoring; the company reviews these requests and responds within one to three days. Once a user owns a hashtag, they have access to a wide variety of functions available only to them on the condition that they remain active (less than 1 post per calendar week revokes these privileges). Additionally, comments appear as a list below each post. A commenter can also choose to re-post the comment, quoting the whole original post, to their own page. Unregistered users can only browse a few post

    Read more →
  • Immuni

    Immuni

    Immuni was an open-source COVID-19 contact tracing app used for digital contact tracing in Italy, dismissed on 31 December 2022, after a long and debated criticism for having been a failure due to the lack of trust placed by citizens. Immuni COVID-19 contact-tracing app had in fact been downloaded only by 12% of Italians between 14 and 75 years old (the government had previously stated that, in order for the app to work properly, it should have been downloaded by at least 60% of Italians). It makes use of the Apple/Google Exposure Notification system. == Development == It was developed by Bending Spoons and released by the Italian Ministry of Health on 1 June 2020. After a testing phase in 4 Italian regions (Abruzzo, Apulia, Liguria, Marche), the app started being active in the whole country on 15 June 2020. The app was initially released on App Store and Google Play, and since 1 February 2021 it is available on the Huawei AppGallery as well. === Source code === The source code was published on GitHub on the 25 May. The app only works in Italy, but compatibility with other European contact tracing apps was a goal. Since 19 October 2020 the app supports key-exchanges with the EU Interoperability Gateway and is therefore able to communicate with contact tracing apps of other EU countries. == Shutdown == As of 16 December 2020, the app was downloaded more than 10 million times, a number which increased to 21.882.502 downloads the day before the app's shutdown. On 27 December 2022 the Italian Ministry of Health announced that the app and its infrastructures will be dismissed on the 31 December of the same year.

    Read more →
  • Convolutional neural network

    Convolutional neural network

    A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different types of data including text, images and audio. CNNs are the de-facto standard in deep learning-based approaches to computer vision and image processing, and have only recently been replaced—in some cases—by newer architectures such as the transformer. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by the regularization that comes from using shared weights over fewer connections. For example, for each neuron in the fully-connected layer, 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded convolution (or cross-correlation) kernels, only 25 weights for each convolutional layer are required to process 5x5-sized tiles. Higher-layer features are extracted from wider context windows, compared to lower-layer features. Some applications of CNNs include: image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain–computer interfaces, and financial time series. CNNs are also known as shift invariant or space invariant artificial neural networks, based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Counter-intuitively, most convolutional neural networks are not invariant to translation, due to the downsampling operation they apply to the input. Feedforward neural networks are usually fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. The "full connectivity" of these networks makes them prone to overfitting data. Typical ways of regularization, or preventing overfitting, include: penalizing parameters during training (such as weight decay) or trimming connectivity (skipped connections, dropout, etc.) Robust datasets also increase the probability that CNNs will learn the generalized principles that characterize a given dataset rather than the biases of a poorly-populated set. Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field. CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are hand-engineered. This simplifies and automates the process, enhancing efficiency and scalability overcoming human-intervention bottlenecks. == Architecture == A convolutional neural network consists of an input layer, hidden layers and an output layer. In a convolutional neural network, the hidden layers include one or more layers that perform convolutions. Typically this includes a layer that performs a dot product of the convolution kernel with the layer's input matrix. This product is usually the Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as pooling layers, fully connected layers, and normalization layers. Here it should be noted how close a convolutional neural network is to a matched filter. === Convolutional layers === In a CNN, the input is a tensor with shape: (number of inputs) × (input height) × (input width) × (input channels) After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map, with shape: (number of inputs) × (feature map height) × (feature map width) × (feature map channels). Convolutional layers convolve the input and pass its result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus. Each convolutional neuron processes data only for its receptive field. Although fully connected feedforward neural networks can be used to learn features and classify data, this architecture is generally impractical for larger inputs (e.g., high-resolution images), which would require massive numbers of neurons because each pixel is a relevant input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for each neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper. For example, using a 5 × 5 tiling region, each with the same shared weights, requires only 25 neurons. Using shared weights means there are many fewer parameters, which helps avoid the vanishing gradients and exploding gradients problems seen during backpropagation in earlier neural networks. To speed processing, standard convolutional layers can be replaced by depthwise separable convolutional layers, which are based on a depthwise convolution followed by a pointwise convolution. The depthwise convolution is a spatial convolution applied independently over each channel of the input tensor, while the pointwise convolution is a standard convolution restricted to the use of 1 × 1 {\displaystyle 1\times 1} kernels. === Pooling layers === Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters, tiling sizes such as 2 × 2 are commonly used. Global pooling acts on all the neurons of the feature map. There are two common types of pooling in popular use: max and average. Max pooling uses the maximum value of each local cluster of neurons in the feature map, while average pooling takes the average value. === Fully connected layers === Fully connected layers connect every neuron in one layer to every neuron in another layer. It is the same as a traditional multilayer perceptron neural network (MLP). Each neuron in the fully connected layer receives input from all the neurons in the previous layer. These inputs are weighted and summed with the corresponding biases, and then passed through an activation function to perform a nonlinear transformation, generating the output. The flattened matrix goes through a fully connected layer to classify the images. === Receptive field === In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's receptive field. Typically the area is a square (e.g. 5 by 5 neurons). Whereas, in a fully connected layer, the receptive field is the entire previous layer. Thus, in each convolutional layer, each neuron takes input from a larger area in the input than previous layers. This is due to applying the convolution over and over, which takes the value of a pixel into account, as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers. To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios, thus having a variable receptive field size. === Weights === Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights. The vectors of weights and biases are called filters and represent particular features of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the memory footprint because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and vector

    Read more →
  • Information extraction

    Information extraction

    Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction. Recent advances in NLP techniques have allowed for significantly improved performance compared to previous years. An example is the extraction from newswire reports of corporate mergers, such as denoted by the formal relation: MergerBetween ⁡ ( c o m p a n y 1 , c o m p a n y 2 , d a t e ) {\displaystyle \operatorname {MergerBetween} (\mathrm {company} _{1},\mathrm {company} _{2},\mathrm {date} )} , from an online news sentence such as: "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow automated reasoning about the logical form of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context. Information extraction is the part of a greater puzzle which deals with the problem of devising automatic methods for text management, beyond its transmission, storage and display. The discipline of information retrieval (IR) has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents. Another complementary approach is that of natural language processing (NLP) which has solved the problem of modelling human language processing with considerable success when taking into account the magnitude of the task. In terms of both difficulty and emphasis, IE deals with tasks in between both IR and NLP. In terms of input, IE assumes the existence of a set of documents in which each document follows a template, i.e. describes one or more entities or events in a manner that is similar to those in other documents but differing in the details. An example, consider a group of newswire articles on Latin American terrorism with each article presumed to be based upon one or more terroristic acts. We also define for any given IE task a template, which is a(or a set of) case frame(s) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terroristic act, and the date on which the event happened. An IE system for this problem is required to "understand" an attack article only enough to find data corresponding to the slots in this template. == History == Information extraction dates back to the late 1970s in the early days of NLP. An early commercial system from the mid-1980s was JASPER built for Reuters by the Carnegie Group Inc with the aim of providing real-time financial news to financial traders. Beginning in 1987, IE was spurred by a series of Message Understanding Conferences. MUC is a competition-based conference that focused on the following domains: MUC-1 (1987), MUC-3 (1989): Naval operations messages. MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries. MUC-5 (1993): Joint ventures and microelectronics domain. MUC-6 (1995): News articles on management changes. MUC-7 (1998): Satellite launch reports. Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), who wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism. == Present significance == The present significance of IE pertains to the growing amount of information available in unstructured form. Tim Berners-Lee, inventor of the World Wide Web, refers to the existing Internet as the web of documents and advocates that more of the content be made available as a web of data. Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by means of transformation into relational form, or by marking-up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. == Tasks and subtasks == Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. The overall goal being to create a more easily machine-readable text to process the sentences. Typical IE tasks and subtasks include: Template filling: Extracting a fixed set of fields from a document, e.g. extract perpetrators, victims, time, etc. from a newspaper article about a terrorist attack. Event extraction: Given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks. Knowledge Base Population: Fill a database of facts given a set of documents. Typically the database is in the form of triplets, (entity 1, relation, entity 2), e.g. (Barack Obama, Spouse, Michelle Obama) Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims at detecting entities without having any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would denote detecting that the phrase "M. Smith" does refer to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or, "might be") the specific person whom that sentence is talking about. Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. If we take the two sentences "M. Smith likes fishing. But he doesn't like biking", it would be beneficial to detect that "he" is referring to the previously detected person "M. Smith". Relationship extraction: identification of relations between entities, such as: PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.") PERSON located in LOCATION (extracted from the sentence "Bill is in France.") Semi-structured information extraction which may refer to any IE that tries to restore some kind of information structure that has been lost through publication, such as: Table extraction: finding and extracting tables from documents. Table information extraction : extracting information in structured manner from the tables. This task is more complex than table extraction, as table extraction is only the first step, while understanding the roles of the cells, rows, columns, linking the information inside the table and understanding the information presented in the table are additional tasks necessary for table information extraction. Comments extraction : extracting comments from the actual content of articles in order to restore the link between authors of each of the sentences Language and vocabulary analysis Terminology extraction: finding the relevant terms for a given corpus Audio extraction Template-based music extraction: finding relevant characteristic in an audio signal taken from a given repertoire; for instance time indexes of occurrences of percussive sounds can be extracted in order to represent the essential rhythmic component of a music piece. Note that this list is not exhaustive and that the exact meaning of IE activities is not commonly accepted and that many approaches combine multiple sub-tasks of IE in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE. IE on non-text documents is becoming an increasingly interesting topic in research, and information extracted from multimedia documents can now be expressed in a high level structure as it is done on text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources. == World Wide Web applications == IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for developing IE systems that help people

    Read more →
  • Intrapixel and Interpixel processing

    Intrapixel and Interpixel processing

    Intrapixel and Interpixel processing is used in the processing of computers graphics, as well as sensors and images in equipment such as cameras. For computer graphics, CMOS sensor processing is done in pixel level. This process includes two general categories: intrapixel processing, where the processing is performed on the individual pixel signals, and interpixel processing, where the processing is performed locally or globally on signals from several pixels. The purpose of interpixel processing is to perform early vision processing, not merely to capture images. Intrapixel and Interpixel processing is an integral part of spatial processing within the earth Mixed Spatial Attraction Model. This also includes use within hyperspectral image processing.

    Read more →
  • Vicuna LLM

    Vicuna LLM

    Vicuna LLM is an omnibus large language model used in AI research. Its methodology is to enable the public at large to contrast and compare the accuracy of LLMs "in the wild" (an example of citizen science) and to vote on their output; a question-and-answer chat format is used. At the beginning of each round two LLM chatbots from a diverse pool of nine are presented randomly and anonymously, their identities only being revealed upon voting on their answers. The user has the option of either replaying ("regenerating") a round, or beginning an entirely fresh one with new LLMs. (The user also has the option of choosing which LLMs to do battle.) Based on Llama 2, it is an open source project, and it itself has become the subject of academic research in the burgeoning field. A non-commercial, public demo of the Vicuna-13b model is available to access using LMSYS.

    Read more →
  • Scale space

    Scale space

    Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing and signal processing communities with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales, by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter t {\displaystyle t} in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about t {\displaystyle {\sqrt {t}}} have largely been smoothed away in the scale-space level at scale t {\displaystyle t} . The main type of scale space is the linear (Gaussian) scale space, which has wide applicability as well as the attractive property of being possible to derive from a small set of scale-space axioms. The corresponding scale-space framework encompasses a theory for Gaussian derivative operators, which can be used as a basis for expressing a large class of visual operations for computerized systems that process visual information. This framework also allows visual operations to be made scale invariant, which is necessary for dealing with the size variations that may occur in image data, because real-world objects may be of different sizes and in addition the distance between the object and the camera may be unknown and may vary depending on the circumstances. == Definition == The notion of scale space applies to signals of arbitrary numbers of variables. The most common case in the literature applies to two-dimensional images, which is what is presented here. Consider a given image f {\displaystyle f} where f ( x , y ) {\displaystyle f(x,y)} is the greyscale value of the pixel at position ( x , y ) {\displaystyle (x,y)} . The linear (Gaussian) scale-space representation of f {\displaystyle f} is a family of derived signals L ( x , y ; t ) {\displaystyle L(x,y;t)} defined by the convolution of f ( x , y ) {\displaystyle f(x,y)} with the two-dimensional Gaussian kernel g ( x , y ; t ) = 1 2 π t e − ( x 2 + y 2 ) / 2 t {\displaystyle g(x,y;t)={\frac {1}{2\pi t}}e^{-(x^{2}+y^{2})/2t}\,} such that L ( ⋅ , ⋅ ; t ) = g ( ⋅ , ⋅ ; t ) ∗ f ( ⋅ , ⋅ ) , {\displaystyle L(\cdot ,\cdot ;t)\ =g(\cdot ,\cdot ;t)f(\cdot ,\cdot ),} where the semicolon in the argument of L {\displaystyle L} implies that the convolution is performed only over the variables x , y {\displaystyle x,y} , while the scale parameter t {\displaystyle t} after the semicolon just indicates which scale level is being defined. This definition of L {\displaystyle L} works for a continuum of scales t ≥ 0 {\displaystyle t\geq 0} , but typically only a finite discrete set of levels in the scale-space representation would be actually considered. The scale parameter t = σ 2 {\displaystyle t=\sigma ^{2}} is the variance of the Gaussian filter and as a limit for t = 0 {\displaystyle t=0} the filter g {\displaystyle g} becomes an impulse function such that L ( x , y ; 0 ) = f ( x , y ) , {\displaystyle L(x,y;0)=f(x,y),} that is, the scale-space representation at scale level t = 0 {\displaystyle t=0} is the image f {\displaystyle f} itself. As t {\displaystyle t} increases, L {\displaystyle L} is the result of smoothing f {\displaystyle f} with a larger and larger filter, thereby removing more and more of the details that the image contains. Since the standard deviation of the filter is σ = t {\displaystyle \sigma ={\sqrt {t}}} , details that are significantly smaller than this value are to a large extent removed from the image at scale parameter t {\displaystyle t} , see the following figures and for graphical illustrations. === Why a Gaussian filter? === When faced with the task of generating a multi-scale representation one may ask: could any filter g of low-pass type and with a parameter t which determines its width be used to generate a scale space? The answer is no, as it is of crucial importance that the smoothing filter does not introduce new spurious structures at coarse scales that do not correspond to simplifications of corresponding structures at finer scales. In the scale-space literature, a number of different ways have been expressed to formulate this criterion in precise mathematical terms. The conclusion from several different axiomatic derivations that have been presented is that the Gaussian scale space constitutes the canonical way to generate a linear scale space, based on the essential requirement that new structures must not be created when going from a fine scale to any coarser scale. Conditions, referred to as scale-space axioms, that have been used for deriving the uniqueness of the Gaussian kernel include linearity, shift invariance, semi-group structure, non-enhancement of local extrema, scale invariance and rotational invariance. In the works, the uniqueness claimed in the arguments based on scale invariance has been criticized, and alternative self-similar scale-space kernels have been proposed. The Gaussian kernel is, however, a unique choice according to the scale-space axiomatics based on causality or non-enhancement of local extrema. === Alternative definition === Equivalently, the scale-space family can be defined as the solution of the diffusion equation (for example in terms of the heat equation), ∂ t L = 1 2 ∇ 2 L , {\displaystyle \partial _{t}L={\frac {1}{2}}\nabla ^{2}L,} with initial condition L ( x , y ; 0 ) = f ( x , y ) {\displaystyle L(x,y;0)=f(x,y)} . This formulation of the scale-space representation L means that it is possible to interpret the intensity values of the image f as a "temperature distribution" in the image plane and that the process that generates the scale-space representation as a function of t corresponds to heat diffusion in the image plane over time t (assuming the thermal conductivity of the material equal to the arbitrarily chosen constant ⁠1/2⁠). Although this connection may appear superficial for a reader not familiar with differential equations, it is indeed the case that the main scale-space formulation in terms of non-enhancement of local extrema is expressed in terms of a sign condition on partial derivatives in the 2+1-D volume generated by the scale space, thus within the framework of partial differential equations. Furthermore, a detailed analysis of the discrete case shows that the diffusion equation provides a unifying link between continuous and discrete scale spaces, which also generalizes to nonlinear scale spaces, for example, using anisotropic diffusion. Hence, one may say that the primary way to generate a scale space is by the diffusion equation, and that the Gaussian kernel arises as the Green's function of this specific partial differential equation. == Motivations == The motivation for generating a scale-space representation of a given data set originates from the basic observation that real-world objects are composed of different structures at different scales. This implies that real-world objects, in contrast to idealized mathematical entities such as points or lines, may appear in different ways depending on the scale of observation. For example, the concept of a "tree" is appropriate at the scale of meters, while concepts such as leaves and molecules are more appropriate at finer scales. For a computer vision system analysing an unknown scene, there is no way to know a priori what scales are appropriate for describing the interesting structures in the image data. Hence, the only reasonable approach is to consider descriptions at multiple scales in order to be able to capture the unknown scale variations that may occur. Taken to the limit, a scale-space representation considers representations at all scales. Another motivation to the scale-space concept originates from the process of performing a physical measurement on real-world data. In order to extract any information from a measurement process, one has to apply operators of non-infinitesimal size to the data. In many branches of computer science and applied mathematics, the size of the measurement operator is disregarded in the theoretical modelling of a problem. The scale-space theory on the other hand explicitly incorporates the need for a non-infinitesimal size of the image operators as an integral part of any measurement as well as any other operation that depends on a real-world measurement. There is a close link between scale-space theory and biological vision. Many scale-space operations show a high degree of similarity with receptive field profiles recorded from the mammalian retina and the first stages in the visual cortex. In these respects, the scale-space framework can be seen as a theoretically well-founded paradigm for early vision, which in addition has been thoroughly tested by algorithms and experiments. == Gaussian derivatives == At any scale in scale space, we c

    Read more →
  • PropBank

    PropBank

    PropBank is a corpus that is annotated with verbal propositions and their arguments—a "proposition bank". Although "PropBank" refers to a specific corpus produced by Martha Palmer et al., the term propbank is also coming to be used as a common noun referring to any corpus that has been annotated with propositions and their arguments. The PropBank project has played a role in research in natural language processing, and has been used in semantic role labelling. == Comparison == PropBank differs from FrameNet, the resource to which it is most frequently compared, in several ways. PropBank is a verb-oriented resource, while FrameNet is centered on the more abstract notion of frames, which generalizes descriptions across similar verbs (e.g. "describe" and "characterize") as well as nouns and other words (e.g. "description"). PropBank does not annotate events or states of affairs described using nouns. PropBank commits to annotating all verbs in a corpus, whereas the FrameNet project chooses sets of example sentences from a large corpus and only in a few cases has annotated longer continuous stretches of text. PropBank-style annotations often remain close to the syntactic level, while FrameNet-style annotations are sometimes more semantically motivated. From the start, PropBank was developed with the idea of serving as training data for machine learning-based semantic role labeling systems in mind. It requires that all arguments to a verb be syntactic constituents and different senses of a word are only distinguished if the differences bear on the arguments. Due to such differences, semantic role labeling with respect to PropBank is often a somewhat easier task than producing FrameNet-style annotations.

    Read more →
  • Videotex

    Videotex

    Videotex (or interactive videotex) was one of the earliest implementations of an end-user information system. From the late 1970s to early 2010s, it was used to deliver information (usually pages of text) to a user in computer-like format, typically to be displayed on a television or a dumb terminal. In a strict definition, videotex is any system that provides interactive content and displays it on a video monitor such as a television, typically using modems to send data in both directions. A close relative is teletext, which sends data in one direction only, typically encoded in a television signal. All such systems are occasionally referred to as viewdata. Unlike the modern Internet, traditional videotex services were highly centralized. Videotex in its broader definition can be used to refer to any such service, including teletext, the Internet, bulletin board systems, online service providers, and even the arrival/departure displays at an airport. This usage is no longer common. With the exception of Minitel in France, videotex elsewhere never managed to attract any more than a very small percentage of the universal mass market once envisaged. By the end of the 1980s its use was essentially limited to a few niche applications. == Initial development and technologies == === United Kingdom === The first attempts at a general-purpose videotex service were created in the United Kingdom in the late 1960s. In about 1970 the BBC had a brainstorming session in which it was decided to start researching ways to send closed captioning information to the audience. As the Teledata research continued the BBC became interested in using the system for delivering any sort of information, not just closed captioning. In 1972, the concept was first made public under the new name Ceefax. Meanwhile, the General Post Office (soon to become British Telecom) had been researching a similar concept since the late 1960s, known as Viewdata. Unlike Ceefax which was a one-way service carried in the existing TV signal, Viewdata was a two-way system using telephones. Since the Post Office owned the telephones, this was considered to be an excellent way to drive more customers to use the phones. Not to be outdone by the BBC, they also announced their service, under the name Prestel. ITV soon joined the fray with a Ceefax-clone known as ORACLE. In 1974, all the services agreed on a standard for displaying the information. The display would be a simple 40×24 grid of text, with some "graphics characters" for constructing simple graphics, revised and finalized in 1976. The standard did not define the delivery system, so both Viewdata-like and Teledata-like services could at least share the TV-side hardware, which was expensive at the time. The standard also introduced a new term that covered all such services, teletext. Ceefax first started operation in 1974 with a limited 30 pages, followed quickly by ORACLE and then Prestel in 1979. By 1981, Prestel International was available in nine countries, and a number of countries, including Sweden, The Netherlands, Finland and West Germany were developing their own national systems closely based on Prestel. General Telephone and Electronics (GTE) acquired an exclusive agency for the system for North America. In the early 1980s, videotex became the base technology for the London Stock Exchange's pricing service called TOPIC. Later versions of TOPIC, notably TOPIC2 and TOPIC3, were developed by Thanos Vassilakis and introduced trading and historic price feeds. === France === Development of a French teletext-like system began in 1973. A very simple 2-way videotex system called Tictac was also demonstrated in the mid-1970s. As in the UK, this led on to work to develop a common display standard for videotex and teletext, called Antiope, which was finalised in 1977. Antiope had similar capabilities to the UK system for displaying alphanumeric text and chunky "mosaic" character-based block graphics. A difference however was that while in the UK standard control codes automatically also occupied one character position on screen, Antiope allowed for "non spacing" control codes. This gave Antiope slightly more flexibility in the use of colours in mosaic block graphics, and in presenting the accents and diacritics of the French language. Meanwhile, spurred on by the 1978 Nora/Minc report, the French government was determined to catch up on a perceived falling behind in its computer and communications facilities. In 1980 it began field trials issuing Antiope-based terminals for free to over 250,000 telephone subscribers in Ille-et-Vilaine region, where the French CCETT research centre was based, for use as telephone directories. The trial was a success, and in 1982 Minitel was rolled out nationwide. === Canada === Since 1970, researchers at the Communications Research Centre (CRC) in Ottawa had been working on a set of "picture description instructions", which encoded graphics commands as a text stream. Graphics were encoded as a series of instructions (graphics primitives) each represented by a single ASCII character. Graphic coordinates were encoded in multiple 6 bit strings of XY coordinate data, flagged to place them in the printable ASCII range so that they could be transmitted with conventional text transmission techniques. ASCII SI/SO characters were used to differentiate the text from graphic portions of a transmitted "page". In 1975, the CRC gave a contract to Norpak to develop an interactive graphics terminal that could decode the instructions and display them on a colour display, which was successfully up and running by 1977. Against the background of the developments in Europe, CRC was able to persuade the Canadian government to develop the system into a fully-fledged service. In August 1978, the Canadian Department of Communications publicly launched it as Telidon, a "second generation" videotex/teletext service, and committed to a four-year development plan to encourage rollout. Compared to the European systems, Telidon offered real graphics, as opposed to block-mosaic character graphics. The downside was that it required much more advanced decoders, typically featuring Zilog Z80 or Motorola 6809 processors. === Japan === Research in Japan was shaped by the demands of the large number of Kanji characters used in Japanese script. With 1970s technology, the ability to generate so many characters on demand in the end-user's terminal was seen as prohibitive. Instead, development focussed on methods to send pages to user terminals pre-rendered, using coding strategies similar to facsimile machines. This led to a videotex system called Captain ("Character and Pattern Telephone Access Information Network"), created by NTT in 1978, which went into full trials from 1979 to 1981. The system also lent itself naturally to photographic images, albeit at only moderate resolution. However, the pages typically took two or three times longer to load, compared to the European systems. NHK developed an experimental teletext system along similar lines, called CIBS ("Character Information Broadcasting Station"). Based on a 388×200 pixel resolution, it was first announced in 1976, and began trials in late 1978. (NHK's ultimate production teletext system launched in 1983). == Standards == Work to establish an international standard for videotex began in 1978 in CCITT. But the national delegations showed little interest in compromise, each hoping that their system would come to define what was perceived to be going to be an enormous new mass-market. In 1980 CCITT therefore issued recommendation S.100 (later T.100), noting the points of similarity but the essential incompatibility of the systems, and declaring all four to be recognised options. Trying to kick-start the market, AT&T Corporation entered the fray, and in May 1981 announced its own Presentation Layer Protocol (PLP). This was closely based on the Canadian Telidon system, but added to it some further graphics primitives and a syntax for defining macros, algorithms to define cleaner pixel spacing for the (arbitrarily sizeable) text, and also dynamically redefinable characters and a mosaic block graphic character set, so that it could reproduce content from the French Antiope. After some further revisions this was adopted in 1983 as ANSI standard X3.110, more commonly called NAPLPS, the North American Presentation Layer Protocol Syntax. It was also adopted in 1988 as the presentation-layer syntax for NABTS, the North American Broadcast Teletext Specification. Meanwhile, the European national Postal Telephone and Telegraph (PTT) agencies were also increasingly interested in videotex, and had convened discussions in European Conference of Postal and Telecommunications Administrations (CEPT) to co-ordinate developments, which had been diverging along national lines. As well as the British and French standards, the Swedes had proposed extending the British Prestel standard with a new se

    Read more →
  • Uniform convergence in probability

    Uniform convergence in probability

    Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family uniformly converge to their theoretical probabilities. Uniform convergence in probability has applications to statistics as well as machine learning as part of statistical learning theory. Specifically, the Glivenko-Cantelli theorem and the homonymous classes of functions are fundamentally related to uniform convergence. The law of large numbers says that, for each single event A {\displaystyle A} , its empirical frequency in a sequence of independent trials converges (with high probability) to its theoretical probability. In many application however, the need arises to judge simultaneously the probabilities of events of an entire class S {\displaystyle S} from one and the same sample. Moreover, it, is required that the relative frequency of the events converge to the probability uniformly over the entire class of events S {\displaystyle S} . The Uniform Convergence Theorem gives a sufficient condition for this convergence to hold. Roughly, if the event-family is sufficiently simple (its VC dimension is sufficiently small) then uniform convergence holds. == Definitions == For a class of predicates H {\displaystyle H} defined on a set X {\displaystyle X} and a set of samples x = ( x 1 , x 2 , … , x m ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{m})} , where x i ∈ X {\displaystyle x_{i}\in X} , the empirical frequency of h ∈ H {\displaystyle h\in H} on x {\displaystyle x} is Q ^ x ( h ) = 1 m | { i : 1 ≤ i ≤ m , h ( x i ) = 1 } | . {\displaystyle {\widehat {Q}}_{x}(h)={\frac {1}{m}}|\{i:1\leq i\leq m,h(x_{i})=1\}|.} The theoretical probability of h ∈ H {\displaystyle h\in H} is defined as Q P ( h ) = P { y ∈ X : h ( y ) = 1 } . {\displaystyle Q_{P}(h)=P\{y\in X:h(y)=1\}.} The Uniform Convergence Theorem states, roughly, that if H {\displaystyle H} is "simple" and we draw samples independently (with replacement) from X {\displaystyle X} according to any distribution P {\displaystyle P} , then with high probability, the empirical frequency will be close to its expected value, which is the theoretical probability. Here "simple" means that the Vapnik–Chervonenkis dimension of the class H {\displaystyle H} is small relative to the size of the sample. In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole. The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis using the concept of growth function. == Uniform Convergence Theorem == The statement of the Uniform Convergence Theorem is as follows: If H {\displaystyle H} is a set of { 0 , 1 } {\displaystyle \{0,1\}} -valued functions defined on a set X {\displaystyle X} and P {\displaystyle P} is a probability distribution on X {\displaystyle X} then for ε > 0 {\displaystyle \varepsilon >0} and m {\displaystyle m} a positive integer, we have: P m { | Q P ( h ) − Q x ^ ( h ) | ≥ ε for some h ∈ H } ≤ 4 Π H ( 2 m ) e − ε 2 m / 8 . {\displaystyle P^{m}\{|Q_{P}(h)-{\widehat {Q_{x}}}(h)|\geq \varepsilon {\text{ for some }}h\in H\}\leq 4\Pi _{H}(2m)e^{-\varepsilon ^{2}m/8}.} In the above, for any x ∈ X m , {\displaystyle x\in X^{m},} Q P ( h ) = P { ( y ∈ X : h ( y ) = 1 } , {\displaystyle Q_{P}(h)=P\{(y\in X:h(y)=1\},} Q ^ x ( h ) = 1 m | { i : 1 ≤ i ≤ m , h ( x i ) = 1 } | {\displaystyle {\widehat {Q}}_{x}(h)={\frac {1}{m}}|\{i:1\leq i\leq m,h(x_{i})=1\}|} and | x | = m . {\displaystyle |x|=m.} P m {\displaystyle P^{m}} indicates that the probability is taken over x {\displaystyle x} consisting of m {\displaystyle m} i.i.d. draws from the distribution P . {\displaystyle P.} Finally, the growth function Π H {\displaystyle \Pi _{H}} is defined in the following way, for any { 0 , 1 } {\displaystyle \{0,1\}} -valued functions H {\displaystyle H} over X {\displaystyle X} and for any natural number m {\displaystyle m} : Π H ( m ) = max | { h ∩ D : D ⊆ X , | D | = m , h ∈ H } | . {\displaystyle \Pi _{H}(m)=\max |\{h\cap D:D\subseteq X,|D|=m,h\in H\}|.} From the point of view of Learning Theory one can consider H {\displaystyle H} to be the Concept/Hypothesis class defined over the instance set X {\displaystyle X} . Crucially, the Sauer–Shelah lemma implies that Π H ( m ) ≤ m d {\displaystyle \Pi _{H}(m)\leq m^{d}} , where d {\displaystyle d} is the VC dimension of H {\displaystyle H} . == Proof of the Uniform Convergence Theorem == and are the sources of the proof below. Before we get into the details of the proof of the Uniform Convergence Theorem we will present a high level overview of the proof. Symmetrization: We transform the problem of analyzing | Q P ( h ) − Q ^ x ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{x}(h)|\geq \varepsilon } into the problem of analyzing | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} , where r {\displaystyle r} and s {\displaystyle s} are i.i.d samples of size m {\displaystyle m} drawn according to the distribution P {\displaystyle P} . One can view r {\displaystyle r} as the original randomly drawn sample of length m {\displaystyle m} , while s {\displaystyle s} may be thought as the testing sample which is used to estimate Q P ( h ) {\displaystyle Q_{P}(h)} . Permutation: Since r {\displaystyle r} and s {\displaystyle s} are picked identically and independently, so swapping elements between them will not change the probability distribution on r {\displaystyle r} and s {\displaystyle s} . So, we will try to bound the probability of | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} for some h ∈ H {\displaystyle h\in H} by considering the effect of a specific collection of permutations of the joint sample x = r | | s {\displaystyle x=r||s} . Specifically, we consider permutations σ ( x ) {\displaystyle \sigma (x)} which swap x i {\displaystyle x_{i}} and x m + i {\displaystyle x_{m+i}} in some subset of 1 , 2 , . . . , m {\displaystyle {1,2,...,m}} . The symbol r | | s {\displaystyle r||s} means the concatenation of r {\displaystyle r} and s {\displaystyle s} . Reduction to a finite class: We can now restrict the function class H {\displaystyle H} to a fixed joint sample and hence, if H {\displaystyle H} has finite VC Dimension, it reduces to the problem to one involving a finite function class. We present the technical details of the proof. It should be stressed that this proof glosses over details like the measurability of the events V {\displaystyle V} and R {\displaystyle R} ; measurability is granted in the case of H {\displaystyle H} being finite or countable, but this is not normally the case in standard applications of the theorem (e.g. for statistical learning theory or to prove the Glivenko-Cantelli theorem). To get measurability, one needs to use a notion of separability of the underlying space, possibly related to H {\displaystyle H} . === Symmetrization === Lemma: Let V = { x ∈ X m : | Q P ( h ) − Q ^ x ( h ) | ≥ ε for some h ∈ H } {\displaystyle V=\{x\in X^{m}:|Q_{P}(h)-{\widehat {Q}}_{x}(h)|\geq \varepsilon {\text{ for some }}h\in H\}} and R = { ( r , s ) ∈ X m × X m : | Q r ^ ( h ) − Q ^ s ( h ) | ≥ ε / 2 for some h ∈ H } . {\displaystyle R=\{(r,s)\in X^{m}\times X^{m}:|{\widehat {Q_{r}}}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2{\text{ for some }}h\in H\}.} Then for m ≥ 2 ε 2 {\displaystyle m\geq {\frac {2}{\varepsilon ^{2}}}} , P m ( V ) ≤ 2 P 2 m ( R ) {\displaystyle P^{m}(V)\leq 2P^{2m}(R)} . Proof: By the triangle inequality, if | Q P ( h ) − Q ^ r ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon } and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2} then | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} . Therefore, P 2 m ( R ) ≥ P 2 m { ∃ h ∈ H , | Q P ( h ) − Q ^ r ( h ) | ≥ ε and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 } = ∫ V P m { s : ∃ h ∈ H , | Q P ( h ) − Q ^ r ( h ) | ≥ ε and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 } d P m ( r ) = A {\displaystyle {\begin{aligned}&P^{2m}(R)\\[5pt]\geq {}&P^{2m}\{\exists h\in H,|Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon {\text{ and }}|Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2\}\\[5pt]={}&\int _{V}P^{m}\{s:\exists h\in H,|Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon {\text{ and }}|Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2\}\,dP^{m}(r)\\[5pt]={}&A\end{aligned}}} since r {\displaystyle r} and s {\displaystyle s} are independent. Now for r ∈ V {\displaystyle r\in V} fix an h ∈ H {\displaystyle h\in H} such that | Q P ( h ) − Q ^ r ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon } . For this h {\displaystyle h} , we shall

    Read more →
  • Meta AI

    Meta AI

    Meta AI is a research division of Meta (formerly Facebook) that develops artificial intelligence and augmented reality technologies. == History == Meta AI was founded in 2013 as Facebook Artificial Intelligence Research (FAIR). It has workspaces in Menlo Park, London, New York City, Paris, Seattle, Pittsburgh, Tel Aviv, and Montreal as of 2025. In 2016, FAIR partnered with Google, Amazon, IBM, and Microsoft in creating the Partnership on Artificial Intelligence to Benefit People and Society. Meta AI was directed by Yann LeCun until 2018, when Jérôme Pesenti succeeded the role. Pesenti is formerly the CTO of IBM's big data group. FAIR's research includes self-supervised learning, generative adversarial networks, document classification and translation, and computer vision. FAIR released Torch deep-learning modules as well as PyTorch in 2017, an open-source machine learning framework, which was subsequently used in several deep learning technologies, such as Tesla's autopilot and Uber's Pyro. That same year, a pair of chatbots were falsely rumored to be discontinued for developing a language that was unintelligible to humans. FAIR clarified that the research had been shut down because they had accomplished their initial goal to understand how languages are generated by their models, rather than out of fear. FAIR was renamed Meta AI following the rebranding that changed Facebook, Inc. to Meta Platforms Inc. On October 1, 2025, Facebook announced "We will soon use your interactions with AI at Meta to personalize the content and ads you see". == Virtual assistant == Meta AI is also the name of the virtual assistant developed by the team, now integrated as a chatbot into Meta's social networking products. It is also available as a subscription-based stand-alone app. The virtual assistant was pre-installed on the second generation of Ray-Ban Meta smartglasses, and can incorporate inputs from the glasses' cameras after an update. It is also available on Quest 2 and newer HMDs. Since May 2024, the chatbot has summarized news from various outlets without linking directly to original articles, including in Canada, where news links are banned on its platforms. This use of news content without compensation and attribution has raised ethical and legal concerns, especially as Meta continues to reduce news visibility on its platforms. == Current research == === Natural language processing and chatbot === Natural language processing is the ability for machines to understand and generate natural language. The team is also researching unsupervised machine translation and multilingual chatbots. ==== Galactica ==== Galactica is a large language model (LLM) designed for generating scientific text. It was available for three days from 15 November 2022, before being withdrawn for generating racist and inaccurate content. ==== Llama ==== Llama is an LLM released in February 2023. As of January 2026, the most recent release is the Llama 4. === Hardware === Meta used CPUs and in-house custom chips before 2022; they switched to Nvidia GPUs since then. MTIA v1, one of their early chips, is designed for the company's content recommendation algorithms. It was fabricated on TSMC's 7 nm process technology and consumed 25W, capable of 51.2 TFlops FP16. == Controversy == The French media outlet Mediapart reports that in 2022, Facebook's parent company illegally used works accumulated by the pirate site LibGen to train its artificial intelligence.

    Read more →
  • Eyes of Things

    Eyes of Things

    Eyes of Things (EoT) is the name of a project funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement number 643924. The purpose of the project, which is funded under the Smart Cyber-physical systems topic, is to develop a generic hardware-software platform for embedded, efficient (i.e. battery-operated, wearable, mobile), computer vision, including deep learning inference. On November 29, 2018, the European Space Agency announced that it was testing the suitability of the device for space applications in advance of a flight in a Cubesat. == Motivation == EoT is based on the following tenets: Future embedded systems will have more intelligence and cognitive functionality. Vision is paramount to such intelligent capacity Unlike other sensors, vision requires intensive processing. Power consumption must be optimized if vision is to be used in mobile and wearable applications Cloud processing of edge-captured images is not sustainable. The sheer amount of visual data generated cannot be transferred to the cloud. Bandwidth is not sufficient and cloud servers cannot cope with it. == Partners == VISILAB group at University of Castilla–La Mancha (Coordinator) Movidius Awaiba Thales Security Solutions & Systems DFKI Fluxguide Evercam nVISO == Awards == 2019 Electronic Component and Systems Innovation Award by the European Commission 2018 HiPEAC Tech Transfer Award 2018 EC Innovation Radar - highlighting excellent innovations Award 2018 Internet of Things (IoT) Technology Research Award Pilot by Google 2016 Semifinalist "THE VISION SHOW STARTUP COMPETITION", Global Association for Vision Information, Boston US

    Read more →
  • Manifold regularization

    Manifold regularization

    In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that should be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify any possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, and so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised learning and transductive learning settings, where unlabeled data are available. The technique has been used for applications including medical imaging, geographical imaging, and object recognition. == Manifold regularizer == === Motivation === Manifold regularization is a type of regularization, a family of techniques that reduces overfitting and ensures that a problem is well-posed by penalizing complex solutions. In particular, manifold regularization extends the technique of Tikhonov regularization as applied to Reproducing kernel Hilbert spaces (RKHSs). Under standard Tikhonov regularization on RKHSs, a learning algorithm attempts to learn a function f {\displaystyle f} from among a hypothesis space of functions H {\displaystyle {\mathcal {H}}} . The hypothesis space is an RKHS, meaning that it is associated with a kernel K {\displaystyle K} , and so every candidate function f {\displaystyle f} has a norm ‖ f ‖ K {\displaystyle \left\|f\right\|_{K}} , which represents the complexity of the candidate function in the hypothesis space. When the algorithm considers a candidate function, it takes its norm into account in order to penalize complex functions. Formally, given a set of labeled training data ( x 1 , y 1 ) , … , ( x ℓ , y ℓ ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{\ell },y_{\ell })} with x i ∈ X , y i ∈ Y {\displaystyle x_{i}\in X,y_{i}\in Y} and a loss function V {\displaystyle V} , a learning algorithm using Tikhonov regularization will attempt to solve the expression arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ ‖ f ‖ K 2 {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma \left\|f\right\|_{K}^{2}} where γ {\displaystyle \gamma } is a hyperparameter that controls how much the algorithm will prefer simpler functions over functions that fit the data better. Manifold regularization adds a second regularization term, the intrinsic regularizer, to the ambient regularizer used in standard Tikhonov regularization. Under the manifold assumption in machine learning, the data in question do not come from the entire input space X {\displaystyle X} , but instead from a nonlinear manifold M ⊂ X {\displaystyle M\subset X} . The geometry of this manifold, the intrinsic space, is used to determine the regularization norm. === Laplacian norm === There are many possible choices for the intrinsic regularizer ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} . Many natural choices involve the gradient on the manifold ∇ M {\displaystyle \nabla _{M}} , which can provide a measure of how smooth a target function is. A smooth function should change slowly where the input data are dense; that is, the gradient ∇ M f ( x ) {\displaystyle \nabla _{M}f(x)} should be small where the marginal probability density P X ( x ) {\displaystyle {\mathcal {P}}_{X}(x)} , the probability density of a randomly drawn data point appearing at x {\displaystyle x} , is large. This gives one appropriate choice for the intrinsic regularizer: ‖ f ‖ I 2 = ∫ x ∈ M ‖ ∇ M f ( x ) ‖ 2 d P X ( x ) {\displaystyle \left\|f\right\|_{I}^{2}=\int _{x\in M}\left\|\nabla _{M}f(x)\right\|^{2}\,d{\mathcal {P}}_{X}(x)} In practice, this norm cannot be computed directly because the marginal distribution P X {\displaystyle {\mathcal {P}}_{X}} is unknown, but it can be estimated from the provided data. === Graph-based approach of the Laplacian norm === When the distances between input points are interpreted as a graph, then the Laplacian matrix of the graph can help to estimate the marginal distribution. Suppose that the input data include ℓ {\displaystyle \ell } labeled examples (pairs of an input x {\displaystyle x} and a label y {\displaystyle y} ) and u {\displaystyle u} unlabeled examples (inputs without associated labels). Define W {\displaystyle W} to be a matrix of edge weights for a graph, where W i j {\displaystyle W_{ij}} is a similarity built from distance measure between the data points x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} (so that more close implies higher W i j {\displaystyle W_{ij}} ). Define D {\displaystyle D} to be a diagonal matrix with D i i = ∑ j = 1 ℓ + u W i j {\displaystyle D_{ii}=\sum _{j=1}^{\ell +u}W_{ij}} and L {\displaystyle L} to be the Laplacian matrix D − W {\displaystyle D-W} . Then, as the number of data points ℓ + u {\displaystyle \ell +u} increases, L {\displaystyle L} converges to the Laplace–Beltrami operator Δ M {\displaystyle \Delta _{M}} , which is the divergence of the gradient ∇ M {\displaystyle \nabla _{M}} . Then, if f {\displaystyle \mathbf {f} } is a vector of the values of f {\displaystyle f} at the data, f = [ f ( x 1 ) , … , f ( x l + u ) ] T {\displaystyle \mathbf {f} =[f(x_{1}),\ldots ,f(x_{l+u})]^{\mathrm {T} }} , the intrinsic norm can be estimated: ‖ f ‖ I 2 = 1 ( ℓ + u ) 2 f T L f {\displaystyle \left\|f\right\|_{I}^{2}={\frac {1}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As the number of data points ℓ + u {\displaystyle \ell +u} increases, this empirical definition of ‖ f ‖ I 2 {\displaystyle \left\|f\right\|_{I}^{2}} converges to the definition when P X {\displaystyle {\mathcal {P}}_{X}} is known. === Solving the regularization problem with graph-based approach === Using the weights γ A {\displaystyle \gamma _{A}} and γ I {\displaystyle \gamma _{I}} for the ambient and intrinsic regularizers, the final expression to be solved becomes: arg min f ∈ H 1 ℓ ∑ i = 1 ℓ V ( f ( x i ) , y i ) + γ A ‖ f ‖ K 2 + γ I ( ℓ + u ) 2 f T L f {\displaystyle {\underset {f\in {\mathcal {H}}}{\arg \!\min }}{\frac {1}{\ell }}\sum _{i=1}^{\ell }V(f(x_{i}),y_{i})+\gamma _{A}\left\|f\right\|_{K}^{2}+{\frac {\gamma _{I}}{(\ell +u)^{2}}}\mathbf {f} ^{\mathrm {T} }L\mathbf {f} } As with other kernel methods, H {\displaystyle {\mathcal {H}}} may be an infinite-dimensional space, so if the regularization expression cannot be solved explicitly, it is impossible to search the entire space for a solution. Instead, a representer theorem shows that under certain conditions on the choice of the norm ‖ f ‖ I {\displaystyle \left\|f\right\|_{I}} , the optimal solution f ∗ {\displaystyle f^{}} must be a linear combination of the kernel centered at each of the input points: for some weights α i {\displaystyle \alpha _{i}} , f ∗ ( x ) = ∑ i = 1 ℓ + u α i K ( x i , x ) {\displaystyle f^{}(x)=\sum _{i=1}^{\ell +u}\alpha _{i}K(x_{i},x)} Using this result, it is possible to search for the optimal solution f ∗ {\displaystyle f^{}} by searching the finite-dimensional space defined by the possible choices of α i {\displaystyle \alpha _{i}} . === Functional approach of the Laplacian norm === The idea beyond the graph-Laplacian is to use neighbors to estimate the Laplacian. This method is akin to local averaging methods, that are known to scale poorly in high-dimensional problems. Indeed, the graph Laplacian is known to suffer from the curse of dimensionality. Luckily, it is possible to leverage expected smoothness of the function to estimate thanks to more advanced functional analysis. This method consists of estimating the Laplacian operator using derivatives of the kernel reading ∂ 1 , j K ( x i , x ) {\displaystyle \partial _{1,j}K(x_{i},x)} where ∂ 1 , j {\displaystyle \partial _{1,j}} denotes the partial derivatives according to the j-th coordinate of the first variable. This second approach to the Laplacian norm is to put in relation with meshfree methods, that contrast with the finite difference method in PDE. == Applications == Manifold regularization can extend a variety of algorithms that can be expressed using Tikhonov regularization, by choosing an appropriate loss function V {\displaystyle V} and hypothesis space H {\displaystyle {\mathcal {H}}} . Two commonly used examples are the families of support vector machines and regularized least squares algorithm

    Read more →
  • Language identification

    Language identification

    In natural language processing, language identification or language guessing is the problem of determining which natural language a given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods. == Overview == === Logical approach === A common non-statistical intuitive approach (though highly uncertain) is to look for common letter combinations, or distinctive diacritics or punctuation. === Statistical approach === There are several statistical approaches to language identification. An older statistical method by Grefenstette was based on the frequency of short n-grams, which are often function morphemes. For example, "ing" is more common in English than in French, while the sequence "que" is more common in French. Given a new page found on the Web, one counts the number of occurrences of each such short sequence and picks the language whose frequency table it matches the most. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages. This approach is known as mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods. Mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques. Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994) is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one with the model that is most similar to the model from the text needing to be identified. This approach can be problematic when the input text is in a language for which there is no model. In that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text that are composed of several languages, as is common on the Web. As of 2025, a commonly used baseline method is via the fastText library, which has comparable classification accuracy as deep learning techniques, but much faster. == Identifying similar languages == One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Bulgarian and Macedonian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them. In 2014 the DSL shared task has been organized providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), Group F (American English, British English). The best system reached performance of over 95% results (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. 2014. == Software == Apache OpenNLP includes char n-gram based statistical detector and comes with a model that can distinguish 103 languages Apache Tika contains a language detector for 18 languages

    Read more →
  • Dynamic texture

    Dynamic texture

    Dynamic texture ( sometimes referred to as temporal texture) is the texture with motion which can be found in videos of sea-waves, fire, smoke, wavy trees, etc. Dynamic texture has a spatially repetitive pattern with time-varying visual pattern. Modeling and analyzing dynamic texture is a topic of images processing and pattern recognition in computer vision. Extracting features that describe the dynamic texture can be utilized for tasks of images sequences classification, segmentation, recognition and retrieval. Comparing with texture found within static images, analyzing dynamic texture is a challenging problem. It is important that the extracted features from dynamic texture combine motion and appearance description, and also be invariance to some transformation such as rotation, translation and illumination. == Analysis methods of dynamic texture == The methods of dynamic texture recognition can categorized as follows: Methods based on optical flow: by applying optical flow to the dynamic texture, velocity with direction and magnitude can be detected and used to recognize the dynamic texture. Due to simplicity of its computation, it is currently the most popular method. Methods computing geometric properties: this methods track the surfaces of motion trajectories in spatiotemporal domain. Methods based on local spatiotemporal filtering : this methods analyze the local spatiotemporal patterns and its orientation and energy and employ them as feature used for classification. Methods based on global spatiotemporal transform: this method characterize the motion at different scale using wavelets that can decompose the motion into local and global. Model-based methods : These methods aims at generating a model to describe the motion by a set of parameters. == Applications == - Segmenting the sequence images of natural scenes. This helps on differentiate between streets and grass alongside these streets which could be used in the application of navigations. - Motion detection : Dynamic texture features extracted from footage videos can be exploited to detect abnormal crowd activities. - Video classification: video of natural scenes or other scenes that exhibit dynamic textures. - Video retrieval : Dynamic textures can be employed as a feature retrieve videos that contain, for example, sea-waves, smoke, clouds, wavy trees.

    Read more →