AI Avatar For Teams Meetings

AI Avatar For Teams Meetings — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Super app

    Super app

    A super app or super-app (also known as an everything app) is a mobile or web application that can provide multiple services including payment and instant messaging services, effectively becoming an all-encompassing, self-contained, commerce and communication online platform that embraces many aspects of personal and commercial life. Notable examples of super apps include Tencent's WeChat in China, Tata Neu in India, Grab in Southeast Asia and Max in Russia. For end users, a super app is an application that provides a set of core features while also giving access to independently developed miniapps. For app developers, a super app is an application integrated with the capabilities of platforms and ecosystems that allows third-parties to develop and publish miniapps. == History == The super app term was first used to describe WeChat when it combined the instant messaging service with the digital wallet function. Recognition of WeChat as a super app stems from its combination of messaging, payments, e-commerce, and much more within a single application, making it indispensable for many users. WeChat's establishment of the super app model has led companies like Meta to try to build similar applications outside of China. In India, Tata Group has announced that it is currently developing a super app named Tata Neu. Major Indian companies like Paytm, PhonePe, and ITC Maars also have apps in development that might constitute super apps. In Southeast Asia, Grab and Gojek lay claim to the super app classification despite lacking many of the features offered by WeChat. Accordingly, growth-stage companies like Shopee, Traveloka, and AirAsia have also expanded the range of services offered by their respective applications. == Notable examples == === Alipay === Alipay is a third-party mobile and online payment platform established in Hangzhou, China in February 2004 by Alibaba Group and its founder Jack Ma. It operates in association with Ant Group, an affiliate company of the Chinese Alibaba Group. === Gojek === Gojek is an Indonesian on-demand multiservice digital platform and fintech payment super app. Established in Jakarta in 2010, as a call center to connect consumers to courier delivery and two-wheeled ride-hailing services, it launched its mobile app in 2015 with four services: GoRide, GoSend, GoShop, and GoFood, which has since expanded to offer over 20 services. In 2021, it merged with another Indonesian unicorn, Tokopedia, forming the decacorn GoTo Gojek Tokopedia. === Grab === Grab is a Southeast Asian technology company headquartered in Singapore and Indonesia. Founded in 2012 as the MyTeksi app in Kuala Lumpur, Malaysia, it expanded the following year as GrabTaxi, before moving its headquarters to Singapore in 2014 and rebranding officially as Grab. In addition to ride-hailing and transportation services, the company's mobile app also offers food delivery and digital payment services. === Max === Max is a messenger from the Russian company VK, positioned as a super app. The application combines messaging, calls, and channels features with the integration of additional services: payments, miniapps, taxi ordering, deliveries, and other everyday services are available within a single interface. The goal is to unite communication and routine tasks in a unified ecosystem. === Tata Neu === Tata Neu is a multipurpose super app, developed in India by the Tata Group. It is the country's first super app. The app was launched to coincide with the start of a 2022 Indian Premier League cricket match. === WeChat === WeChat is a Chinese multipurpose instant messaging, social media and mobile payment app. First released in 2011, it became the world's largest standalone mobile app in 2018, with over 1 billion monthly active users. WeChat provides text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video conferencing, video games, the sharing of photographs and videos and location sharing. === X === X is an American social network, originally known as Twitter from its launch through 2023. Prior to his acquisition of the service, new owner Elon Musk stated that he planned for Twitter to become an "everything app" known as "X"; in 2023, the service added an AI chatbot known as "Grok" as well as integrated job search tools known as "X Hiring". In January 2025, X announced its intent to offer a digital wallet service in the future. Later in the year, X revamped its direct messaging system as "Chat". == Criticism == Although apps that fit the super app classification can offer users a wider variety of services in comparison to single-purpose alternatives, internet regulators in regions such as the US and Europe have become more concerned about the overall power of the technology industry and have become more critical of companies developing such apps. In China, WeChat and other local firms have been ordered to open up their platforms to rivals by local regulators. There are also reports that suggest it might be difficult to replicate WeChat's super app model. This stems partly from the peaking of smartphone penetration rates in many regions worldwide, which has led to overcrowded app stores and tighter restrictions on targeted advertising as regulators assert more control over the companies. From a technical viewpoint, single-purpose apps are comparatively faster, more responsive and easier to navigate than super apps, which helps improve the overall user experience. Super-apps are also likelier to store larger amounts of personal data to facilitate the delivery of their services, so users run a greater risk of becoming victims of severe data breaches. In 2020, this unfolded with Tokopedia, which had the data of 91 million of its users stolen and shared by crackers. It has also been noted that a user who loses access to their account or is banned from a super app generally loses access to multiple real-life services and digital applications; the Chinese government has used this approach to penalize people who shared the photos of the Sitong Bridge protest.

    Read more →
  • User profile

    User profile

    A user profile is a collection of settings and information associated with a user. It contains critical information that is used to identify an individual, such as their name, age, portrait photograph and individual characteristics such as knowledge or expertise. User profiles are most commonly present on social media websites such as Facebook, Instagram, and LinkedIn; and serve as voluntary digital identity of an individual, highlighting their key features and traits. In personal computing and operating systems, user profiles serve to categorise files, settings, and documents by individual user environments, known as 'accounts', allowing the operating system to be more friendly and catered to the user. Physical user profiles serve as identity documents such as passports, driving licenses and legal documents that are used to identify an individual under the legal system. A user profile can also be considered as the computer representation of a user model. A user model is a (data) structure that is used to capture certain characteristics about an individual user, and the process of obtaining the user profile is called user modeling or profiling. == Origin == The origin of user profiles can be traced to the origin of the passport, an identity document (ID) made mandatory in 1920, after World War I following negotiations at the League of Nations. The passport served as an official government record of an individual. Consequently, Immigration Act of 1924 was established to identify an individual's country of origin. In the 21st century, passports have now become a highly sought-after commodity as it is widely accepted as a source of verifying an individual's identity under the legal system. With the advent of digital revolution and social media websites, user profiles have transitioned to an organised group of data describing the interaction between a user and a system. Social media sites like Instagram allow individuals to create profiles that are representative of their desired personality and image. Filling all fields of profile information may not be necessary to create a meaningful self-presentation, which grants individual more control over of the identity they wish to present by displaying the most meaningful attributes. A personal user profile is a key aspect of an individual's social networking experience, around which his/her public identity is built. == Types of user profiles == A user profile can be of any format if it contains information, settings and/or characteristics specific to an individual. Most popular user profiles include those on photo and video sharing websites such as Facebook and Instagram, accounts on operating systems, such as those on Windows and MacOS and physical documents such as passports and driving licenses. === Social media === Effectively structured user profiles on social media channels such as Instagram and Facebook offer a way for people to form impressions about someone that is predictive or similarly meeting them offline. The condensed format of social media profiles allows for quick filtering of millions of profiles by matching individuals by similar characteristics and interests; information provided upon sign up. A research conducted highlights that only a "thin slice" of information is required to form an impression about an individual online (Stecher and Counts 2008). Online user profiles eliminate the complexity of interaction that is present in 'face-to-face' meetings such as behavioural, facial, and environmental information, resulting in increased predictiveness of user personality. Dating apps and websites solely rely on an individual's user profile and the information provided to form interactions and communication with others on the platform. Despite having control over presented information, lying is minimal in online dating contexts (Hancock, Toma and Ellison, 2007). Apps such as Bumble allow users to 'match' with other individuals based on their characteristics and selected filters that allow users to narrow the spectrum of search to their preference. Information for a user's profile is voluntarily specified by the user and includes information such as height, interests, photographs, gender or education. The requirement of information varies respective to each platform, and there surrounds little consensus to an appropriate amount of information for a condensed user profile. Universally, all social networking platforms display an individual's profile picture and an "about me" page that allows for self-expression. === Influencers === Influencer user profiles are third party endorsers who shape audience attitudes and decisions through social media content such as photos, blogs and tweets. Social Media Influencers (SMI) often hold a significant following on a social media platform which enables them to be recognised as opinion leaders to shape an information influence to their audience. 'Influencer marketing' industry gained prominence in 2018, when the photo sharing app Instagram crossed 1 billion users, subsequently with approximately 60,000 google search queries for 'influencer marketing' the same year. Influencer user profiles hold a unique selling point, or public personality that is unique and charismatic to the needs and wants of their target audience. SMI profiles advertise product information, latest promotions and regularly engage with their followers to maintain their online persona. Messages endorsed by social media influencers are often perceived as reliable and compelling, as a study conducted found 82% of followers were more inclined to follow the suggestions of their favorite influencer. This allows advertisers to leverage online user profiles and their audience rapport to target younger and niche audiences. According to a market survey, influencer marketing through social media profiles yields a return 11 times higher than traditional marketing, as they are more capable of communicating to a niche segment. Most popular influencers include sport starts such as Cristiano Ronaldo and Hollywood personalities such as Dwayne Johnson and Kylie Jenner each with over 200 million followers respectively. === Ecommerce === Online shopping or Ecommerce websites such as Amazon use information from a customer's user profile and interests to generate a list of recommended items to shop. Recommendation algorithms analyse user demographic data, history, and favourite artists to compile suggestions. The store rapidly adapts to changing user needs and preferences, with generation of real time results required within half of a second. New profiles naturally have limited information for algorithms to analyse, and customer data of each interaction provides valuable information which is stored as a database linked with each individual profile. User profiles on ecommerce websites also serve to improve sales of sellers as individuals are recommend products that other "customers who bought this item also bought" to widen the selection of the buyer. A study conducted found that user profiles and recommendation algorithms have significant impact on related product sales and overall spending of an individual. A process known as "collaborative filtering" tries to analyse common products of interest for an individual on the basis of views expressed by other similar behaving profiles. Features such as product ratings, seller ratings and comments allow individual user profiles to contribute to recommendation algorithms, eliminate adverse selection and contribute to shaping an online marketplace adhering to Amazons zero tolerance policy for misleading products. == Digital user profiles == Modern software and applications account for user profiles as a foundation on which a usable application is built. The structure and layout of an application such as its menus, features and controls are often derived from user's selected settings and preferences. The origin of digital user profiles in computer systems was first initiated by Windows NT that held user settings and information in a separate environment variable named %USERPROFILE% and held the framework to a user's profile root. Consequently, operating systems such as MacOS further accelerated prominence of user profiles in Mac OS X 10.0. Iterations since have been made with each operating system release with the aim to maximise user friendliness with the system. Features such as keyboard layouts, time zones, measurement units, synchronisation of different services and privacy preferences are made available during the setup of a user account on the computer === Types of accounts === ==== Administrator ==== Administrator user profiles have complete access to the system and its permissions. It is often the first user profile on a system by design, and is what allows other accounts to be created. However, since the administrator account has no restrictions, they are highly vulnerable to malware and viruses, with potential to impact all other accounts.

    Read more →
  • Fifth Generation Computer Systems

    Fifth Generation Computer Systems

    The Fifth Generation Computer Systems (FGCS; Japanese: 第五世代コンピュータ, romanized: daigosedai konpyūta) was a 10-year initiative launched in 1982 by Japan's Ministry of International Trade and Industry (MITI) to develop computers based on massively parallel computing and logic programming. The project aimed to create an "epoch-making computer" with supercomputer-like performance and to establish a platform for future advancements in artificial intelligence. Although FGCS was noted as ahead of its time, and its ambitious goals contributed significantly to the development of concurrent logic programming, it ultimately ended in commercial failure. The term "fifth generation" was chosen to emphasize the system's advanced nature. In the history of computing hardware, there had been four prior "generations" of computers: the first generation utilized vacuum tubes; the second, transistors and diodes; the third, integrated circuits; and the fourth, microprocessors. While earlier generations focused on increasing the number of logic elements within a single CPU, it was widely believed at the time that the fifth generation would achieve enhanced performance through the use of massive numbers of CPUs. == Background == In the late 1960s until the early 1970s, there was much talk about "generations" of computer hardware, then usually organized into three generations First generation: Thermionic vacuum tubes. Mid-1940s. IBM pioneered the arrangement of vacuum tubes in pluggable modules. The IBM 650 was a first-generation computer. Second generation: Transistors. 1956. The era of miniaturization begins. Transistors are much smaller than vacuum tubes, draw less power, and generate less heat. Discrete transistors are soldered to circuit boards, with interconnections accomplished by stencil-screened conductive patterns on the reverse side. The IBM 7090 was a second-generation computer. Third generation: Integrated circuits (silicon chips containing multiple transistors). 1964. A pioneering example is the ACPX module used in the IBM 360/91, which, by stacking layers of silicon over a ceramic substrate, accommodated over 20 transistors per chip; the chips could be packed together onto a circuit board to achieve unprecedented logic densities. The IBM 360/91 was a hybrid second and third-generation computer. Omitted from this taxonomy is the "zeroth-generation" computer based on metal gears (such as the IBM 407) or mechanical relays (such as the Mark I), and the post-third-generation computers based on Very Large Scale Integrated (VLSI) circuits. There was also a parallel set of generations for software: First generation: Machine language. Second generation: Low-level programming languages such as Assembly language. Third generation: Structured high-level programming languages such as C, COBOL and FORTRAN. Fourth generation: "Non-procedural" high-level programming languages (such as object-oriented languages). Throughout these multiple generations up to the 1970s, Japan built computers following U.S. and British leads. In the mid-1970s, the Ministry of International Trade and Industry stopped following western leads and started looking into the future of computing on a small scale. They asked the Japan Information Processing Development Center (JIPDEC) to indicate a number of future directions, and in 1979 offered a three-year contract to carry out more in-depth studies along with industry and academia. It was during this period that the term "fifth-generation computer" started to be used. Prior to the 1970s, MITI guidance had successes such as an improved steel industry, the creation of the oil supertanker, the automotive industry, consumer electronics, and computer memory. MITI decided that the future was going to be information technology. However, the Japanese language, particularly in its written form, presented and still presents obstacles for computers. As a result of these hurdles, MITI held a conference to seek assistance from experts. The primary fields for investigation from this initial project were: Inference computer technologies for knowledge processing Computer technologies to process large-scale data bases and knowledge bases High-performance workstations Distributed functional computer technologies Super-computers for scientific calculation == Project launch == The aim was to build parallel computers for artificial intelligence applications using concurrent logic programming. The project imagined an "epoch-making" computer with supercomputer-like performance running on top of large databases (as opposed to a traditional filesystem) using a logic programming language to define and access the data using massively parallel computing/processing. They envisioned building a prototype machine with performance between 100M and 1G LIPS, where a LIPS is a Logical Inference Per Second. At the time typical workstation machines were capable of about 100k LIPS. They proposed to build this machine over a ten-year period, 3 years for initial R&D, 4 years for building various subsystems, and a final 3 years to complete a working prototype system. In 1982 the government decided to go ahead with the project, and established the Institute for New Generation Computer Technology (ICOT) through joint investment with various Japanese computer companies. After the project ended, MITI would consider an investment in a new "sixth generation" project. Ehud Shapiro captured the rationale and motivations driving this project: "As part of Japan's effort to become a leader in the computer industry, the Institute for New Generation Computer Technology has launched a revolutionary ten-year plan for the development of large computer systems which will be applicable to knowledge information processing systems. These Fifth Generation computers will be built around the concepts of logic programming. In order to refute the accusation that Japan exploits knowledge from abroad without contributing any of its own, this project will stimulate original research and will make its results available to the international research community." === Logic programming === The target defined by the FGCS project was to develop "Knowledge Information Processing systems" (roughly meaning, applied Artificial Intelligence). The chosen tool to implement this goal was logic programming. Logic programming approach as was characterized by Maarten Van Emden – one of its founders – as: The use of logic to express information in a computer. The use of logic to present problems to a computer. The use of logical inference to solve these problems. More technically, it can be summed up in two equations: Program = Set of axioms. Computation = Proof of a statement from axioms. The Axioms typically used are universal axioms of a restricted form, called Horn-clauses or definite-clauses. The statement proved in a computation is an existential statement. The proof is constructive, and provides values for the existentially quantified variables: these values constitute the output of the computation. Logic programming was thought of as something that unified various gradients of computer science (software engineering, databases, computer architecture and artificial intelligence). It seemed that logic programming was a key missing connection between knowledge engineering and parallel computer architectures. == Results == After having influenced the consumer electronics field during the 1970s and the automotive world during the 1980s, the Japanese had developed a strong reputation. The launch of the FGCS project spread the belief that parallel computing was the future of all performance gains, producing a wave of apprehension in the computer field. Soon parallel projects were set up in the US as the Strategic Computing Initiative and the Microelectronics and Computer Technology Corporation (MCC), in the UK as Alvey, and in Europe as the European Strategic Program on Research in Information Technology (ESPRIT), as well as the European Computer‐Industry Research Centre (ECRC) in Munich, a collaboration between ICL in Britain, Bull in France, and Siemens in Germany. The project ran from 1982 to 1994, spending a little less than ¥57 billion (about US$320 million) total. After the FGCS Project, MITI stopped funding large-scale computer research projects, and the research momentum developed by the FGCS Project dissipated. However MITI/ICOT embarked on a neural-net project which some called the Sixth Generation Project in the 1990s, with a similar level of funding. Per-year spending was less than 1% of the entire R&D expenditure of the electronics and communications equipment industry. For example, the project's highest expenditure year was 7.2 million yen in 1991, but IBM alone spent 1.5 billion dollars (370 billion yen) in 1982, while the industry spent 2150 billion yen in 1990. === Concurrent logic programming === In 1982, during a visit to the ICOT, Ehud Shapiro invented Concurrent Prolog, a novel programming language t

    Read more →
  • Attribute–value system

    Attribute–value system

    An attribute–value system is a basic knowledge representation framework comprising a table with columns designating "attributes" (also known as "properties", "predicates", "features", "dimensions", "characteristics", "fields", "headers" or "independent variables" depending on the context) and "rows" designating "objects" (also known as "entities", "instances", "exemplars", "elements", "records" or "dependent variables"). Each table cell therefore designates the value (also known as "state") of a particular attribute of a particular object. == Example of attribute–value system == Below is a sample attribute–value system. It represents 10 objects (rows) and five features (columns). In this example, the table contains only integer values. In general, an attribute–value system may contain any kind of data, numeric or otherwise. An attribute–value system is distinguished from a simple "feature list" representation in that each feature in an attribute–value system may possess a range of values (e.g., feature P1 below, which has domain of {0,1,2}), rather than simply being present or absent (Barsalou & Hale 1993). == Other terms used for "attribute–value system" == Attribute–value systems are pervasive throughout many different literatures, and have been discussed under many different names: Flat data Spreadsheet Attribute–value system (Ziarko & Shan 1996) Information system (Pawlak 1981) Classification system (Ziarko 1998) Knowledge representation system (Wong & Ziarko 1986) Information table (Yao & Yao 2002)

    Read more →
  • Structural similarity index measure

    Structural similarity index measure

    The structural similarity index measure (SSIM) is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. It is also used for measuring the similarity between two images. The SSIM index is a full reference metric; in other words, the measurement or prediction of image quality is based on an initial uncompressed or distortion-free image as reference. SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms. This distinguishes from other techniques such as mean squared error (MSE) or peak signal-to-noise ratio (PSNR) that instead estimate absolute errors. Structural information is the idea that the pixels have strong inter-dependencies especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image. == History == The predecessor of SSIM was called Universal Quality Index (UQI), or Wang–Bovik index, which was developed by Zhou Wang and Alan Bovik in 2001. This evolved, through their collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the IEEE Transactions on Image Processing. In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings. The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University. Further variants of the model have been developed in the Image and Visual Computing Laboratory at University of Waterloo and have been commercially marketed. SSIM subsequently found strong adoption in the image processing community and in the television and social media industries. The 2004 SSIM paper has been cited over 50,000 times according to Google Scholar, making it one of the highest cited papers in the image processing and video engineering fields. It was recognized with the IEEE Signal Processing Society Best Paper Award for 2009. It also received the IEEE Signal Processing Society Sustained Impact Award for 2016, indicative of a paper having an unusually high impact for at least 10 years following its publication. Because of its high adoption by the television industry, the authors of the original SSIM paper were each accorded a Primetime Engineering Emmy Award in 2015 by the Television Academy. == Algorithm == The SSIM index is calculated between two windows of pixel values x {\displaystyle x} and y {\displaystyle y} of common size, from corresponding locations in two images to be compared. These SSIM values can be aggregated across the full images by averaging or other variations. === Special-case formula === In one simple special case, further explained in the next section, the SSIM measure between x {\displaystyle x} and y {\displaystyle y} is: SSIM ( x , y ) = ( 2 μ x μ y + c 1 ) ( 2 σ x y + c 2 ) ( μ x 2 + μ y 2 + c 1 ) ( σ x 2 + σ y 2 + c 2 ) {\displaystyle {\hbox{SSIM}}(x,y)={\frac {(2\mu _{x}\mu _{y}+c_{1})(2\sigma _{xy}+c_{2})}{(\mu _{x}^{2}+\mu _{y}^{2}+c_{1})(\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2})}}} with: μ x {\displaystyle \mu _{x}} the pixel sample mean of x {\displaystyle x} ; μ y {\displaystyle \mu _{y}} the pixel sample mean of y {\displaystyle y} ; σ x 2 {\displaystyle \sigma _{x}^{2}} the sample variance of x {\displaystyle x} ; σ y 2 {\displaystyle \sigma _{y}^{2}} the sample variance of y {\displaystyle y} ; σ x y {\displaystyle \sigma _{xy}} the sample covariance of x {\displaystyle x} and y {\displaystyle y} ; c 1 = ( k 1 L ) 2 {\displaystyle c_{1}=(k_{1}L)^{2}} , c 2 = ( k 2 L ) 2 {\displaystyle c_{2}=(k_{2}L)^{2}} two variables to stabilize the division with weak denominator; L {\displaystyle L} the dynamic range of the pixel-values (typically this is 2 # b i t s p e r p i x e l − 1 {\displaystyle 2^{\#bits\ per\ pixel}-1} ); k 1 = 0.01 {\displaystyle k_{1}=0.01} and k 2 = 0.03 {\displaystyle k_{2}=0.03} by default. === General formula and components === The SSIM formula is based on three comparison measurements between the samples of x {\displaystyle x} and y {\displaystyle y} : luminance ( l {\displaystyle l} ), contrast ( c {\displaystyle c} ), and structure ( s {\displaystyle s} ). The individual comparison functions are: l ( x , y ) = 2 μ x μ y + c 1 μ x 2 + μ y 2 + c 1 {\displaystyle l(x,y)={\frac {2\mu _{x}\mu _{y}+c_{1}}{\mu _{x}^{2}+\mu _{y}^{2}+c_{1}}}} c ( x , y ) = 2 σ x σ y + c 2 σ x 2 + σ y 2 + c 2 {\displaystyle c(x,y)={\frac {2\sigma _{x}\sigma _{y}+c_{2}}{\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2}}}} s ( x , y ) = σ x y + c 3 σ x σ y + c 3 {\displaystyle s(x,y)={\frac {\sigma _{xy}+c_{3}}{\sigma _{x}\sigma _{y}+c_{3}}}} The SSIM for each block is then a weighted combination of those comparative measures: SSIM ( x , y ) = l ( x , y ) α ⋅ c ( x , y ) β ⋅ s ( x , y ) γ {\displaystyle {\text{SSIM}}(x,y)=l(x,y)^{\alpha }\cdot c(x,y)^{\beta }\cdot s(x,y)^{\gamma }} Choosing the third denominator stabilizing constant as: c 3 = c 2 / 2 {\displaystyle c_{3}=c_{2}/2} leads to a simplification when combining the c and s components with equal exponents ( β = γ {\displaystyle \beta =\gamma } ), as the numerator of c is then twice the denominator of s, leading to a cancellation leaving just a 2. Setting the weights (exponents) α , β , γ {\displaystyle \alpha ,\beta ,\gamma } to 1, the formula can then be reduced to the special case shown above. === Mathematical properties === SSIM satisfies the identity of indiscernibles, and symmetry properties, but not the triangle inequality or non-negativity, and thus is not a distance function. However, under certain conditions, SSIM may be converted to a normalized root MSE measure, which is a distance function. The square of such a function is not convex, but is locally convex and quasiconvex, making SSIM a feasible target for optimization. === Application of the formula === In order to evaluate the image quality, this formula is usually applied only on luma, although it may also be applied on color (e.g., RGB) values or chromatic (e.g. YCbCr) values. The resultant SSIM index is a decimal value between -1 and 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect anti-correlation. For an image, it is typically calculated using a sliding Gaussian window of size 11×11 or a block window of size 8×8. The window can be displaced pixel-by-pixel on the image to create an SSIM quality map of the image. In the case of video quality assessment, the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation. === Variants === ==== Multi-scale SSIM ==== A more advanced form of SSIM, called Multiscale SSIM (MS-SSIM) is conducted over multiple scales through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system. It has been shown to perform equally well or better than SSIM on different subjective image and video databases. ==== Multi-component SSIM ==== Three-component SSIM (3-SSIM) is a form of SSIM that takes into account the fact that the human eye can see differences more precisely on textured or edge regions than on smooth regions. The resulting metric is calculated as a weighted average of SSIM for three categories of regions: edges, textures, and smooth regions. The proposed weighting is 0.5 for edges, 0.25 for the textured and smooth regions. The authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are closer to subjective ratings. This suggests that edge regions play a dominant role in image quality perception. The authors of 3-SSIM have also extended the model into four-component SSIM (4-SSIM). The edge types are further subdivided into preserved and changed edges by their distortion status. The proposed weighting is 0.25 for all four components. ==== Structural dissimilarity ==== Structural dissimilarity (DSSIM) may be derived from SSIM, though it does not constitute a distance function as the triangle inequality is not necessarily satisfied. DSSIM ( x , y ) = 1 − SSIM ( x , y ) 2 {\displaystyle {\hbox{DSSIM}}(x,y)={\frac {1-{\hbox{SSIM}}(x,y)}{2}}} ==== Video quality metrics and temporal variants ==== It is worth noting that the original vers

    Read more →
  • Linear belief function

    Linear belief function

    Linear belief functions are an extension of the Dempster–Shafer theory of belief functions to the case when variables of interest are continuous. Examples of such variables include financial asset prices, portfolio performance, and other antecedent and consequent variables. The theory was originally proposed by Arthur P. Dempster in the context of Kalman Filters and later was elaborated, refined, and applied to knowledge representation in artificial intelligence and decision making in finance and accounting by Liping Liu. == Concept == A linear belief function intends to represent our belief regarding the location of the true value as follows: We are certain that the truth is on a so-called certainty hyperplane but we do not know its exact location; along some dimensions of the certainty hyperplane, we believe the true value could be anywhere from –∞ to +∞ and the probability of being at a particular location is described by a normal distribution; along other dimensions, our knowledge is vacuous, i.e., the true value is somewhere from –∞ to +∞ but the associated probability is unknown. A belief function in general is defined by a mass function over a class of focal elements, which may have nonempty intersections. A linear belief function is a special type of belief function in the sense that its focal elements are exclusive, parallel sub-hyperplanes over the certainty hyperplane and its mass function is a normal distribution across the sub-hyperplanes. Based on the above geometrical description, Shafer and Liu propose two mathematical representations of a LBF: a wide-sense inner product and a linear functional in the variable space, and as their duals over a hyperplane in the sample space. Monney proposes still another structure called Gaussian hints. Although these representations are mathematically neat, they tend to be unsuitable for knowledge representation in expert systems. == Knowledge representation == A linear belief function can represent both logical and probabilistic knowledge for three types of variables: deterministic such as an observable or controllable, random whose distribution is normal, and vacuous on which no knowledge bears. Logical knowledge is represented by linear equations, or geometrically, a certainty hyperplane. Probabilistic knowledge is represented by a normal distribution across all parallel focal elements. In general, assume X is a vector of multiple normal variables with mean μ and covariance Σ. Then, the multivariate normal distribution can be equivalently represented as a moment matrix: M ( X ) = ( μ Σ ) . {\displaystyle M(X)=\left({\begin{array}{{20}c}\mu \\\Sigma \end{array}}\right).} If the distribution is non-degenerate, i.e., Σ has a full rank and its inverse exists, the moment matrix can be fully swept: M ( X → ) = ( μ Σ − 1 − Σ − 1 ) {\displaystyle M({\vec {X}})=\left({\begin{array}{{20}c}\mu \Sigma ^{-1}\\-\Sigma ^{-1}\end{array}}\right)} Except for normalization constant, the above equation completely determines the normal density function for X. Therefore, M ( X → ) {\displaystyle M({\vec {X}})} represents the probability distribution of X in the potential form. These two simple matrices allow us to represent three special cases of linear belief functions. First, for an ordinary normal probability distribution M(X) represents it. Second, suppose one makes a direct observation on X and obtains a value μ. In this case, since there is no uncertainty, both variance and covariance vanish, i.e., Σ = 0. Thus, a direct observation can be represented as: M ( X ) = ( μ 0 ) {\displaystyle M(X)=\left({\begin{array}{{20}c}\mu \\0\end{array}}\right)} Third, suppose one is completely ignorant about X. This is a very thorny case in Bayesian statistics since the density function does not exist. By using the fully swept moment matrix, we represent the vacuous linear belief functions as a zero matrix in the swept form follows: M ( X → ) = [ 0 0 ] {\displaystyle M({\vec {X}})=\left[{\begin{array}{{20}c}0\\0\end{array}}\right]} One way to understand the representation is to imagine complete ignorance as the limiting case when the variance of X approaches to ∞, where one can show that Σ−1 = 0 and hence M ( X → ) {\displaystyle M({\vec {X}})} vanishes. However, the above equation is not the same as an improper prior or normal distribution with infinite variance. In fact, it does not correspond to any unique probability distribution. For this reason, a better way is to understand the vacuous linear belief functions as the neutral element for combination (see later). To represent the remaining three special cases, we need the concept of partial sweeping. Unlike a full sweeping, a partial sweeping is a transformation on a subset of variables. Suppose X and Y are two vectors of normal variables with the joint moment matrix: M ( X , Y ) = [ μ 1 Σ 11 Σ 21 μ 2 Σ 12 Σ 22 ] {\displaystyle M(X,Y)=\left[{\begin{array}{{20}c}{\begin{array}{{20}c}\mu _{1}\\\Sigma _{11}\\\Sigma _{21}\end{array}}&{\begin{array}{{20}c}\mu _{2}\\\Sigma _{12}\\\Sigma _{22}\end{array}}\end{array}}\right]} Then M(X, Y) may be partially swept. For example, we can define the partial sweeping on X as follows: M ( X → , Y ) = [ μ 1 ( Σ 11 ) − 1 − ( Σ 11 ) − 1 Σ 21 ( Σ 11 ) − 1 μ 2 − μ 1 ( Σ 11 ) − 1 Σ 12 ( Σ 11 ) − 1 Σ 12 Σ 22 − Σ 21 ( Σ 11 ) − 1 Σ 12 ] {\displaystyle M({\vec {X}},Y)=\left[{\begin{array}{{20}c}{\begin{array}{{20}c}\mu _{1}(\Sigma _{11})^{-1}\\-(\Sigma _{11})^{-1}\\\Sigma _{21}(\Sigma _{11})^{-1}\end{array}}&{\begin{array}{{20}c}\mu _{2}-\mu _{1}(\Sigma _{11})^{-1}\Sigma _{12}\\(\Sigma _{11})^{-1}\Sigma _{12}\\\Sigma _{22}-\Sigma _{21}(\Sigma _{11})^{-1}\Sigma _{12}\end{array}}\end{array}}\right]} If X is one-dimensional, a partial sweeping replaces the variance of X by its negative inverse and multiplies the inverse with other elements. If X is multidimensional, the operation involves the inverse of the covariance matrix of X and other multiplications. A swept matrix obtained from a partial sweeping on a subset of variables can be equivalently obtained by a sequence of partial sweepings on each individual variable in the subset and the order of the sequence does not matter. Similarly, a fully swept matrix is the result of partial sweepings on all variables. We can make two observations. First, after the partial sweeping on X, the mean vector and covariance matrix of X are respectively μ 1 ( Σ 11 ) − 1 {\displaystyle \mu _{1}(\Sigma _{11})^{-1}} and − ( Σ 11 ) − 1 {\displaystyle -(\Sigma _{11})^{-1}} , which are the same as that of a full sweeping of the marginal moment matrix of X. Thus, the elements corresponding to X in the above partial sweeping equation represent the marginal distribution of X in potential form. Second, according to statistics, μ 2 − μ 1 ( Σ 11 ) − 1 Σ 12 {\displaystyle \mu _{2}-\mu _{1}(\Sigma _{11})^{-1}\Sigma _{12}} is the conditional mean of Y given X = 0; Σ 22 − Σ 21 ( Σ 11 ) − 1 Σ 12 {\displaystyle \Sigma _{22}-\Sigma _{21}(\Sigma _{11})^{-1}\Sigma _{12}} is the conditional covariance matrix of Y given X = 0; and ( Σ 11 ) − 1 Σ 12 {\displaystyle (\Sigma _{11})^{-1}\Sigma _{12}} is the slope of the regression model of Y on X. Therefore, the elements corresponding to Y indices and the intersection of X and Y in M ( X → , Y ) {\displaystyle M({\vec {X}},Y)} represents the conditional distribution of Y given X = 0. These semantics render the partial sweeping operation a useful method for manipulating multivariate normal distributions. They also form the basis of the moment matrix representations for the three remaining important cases of linear belief functions, including proper belief functions, linear equations, and linear regression models. === Proper linear belief functions === For variables X and Y, assume there exists a piece of evidence justifying a normal distribution for variables Y while bearing no opinions for variables X. Also, assume that X and Y are not perfectly linearly related, i.e., their correlation is less than 1. This case involves a mix of an ordinary normal distribution for Y and a vacuous belief function for X. Thus, we represent it using a partially swept matrix as follows: M ( X → , Y ) = [ 0 0 0 μ 2 0 Σ 22 ] {\displaystyle M({\vec {X}},Y)=\left[{\begin{array}{{20}c}{\begin{array}{{20}c}0\\0\\0\end{array}}&{\begin{array}{{20}c}\mu _{2}\\0\\\Sigma _{22}\\\end{array}}\end{array}}\right]} This is how we could understand the representation. Since we are ignorant on X, we use its swept form and set μ 1 ( Σ 11 ) − 1 = 0 {\displaystyle \mu _{1}(\Sigma _{11})^{-1}=0} and − ( Σ 11 ) − 1 = 0 {\displaystyle -(\Sigma _{11})^{-1}=0} . Since the correlation between X and Y is less than 1, the regression coefficient of X on Y approaches to 0 when the variance of X approaches to ∞. Therefore, ( Σ 11 ) − 1 Σ 12 = 0 {\displaystyle (\Sigma _{11})^{-1}\Sigma _{12}=0} . Similarly, one can prove that μ 1 ( Σ 11 ) − 1 Σ 12 = 0 {\displaystyle \mu _{1}(\Sigma _{11})^{-1}\Sigma _{12}=0} and Σ 21 ( Σ 11 ) −

    Read more →
  • Interim Measures for the Management of Anthropomorphic AI Interactive Services

    Interim Measures for the Management of Anthropomorphic AI Interactive Services

    The Interim Measures for the Management of Anthropomorphic AI Interactive Services (Chinese: 人工智能拟人化互动服务管理暂行办法) is a document proposed by the Cyberspace Administration of China to regulate anthropomorphic artificial intelligence systems. The draft was released on December 27, 2026 for public comment period until January 25, 2026. The proposed document would prohibit AI companies and users of AI services from generating certain types of content deemed harmful to national interests or the social order, and impose various regulatory and safety requirements on providers of AI systems. The proposed regulation is motivated by concerns about the psychological and social effects of AI systems that are perceived as personalities by their users, including addiction, encouragement of self-harm, or generation of illegal content. == Description == === Scope === The regulation would apply to AI systems that are offered to the general public within China. They would not apply to company-internal or research use, or to products that are only available outside of China. For the purpose of the regulation, anthropomorphic Ai systems are defined as those that "simulate human personality traits, modes of thinking, and communication styles, and that engage in emotional interaction with humans through text, images, audio, video, or other means". === Requirements === The regulation would require AI providers to monitor users for signs of harmful use and to take various interventions when indications of harmful use are detected. It would also prohibit AI systems from certain types of behaviors and generation of certain types of content. In some circumstances where a user appears to be at risk of self harm, the system would be required to hand over control to a human operator who would manually intervene. The regulation would also require more rigorous practices for managing the provenance of training data used to develop these systems, and would require explicit opt-in consent from users before their interactions with an AI system were used as training data. Data used to train the regulated systems would be required to reflect core socialist values and traditional Chinese culture.

    Read more →
  • Repertory grid

    Repertory grid

    The repertory grid is an interviewing technique which uses nonparametric factor analysis to determine an idiographic measure of personality. It was devised by George Kelly in around 1955 and is based on his personal construct theory of personality. == Introduction == The repertory grid is a technique for identifying the ways that a person construes (interprets or gives meaning to) his or her experience. It provides information from which inferences about personality can be made, but it is not a personality test in the conventional sense. It is underpinned by the personal construct theory developed by George Kelly, first published in 1955. A grid consists of four parts: A topic: it is about some part of the person's experience. A set of elements, which are examples or instances of the topic. Working as a clinical psychologist, Kelly was interested in how his clients construed people in the roles they adopted towards the client, and so, originally, such terms as "my father", "my mother", "an admired friend" and so forth were used. Since then, the grid has been used in much wider settings (educational, occupational, organisational) and so any well-defined set of words, phrases, or even brief behavioral vignettes can be used as elements. For example, to see how a person construes the purchase of a car, a list of vehicles within that person's price range could be a set of elements. A set of constructs. These are the basic terms that the client uses to make sense of the elements, and are always expressed as a contrast. Thus the meaning of "good" depends on whether you intend to say "good versus poor", as if you were construing a theatrical performance, or "good versus evil", as if you were construing the moral or ontological status of some more fundamental experience. A set of ratings of elements on constructs. Each element is positioned between the two extremes of the construct using a 5- or 7-point rating scale system; this is done repeatedly for all the constructs that apply; and thus its meaning to the client is modeled, and statistical analysis varying from simple counting, to more complex multivariate analysis of meaning, is made possible. Constructs are regarded as personal to the client, who is psychologically similar to other people depending on the extent to which they would tend to use similar constructs, and similar ratings, in relating to a particular set of elements. The client is asked to consider the elements three at a time, and to identify a way in which two of the elements might be seen as alike, but distinct from, contrasted to, the third. For example, in considering a set of people as part of a topic dealing with personal relationships, a client might say that the element "my father" and the element "my boss" are similar because they are both fairly tense individuals, whereas the element "my wife" is different because she is "relaxed". And so we identify one construct that the individual uses when thinking about people: whether they are "tense as distinct from relaxed". In practice, good grid interview technique would delve a little deeper and identify some more behaviorally explicit description of "tense versus relaxed". All the elements are rated on the construct, further triads of elements are compared and further constructs elicited, and the interview would continue until no further constructs are obtained. == Using the repertory grid == Careful interviewing to identify what the individual means by the words initially proposed, using a 5-point rating system could be used to characterize the way in which a group of fellow-employees are viewed on the construct "keen and committed versus energies elsewhere", a 1 indicating that the left pole of the construct applies ("keen and committed") and a 5 indicating that the right pole of the construct applies ("energies elsewhere"). On being asked to rate all of the elements, our interviewee might reply that Tom merits a 2 (fairly keen and committed), Mary a 1 (very keen and committed), and Peter a 5 (his energies are very much outside the place of employment). The remaining elements (another five people, for example) are then rated on this construct. Typically (and depending on the topic) people have a limited number of genuinely different constructs for any one topic: 6 to 16 are common when they talk about their job or their occupation, for example. The richness of people's meaning structures comes from the many different ways in which a limited number of constructs can be applied to individual elements. A person may indicate that Tom is fairly keen, very experienced, lacks social skills, is a good technical supervisor, can be trusted to follow complex instructions accurately, has no sense of humour, will always return a favour but only sometimes help his co-workers, while Mary is very keen, fairly experienced, has good social and technical supervisory skills, needs complex instructions explained to her, appreciates a joke, always returns favours, and is very helpful to her co-workers: these are two very different and complex pictures, using just 8 constructs about a person's co-workers. Important information can be obtained by including self-elements such as "Myself as I am now"; "Myself as I would like to be" among other elements, where the topic permits. == Analysis of results == A single grid can be analysed for both content (eyeball inspection) and structure (cluster analysis, principal component analysis, and a variety of structural indices relating to the complexity and range of the ratings being the chief techniques used). Sets of grids are dealt with using one or other of a variety of content analysis techniques. A range of associated techniques can be used to provide precise, operationally defined expressions of an interviewee's constructs, or a detailed expression of the interviewee's personal values, and all of these techniques are used in a collaborative way. The repertory grid is emphatically not a standardized "psychological test"; it is an exercise in the mutual negotiation of a person's meanings. The repertory grid has found favour among both academics and practitioners in a great variety of fields because it provides a way of describing people's construct systems (loosely, understanding people's perceptions) without prejudging the terms of reference—a kind of personalized grounded theory. Unlike a conventional rating-scale questionnaire, it is not the investigator but the interviewee who provides the constructs on which a topic is rated. Market researchers, trainers, teachers, guidance counsellors, new product developers, sports scientists, and knowledge capture specialists are among the users who find the technique (originally developed for use in clinical psychology) helpful. == Relationship to other tools == In the book Personal Construct Methodology, researchers Brian R. Gaines and Mildred L.G. Shaw noted that they "have also found concept mapping and semantic network tools to be complementary to repertory grid tools and generally use both in most studies" but that they "see less use of network representations in PCP [personal construct psychology] studies than is appropriate". They encouraged practitioners to use semantic network techniques in addition to the repertory grid.

    Read more →
  • Optical sorting

    Optical sorting

    Optical sorting (sometimes called digital sorting) is the automated process of sorting solid products using cameras and/or lasers. Depending on the types of sensors used and the software-driven intelligence of the image processing system, optical sorters can recognize an object's color, size, shape, structural properties and chemical composition. The sorter compares objects to user-defined accept/reject criteria to identify and remove defective products and foreign material (FM) from the production line, or to separate product of different grades or types of materials. Optical sorters are in widespread use in the food industry worldwide, with the highest adoption in processing harvested foods such as potatoes, fruits, vegetables and nuts where it achieves non-destructive, 100 percent inspection in-line at full production volumes. The technology is also used in pharmaceutical manufacturing and nutraceutical manufacturing, tobacco processing, waste recycling and other industries. Compared to manual sorting, which is subjective and inconsistent, optical sorting helps improve product quality, maximize throughput and increase yields while reducing labor costs. == History == Optical sorting is an idea that first came out of the desire to automate industrial sorting of agricultural goods like fruits and vegetables. Before automated optical sorting technology was conceived in the 1930s, companies like Unitec were producing wooden machinery to assist in the mechanical sorting of fruit processing. In 1931, a company known as “the Electric Sorting Company” was incorporated and began the creation of the world’s first color sorters, which were being installed and used in Michigan’s bean industry by 1932. In 1937, optical sorting technology had advanced to allow for systems based on a two-color principle of selection. The next few decades saw the installation of new and improved sorting mechanisms, like gravity feed systems and the implementation of optical sorting in more agricultural industries. In the late 1960s, optical sorting began to be implemented to new industries beyond agriculture, like the sorting of ferrous and non-ferrous metals. By the 1990s, optical sorting was being used heavily in the sorting of solid wastes. With the large technological revolution happening in the late 1990s and early 2000s, optical sorters were being made more efficient via the implementation of new optical sensors, like CCD, UV, and IR cameras. Today, optical sorting is used in a wide variety of industries and, as such, is implemented with a varying selection of mechanisms to assist in that specific sorter’s task. == The sorting system == In general, optical sorters feature four major components: the feed system, the optical system, image processing software, and the separation system. The objective of the feed system is to spread products into a uniform monolayer so products are presented to the optical system evenly, without clumps, at a constant velocity. The optical system includes lights and sensors housed above and/or below the flow of the objects being inspected. The image processing system compares objects to user-defined accept/reject thresholds to classify objects and actuate the separation system. The separation system — usually compressed air for small products and mechanical devices for larger products, like whole potatoes — pinpoints objects while in-air and deflects the objects to remove into a reject chute while the good product continues along its normal trajectory. The ideal sorter to use depends on the application. Therefore, the product's characteristics and the user's objectives determine the ideal sensors, software-driven capabilities and mechanical platform. == Sensors == Optical sorters require a combination of lights and sensors to illuminate and capture images of the objects so the images can be processed. The processed images will determine if the material should be accepted or rejected. There are camera sorters, laser sorters and sorters that feature a combination of the two on one platform. Lights, cameras, lasers and laser sensors can be designed to function within visible light wavelengths as well as the infrared (IR) and ultraviolet (UV) spectrums. The optimal wavelengths for each application maximize the contrast between the objects to be separated. Cameras and laser sensors can differ in spatial resolution, with higher resolutions enabling the sorter to detect and remove smaller defects. === Cameras === Monochromatic cameras detect shades of gray from black to white and can be effective when sorting products with high-contrast defects. Sophisticated color cameras with high color resolution are capable of detecting millions of colors to better distinguish more subtle color defects. Trichromatic color cameras (also called three-channel cameras) divide light into three bands, which can include red, green and/or blue within the visible spectrum as well as IR and UV. The interaction of different materials with parts of the electromagnetic spectrum make these contrasts more evident than how they appear to the naked human eye. Coupled with intelligent software, sorters that feature cameras are capable of recognizing each object's color, size and shape; as well as the color, size, shape and location of a defect on a product. Some intelligent sorters even allow the user to define a defective product based on the total defective surface area of any given object. === Lasers === While cameras capture product information based primarily on material reflectance, lasers and their sensors are able to distinguish a material's structural properties along with their color. This structural property inspection allows lasers to detect a wide range of organic and inorganic foreign material such as insects, glass, metal, sticks, rocks and plastic; even if they are the same color as the good product. Lasers can be designed to operate within specific wavelengths of light; whether on the visible spectrum or beyond. For example, lasers can detect chlorophyll by stimulating fluorescence using specific wavelengths; which is a process that is very effective for removing foreign material from green vegetables. === Camera/laser combinations === Sorters equipped with cameras and lasers on one platform are generally capable of identifying the widest variety of attributes. Cameras are often better at recognizing color, size and shape while laser sensors identify differences in structural properties to maximize foreign material detection and removal. === Hyperspectral Imaging === Driven by the need to solve previously impossible sorting challenges, a new generation of sorters that feature multispectral and hyperspectral imaging Optical Sorters. Like trichromatic cameras, multispectral and hyperspectral cameras collect data from the electromagnetic spectrum. Unlike trichromatic cameras, which divide light into three bands, hyperspectral systems can divide light into hundreds of narrow bands over a continuous range that covers a vast portion of the electromagnetic spectrum. This opens the door for more detailed analysis that leads to a more consistent product. Using IR alone might detect some defects, but combining it with a broader range of the spectrum makes it more effective. Compared to the three data points per pixel collected by trichromatic cameras, hyperspectral cameras can collect hundreds of data points per pixel, which are combined to create a unique spectral signature (also called a fingerprint) for each object. When complemented by capable software intelligence, a hyperspectral sorter processes those fingerprints to enable sorting on the chemical composition of the product. This is an emerging area of chemometrics. == Software-driven intelligence == Once the sensors capture the object's response to the energy source, image processing is used to manipulate the raw data. The image processing extracts and categorizes information about specific features. The user then defines accept/reject thresholds that are used to determine what is good and bad in the raw data flow. The art and science of image processing lies in developing algorithms that maximize the effectiveness of the sorter while presenting a simple user-interface to the operator. Object-based recognition is a classic example of software-driven intelligence. It allows the user to define a defective product based on where a defect lies on the product and/or the total defective surface area of an object. It offers more control in defining a wider range of defective products. When used to control the sorter's ejection system, it can improve the accuracy of ejecting defective products. This improves product quality and increases yields. New software-driven capabilities are constantly being developed to address the specific needs of various applications. As computing hardware becomes more powerful, new software-driven advancements become possible. Some of these advancements enhance the effectivene

    Read more →
  • Geopolitical ontology

    Geopolitical ontology

    The FAO geopolitical ontology is an ontology developed by the Food and Agriculture Organization of the United Nations (FAO) to describe, manage and exchange data related to geopolitical entities such as countries, territories, regions and other similar areas. == Definitions and examples == An ontology is a kind of dictionary that describes information in a certain domain using concepts and relationships. It is often implemented using OWL (Web Ontology Language), an XML-based standard language that can be interpreted by computers. A Concept is defined as abstract knowledge. For example, in the geopolitical ontology a non-self-governing territory and a geographical group are concepts. Concepts are explicitly implemented in the ontology with individuals and classes: An individual is defined as an object perceived from the real world. In the geopolitical domain Ethiopia and the least developed countries group are individuals. A class is defined as a set of individuals sharing common properties. In the geopolitical domain, Ethiopia, Republic of Korea and Italy are individuals of the class self-governing territory; and least developed countries is an individual of the class special group. Relationships between concepts are explicitly implemented by: Object properties between individuals of two classes. For example, has member and is in group properties, as shown in Figure 1. Datatype properties between individuals and literals or XML datatypes. For example, the individual Afghanistan has the datatype property CodeISO3 with the value "AFG". Restrictions in classes and/or properties. For example, the property official English name of the class self-governing territory has been restricted to have only one value, this means that a self-governing territory (or country) can only have one internationally recognized official English name. The advantage of describing information in an ontology is that it enables to acquire domain knowledge by defining hierarchical structures of classes, adding individuals, setting object properties and datatype properties, and assigning restrictions. == FAO ontology == The geopolitical ontology provides names in seven languages (Arabic, Chinese, French, English, Spanish, Russian and Italian) and identifiers in various international coding systems (ISO2, ISO3, AGROVOC, FAOSTAT, FAOTERM, GAUL, UN, UNDP and DBPediaID codes) for territories and groups. Moreover, the FAO geopolitical ontology tracks historical changes from 1985 up until today; provides geolocation (geographical coordinates); implements relationships among countries and countries, or countries and groups, including properties such as has border with, is predecessor of, is successor of, is administered by, has members, and is in group; and disseminates country statistics including country area, land area, agricultural area, GDP or population. The FAO geopolitical ontology provides a structured description of data sources. This includes: source name, source identifier, source creator and source's update date. Concepts are described using the Dublin Core vocabulary In summary, the main objectives of the FAO geopolitical ontology are: To provide the most updated geopolitical information (names, codes, relationships, statistics) To track historical changes in geopolitical information To improve information management and facilitate standardized data sharing of geopolitical information To demonstrate the benefits of the geopolitical ontology to improve interoperability of corporate information systems It is possible to download the FAO geopolitical ontology in OWL and RDF formats. Documentation is available in the FAO Country Profiles Geopolitical information web page. == Features of the FAO ontology == The geopolitical ontology contains : Area types: Territories: self-governing, non-self-governing, disputed, other. Groups: organizations, geographic, economic and special groups. Names (official, short and names for lists) in Arabic, Chinese, English, French, Spanish, Russian and Italian. International codes: UN code – M49, ISO 3166 Alpha-2 and Alpha-3, UNDP code, GAUL code, FAOSTAT, AGROVOC FAOTERM and DBPediaID. Coordinates: maximum latitude, minimum latitude, maximum longitude, minimum longitude. Basic country statistics: country area, land area, agricultural area, GDP, population. Currency names and codes. Adjectives of nationality. Relations: Groups membership. Neighbours (land border), administration of non-self-governing. Historic changes: predecessor, successor, valid since, valid until. == Implementation into OWL == The FAO geopolitical ontology is implemented in OWL. It consists of classes, properties, individuals and restrictions. Table 1 shows all classes, gives a brief description and lists some individuals that belong to each class. Note that the current version of the geopolitical ontology does not provide individuals of the class "disputed" territories. Table 2 and Table 3 illustrate datatype properties and object properties. == Geopolitical ontology in Linked Open Data == The FAO Geopolitical ontology is embracing the W3C Linked Open Data (LOD) initiative and released its RDF version of the geopolitical ontology in March 2011. The term 'Linked Open Data' refers to a set of best practices for publishing and connecting structured data on the Web. The key technologies that support Linked Data are URIs, HTTP and RDF. The RDF version of the geopolitical ontology is compliant with all Linked data principles to be included in the Linked Open Data cloud, as explained in the following. == Resolvable http:// URIs == Every resource in the OWL format of the FAO Geopolitical Ontology has a unique URI. Dereferenciation was implemented to allow for three different URIs to be assigned to each resource as follows: URI identifying the non-information resource Information resource with an RDF/XML representation Information resource with an HTML representation In addition the current URIs used for OWL format needed to be kept to allow for backwards compatibility for other systems that are using them. Therefore, the new URIs for the FAO Geopolitical Ontology in LOD were carefully created, using “Cool URIs for Semantic Web” and considering other good practices for URIs, such as DBpedia URIs. == New URIs == The URIs of the geopolitical ontology need to be permanent, consequently all transient information, such as year, version, or format was avoided in the definition of the URIs. The new URIs can be accessed For example, for the resource “Italy” the URIs are the following: http://www.fao.org/countryprofiles/geoinfo/geopolitical/resource/Italy identifies the non-information resource. http://www.fao.org/countryprofiles/geoinfo/geopolitical/data/Italy identifies the resource with an RDF/XML representation. http://www.fao.org/countryprofiles/geoinfo/geopolitical/page/Italy identifies the information resource with an HTML representation. In addition, “owl:sameAs” is used to map the new URIs to the OWL representation. == Dereferencing URIs == When a non-information resource is looked up without any specific representation format, then the server needs to redirect the request to information resource with an HTML representation. For example, to retrieve the resource “Italy”, which is a non-information resource, the server redirects to the HTML page of “Italy”. == At least 1000 triples in the datasets == The total number of triple statements in FAO Geopolitical Ontology is 22,495. At least 50 links to a dataset already in the current LOD Cloud: FAO Geopolitical Ontology has 195 links to DBpedia, which is already part of the LOD Cloud. == Access to the entire dataset == FAO Geopolitical Ontology provides the entire dataset as a RDF dump. The RDF version of the FAO Geopolitical Ontology has been already registered in CKAN and it was requested to add it into the LOD Cloud. == Example of use == The FAO Country Profiles is an information retrieval tool which groups the FAO's vast archive of information on its global activities in agriculture and rural development in one single area and catalogues it exclusively by country. The FAO Country Profiles system provides access to country-based heterogeneous data sources. By using the geopolitical ontology in the system, the following benefits are expected: Enhanced system functionality for content aggregation and synchronization from the multiple source repositories. Improved information access and browsing through comparison of data in neighbor countries and groups. Figure 3 shows a page in the FAO Country Profiles where the geopolitical ontology is described.

    Read more →
  • 4E cognition

    4E cognition

    4E cognition refers to a group of theories in (the philosophy of) cognitive science that challenge traditional views of the mind as something that happens only inside the brain. The four Es stand for: embodied, meaning that a brain is found in and, more importantly, vitally interconnected with a larger physical/biological body; embedded, which refers to the limitations placed on the body by the external environment and laws of nature; extended, which argues that the mind is supplemented and even enhanced by the exterior world (e.g., writing, a calculator, etc.); and enactive, which is the argument that without dynamic processes, actions that require reactions, the mind would be ineffectual. It could be argued that the four Es are compounding extensions of cognition or the mind, being part of a body that is, in turn, part of an environment which limits it but also allows for certain extensions, all of which require dynamic actions and reactions. == History == Ideas of embodied cognition, or rather the idea that our physical bodies play a crucial role in our decision making, can be traced back as far as Plato's dialogues and Aristotelian thought. It was, however, in the twentieth century that this debate began to resemble the current discussion, fueled by disagreements between cognitivists and behaviourists. Tensions within cognitivism, as well as the increasing popularity of neurobiology, led, on the one side, to a predominant focus on internal, cognitive processes while neglecting environmental factors, which in turn caused a push-back fuelling our modern understanding of embodied cognition. The term 4E cognition is hard to trace back to its first use, however, some sources attribute it to Shaun Gallagher and the conference on 4E cognition he organised in 2007, while others indicate the term to be first used in 2006 at an 'Embodied mind workshop' at Cardiff University that Gallagher attended. Embodiment or embodied cognition arguably presents the bridge between cognitivism and 4E cognition as the embodiment of cognitive function provides the necessary conditions for embeddedness, enactedness, and extendedness to connect to cognition. 4E cognition was and is heavily influenced by phenomenology. The ideas are still rather fragmented in nature due to their four main components, which can not be neatly divided, causing conceptual questions of internal boundary concepts. As a young field, it is held back both by its fragmented nature and a relative lack of critical evaluations. It is important to acknowledge that 4E cognition, though young, is a broad field containing and combining several different theoretical perspectives that conflict with one another to varying degrees. The somewhat convoluted and competing nature of the theories that can be grouped as 4E cognition, as well as the field's relative youth, make it difficult to put together an exhaustive history beyond the history of its four main theoretical pillars: embodiment, embeddedness, extendedness, and enactedness. == Importance and core tenets of 4E == If there are separate theories of cognition (e.g., embodied, extended, etc.), why group them under this umbrella, causing important epistemological and especially ontological dilemmas? Notably, other theories of 'non-traditional' cognition are not included under the 4E umbrella. The four E's in 4E cognition importantly all reject, or at a minimum draw into question, some of the core tenets of traditional cognitivism. Importantly, 4E cognition is seen as deindividualizing cognition to some extent, allowing for a broader examination of the interplay of personal, social, political, and ethical aspects that shape human cognition. This can be compared to advancements in the field of epigenetics, which have allowed for a broader examination of environmental (both natural and social) factors and their influence on what had previously only been subject to genetic theorizing. In a similar vein, 4E cognition might also help ground cognition in evolutionary theory by extending cognition to a biological account subject to development over time by means of evolution. Overall, the importance of the extension that is 4E cognition aims to reexamine ideas of a self-centered view of cognition, advocating for a more holistic approach. Ideally, this would allow us to reconsider ideas of justice and individual rights and responsibilities that take into account a more nuanced understanding of the relations between people and their context, balancing self-agency with factors beyond it. === Conceptual differences from cognitive psychology === According to the traditional teachings of cognitive psychology, cognition is a type of information processing based on representational mental structures. This idea, as the name suggests, was heavily influenced by computer science. In this light, the brain is a kind of central processing unit that organises and directs all else. The classical cognitivist view draws a strong boundary between 'the internal' and 'the external', where cognition is solely a subject of 'the internal' realm. The four E's, however, break down this boundary. Cognition can not reside solely within the confines of our heads if it is also embodied, embedded, enacted, and extended. In a way, 4E cognition is interested in the extracranial processes affecting cognition. == From embodied cognition to 4E cognition == === The strong and the weak view === ==== Embodied cognition ==== Broadly speaking, there is a strong and a weak perspective of embodied cognition in 4E cognition. The weak understanding refers to mental processes being causally dependent on extracranial processes. This essentially means that there is a cause and effect or action-reaction relationship between the mind and the body and its environment, etc. The strong perspective views extracranial processes as a (partial) constitutive aspect of cognition. An example here could be using a calculator to solve math problems. The calculator is not part of your brain or mind, but it supports your cognitive processes. === Extracranial processes: bodily or extrabodily === In addition to the weak and the strong reading of 4E cognition, there is also the distinction between bodily and extrabodily extracranial processes. Bodily extracranial processes refer to processes within the body, e.g., sensory perception. Extrabodily extracranial processes refer to processes outside of the body, like the aforementioned calculator example. === Four claims of embodied cognition === ==== Embedded and extended cognition ==== When combining the weak/strong reading of embodied cognition and bodily/extrabodily extracranial process, four claims about embodied cognition emerge: strongly embodied and bodily processes strongly embodied and extrabodily processes weakly embodied and bodily processes weakly embodied and extrabodily processes The first and third claims signify a strong and a weak reading of embodied cognition in the more classical sense. The second claim fits almost perfectly with embedded cognition. Claim two is most compatible with extended cognition. ==== Enacted cognition ==== Finally, enacted cognition refers to cognition being connected to active interaction between a conscious agent and their environment. Here, too, there can be a weak and a strong reading. == Criticisms == Given the divided nature of the field, much criticism surrounding the lack of unity within the field has emerged. In particular, the claims of embodied cognition centering around the body appear to conflict with the tenets of extended cognition, which also appear to conflict with the body/environment distinction that is central to enactivism. Some theoreticians argue that the umbrella of 4E theories is still lacking a common language that might bridge the gaps between the theories that constitute it. There is also the concern that the grouping of such variable theories results in an important loss of nuance and complexity, which is a part of human cognition. Another concern raised is the "dogma of harmony". The criticism contained there regards the notion that within 4E theorizing, there is generally an optimistic and harmonic expectation of the extension between humans and their technologies, ignoring the possibility of those extensions detracting from cognition in some way rather than adding to it. Recent attempts to incorporate embodied cognitive neuroscience have been argued to hold the potential to resolve internal issues within 4E cognition. Overall, a concern often voiced regarding 4E cognition is that its proponents are at best only vaguely interested in cognition. More broadly, this concern reflects the arguably too distracted nature of this emerging field.

    Read more →
  • Yann LeCun

    Yann LeCun

    Yann André Le Cun ( lə-KUN; French: [ləkœ̃]; usually spelled LeCun; born 8 July 1960) is a French-American computer scientist working in the fields of artificial intelligence, machine learning, computer vision, robotics and image compression. He is the Jacob T. Schwartz Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University. He served as Chief AI Scientist at Meta Platforms before co-founding Advanced Machine Intelligence Labs in December 2025. He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs). He is also one of the main creators of the DjVu image compression technology, alongside Léon Bottou and Patrick Haffner. He co-developed the Lush programming language with Léon Bottou. In 2018, LeCun, Yoshua Bengio, and Geoffrey Hinton received the Turing Award from the Association for Computing Machinery (ACM) for their work on deep learning. LeCun, Bengio, and Hinton, and occasionally Jürgen Schmidhuber, are sometimes referred to as the "Godfathers of AI" and "Godfathers of Deep Learning". == Early life and education == Yann André Le Cun was born on 8 July 1960 at Soisy-sous-Montmorency, in the suburbs of Paris. His surname, Le Cun, derives from the old Breton form Le Cunff and originates from the region of Guingamp in northern Brittany. Yann is the Breton form of Jean, the French form of John. He received a Diplôme d'Ingénieur from the ESIEE Paris in 1983 and a PhD in computer science from Université Pierre et Marie Curie (now Sorbonne University) in 1987, during which he proposed an early form of backpropagation, an algorithm crucial for enabling neural networks to learn. Before joining AT&T, LeCun was a postdoctoral researcher for a year, starting in 1987, supervised by Geoffrey Hinton at the University of Toronto. LeCun has three sons, and his brother is employed by Google. He has American citizenship. == Career and research == LeCun's career has been spent primarily at Bell Labs, New York University and Meta Platforms, Inc. === Bell Labs === In 1988, LeCun joined the Adaptive Systems Research Department at AT&T Bell Laboratories in Holmdel, New Jersey, United States, headed by Lawrence D. Jackel, where he developed a number of new machine learning methods, such as a biologically inspired model of image recognition called convolutional neural networks (LeNet), the "Optimal Brain Damage" regularization methods, and the Graph Transformer Networks method (similar to conditional random field), which he applied to handwriting recognition and Optical character recognition (OCR). The bank check recognition system that he helped develop was widely deployed by NCR and other companies. In 1996, he joined AT&T Labs-Research as head of the Image Processing Research Department, which was part of Lawrence Rabiner's Speech and Image Processing Research Lab, and worked primarily on the DjVu image compression technology, a format designed for efficient distribution of scanned documents, and used by the Internet Archive to provide access to digitized texts. His collaborators at AT&T include Léon Bottou and Vladimir Vapnik. === New York University === After a brief tenure as a fellow of NEC Research Institute, LeCun joined New York University in 2003, where he is Jacob T. Schwartz Chaired Professor of Computer Science and Neural Science at the Courant Institute of Mathematical Sciences and the Center for Neural Science. At NYU, he has worked primarily on energy-based models for supervised and unsupervised learning, feature learning for object recognition in computer vision, and mobile robotics. In 2012, he became the founding director of the NYU Center for Data Science. On 9 December 2013, LeCun became the first director of Meta AI Research in New York City and in early 2014 stepped down from the NYU–CDS directorship. In 2013, he and Yoshua Bengio co-founded the International Conference on Learning Representations, which adopted a post-publication open review process he previously advocated on his website. He was the chair and organiser of the "Learning Workshop" held every year between 1986 and 2012 in Snowbird, Utah. He is a member of the Science Advisory Board of the Institute for Pure and Applied Mathematics at UCLA. He is the co-director of the Learning in Machines and Brain research program (formerly Neural Computation & Adaptive Perception) of CIFAR. In 2016, he was the visiting professor of computer science on the Chaire Annuelle Informatique et Sciences Numériques at Collège de France in Paris, where he presented the leçon inaugurale (inaugural lecture). In 2023, he was named as the inaugural Jacob T. Schwartz Chaired Professor in Computer Science at NYU's Courant Institute. LeCun is also a scientific advisor to French research group Kyutai which is being funded by Xavier Niel, Rodolphe Saadé, Eric Schmidt, and others. === Meta Platforms === LeCun joined Facebook (now Meta Platforms) in 2013 as chief AI scientist and led the company's AI research laboratory, FAIR. === AMI Labs === On 19 November 2025, LeCun confirmed that he would be leaving Meta after ten years to found his own company focused on world-model architectures and human-like artificial intelligence he calls superintelligence. The company he founded, Advanced Machine Intelligence Labs (or AMI Labs), is run by CEO Alex LeBrun, with LeCun serving as Executive Chair. This venture is focused on building AI "world models": systems that learn to understand the physical world's structure and dynamics rather than just predict text like large language models. In March 2026, AMI announced it had raised $1.03 billion in funding at a $3.5 billion pre-money valuation. The funding round was co-led by investors including Cathay Innovation, Greycroft, Hiro Capital, HV Capital and Bezos Expeditions. In January 2026, LeCun became founding chair of the Technical Research Board of Logical Intelligence, an AI company developing energy-based (EBM) reasoning systems. == Honours and awards == LeCun is a member of the US National Academy of Sciences, National Academy of Engineering and the French Académie des Sciences. He has received honorary doctorates from Instituto Politécnico Nacional (IPN) in Mexico City in 2016, from EPFL in 2018, from Université Côte d'Azur in 2021, from Università di Siena in 2023, and from Hong Kong University of Science and Technology in 2023. In 2014, he received the IEEE Neural Network Pioneer Award and in 2015, the PAMI Distinguished Researcher Award. In 2018, LeCun was awarded the IRI Medal, established by the Industrial Research Institute (IRI), and the Harold Pender Award, given by the University of Pennsylvania. In 2019, he received the Golden Plate Award of the American Academy of Achievement. In March 2019, LeCun won the 2018 Turing Award, sharing it with Yoshua Bengio and Geoffrey Hinton. In 2022, he received the Princess of Asturias Award in the category "Scientific Research", along with Yoshua Bengio, Geoffrey Hinton and Demis Hassabis. In 2023, the President of France made him a Chevalier (Knight) of the French Legion of Honour. During the World Economic Forum (WEF) 2024 in Davos, he received the Global Swiss AI Award 2023. The same year, he received the grand prize of the VinFuture Prize alongside Yoshua Bengio, Jensen Huang, Geoffrey Hinton, and Fei-Fei Li for their groundbreaking contributions to neural networks and deep learning algorithms. In 2025 he was awarded the Queen Elizabeth Prize for Engineering jointly with Yoshua Bengio, Bill Dally, Geoffrey E. Hinton, John Hopfield, Jensen Huang and Fei-Fei Li.

    Read more →
  • Confusion matrix

    Confusion matrix

    In machine learning, a confusion matrix, also known as error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. In unsupervised learning it is usually called a matching matrix. The term is used specifically in the problem of statistical classification. Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature. The diagonal of the matrix therefore represents all instances that are correctly predicted. The name stems from the fact that it makes it easy to identify whether the system is confusing two classes (i.e., commonly mislabeling one class as another). The confusion matrix has its origins in human perceptual studies of auditory stimuli. It was adapted for machine learning studies and used by Frank Rosenblatt, among other early researchers, to compare human and machine classifications of visual (and later auditory) stimuli. It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table). == Example == Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows: Assume that we have a classifier that distinguishes between individuals with and without cancer in some way, we can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (sample 1 and 2), and 1 person without cancer that is wrongly predicted to have cancer (sample 9). Notice, that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column: The actual classification is positive and the predicted classification is positive (1,1). This is called a true positive result because the positive sample was correctly identified by the classifier. The actual classification is positive and the predicted classification is negative (1,0). This is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. The actual classification is negative and the predicted classification is positive (0,1). This is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. The actual classification is negative and the predicted classification is negative (0,0). This is called a true negative result because the negative sample gets correctly identified by the classifier. We can then perform the comparison between actual and predicted classifications and add this information to the table, making correct results appear in green so they are more easily identifiable. The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows: The color convention of the three data tables above were picked to match this confusion matrix, in order to easily differentiate the data. Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier: In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located in the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as values outside the diagonal will represent them. By summing up the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = T P + F N {\displaystyle P=TP+FN} and N = F P + T N {\displaystyle N=FP+TN} . == Table of confusion == In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly. For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer). According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC). Other metrics can be included in a confusion matrix, each of them having their significance and use. Some researchers have argued that the confusion matrix, and the metrics derived from it, do not truly reflect a model's knowledge. In particular, the confusion matrix cannot show whether correct predictions were reached through sound reasoning or merely by chance (a problem known in philosophy as epistemic luck). It also does not capture situations where the facts used to make a prediction later change or turn out to be wrong (defeasibility). This means that while the confusion matrix is a useful tool for measuring classification performance, it may give an incomplete picture of a model’s true reliability. == Confusion matrices with more than two categories == Confusion matrix is not limited to binary classification and can be used in multi-class classifiers as well. The confusion matrices discussed above have only two conditions: positive and negative. For example, the table below summarizes communication of a whistled language between two speakers, with zero values omitted for clarity. == Confusion matrices in multi-label and soft-label classification == Confusion matrices are not limited to single-label classification (where only one class is present) or hard-label settings (where classes are either fully present, 1, or absent, 0). They can also be extended to Multi-label classification (where multiple classes can be predicted at once) and soft-label classification (where classes can be partially present). One such extension is the Transport-based Confusion Matrix (TCM), which builds on the theory of optimal transport and the principle of maximum entropy. TCM applies to single-label, multi-label, and soft-label settings. It retains the familiar structure of the standard confusion matrix: a square matrix sized by the number of classes, with diagonal entries indicating correct predictions and off-diagonal entries indicating confusion. In the single-label case, TCM is identical to the standard confusion matrix. TCM follows the same reasoning as the standard confusion matrix: if class A is overestimated (its predicted value is greater than its label value) and class B is underestimated (its predicted value is less than its label value), A is considered confused with B, and the entry (B, A) is increased. If a class is both predicted and present, it is correctly identified, and the diagonal entry (A, A) increases. Optimal transport and maximum entropy are used to determine the extent to which these entries are updated. TCM enables clearer comparison between predictions and labels in complex classification tasks, while maintaining a consistent matrix format across settings.

    Read more →
  • David Krueger (professor)

    David Krueger (professor)

    David Krueger is an American machine learning professor and advocate for the reduction of risks related to artificial intelligence. Krueger is an assistant professor in Robust, Reasoning, and Responsible AI at the University of Montreal and a Core Academic Member at Mila. == Early life and education == Krueger obtained a B.A. in mathematics from Reed College, and completed his MSc and Ph.D. in Computer Science at the University of Montreal. He trained in deep learning under Yoshua Bengio, Roland Memisevic, and Aaron Courville from 2013 to 2021. Krueger was also an intern on Google DeepMind's AI Safety team in 2018. == Career == Krueger researches deep learning, AI alignment, and AI safety. His work is focused on reducing the risk of human extinction resulting from out-of-control AI systems. Krueger was an assistant professor at the University of Cambridge from 2021 to 2024, before taking a faculty position at the University of Montreal in 2024. In 2023, he was a founding research director at the UK AI Security Institute. That same year, Krueger initiated the Statement on AI Risk, which argues that AI could cause human extinction and was signed by Anthropic's Dario Amodei, OpenAI's Sam Altman, AI expert Geoffrey Hinton, and other leaders. In April 2026, Krueger discussed the risks of advanced AI at a Capitol Hill event hosted by Senator Bernie Sanders. === Evitable === In 2025, Krueger founded Evitable, a nonprofit organization that advocates for an AI moratorium. == Views == Krueger argues that AI will lead to a "gradual disempowerment" of workers, likening AI chips to nuclear bombs. He also says the military use of AI "poses an existential risk to humanity."

    Read more →
  • Connectionism

    Connectionism

    Connectionism is an approach to the study of human mental processes and cognition that utilizes mathematical models known as connectionist networks or artificial neural networks. Connectionism has had many "waves" since its beginnings. The first wave appeared 1943 with Warren Sturgis McCulloch and Walter Pitts both focusing on comprehending neural circuitry through a formal and mathematical approach, and Frank Rosenblatt who published the 1958 paper "The Perceptron: A Probabilistic Model For Information Storage and Organization in the Brain" in Psychological Review, while working at the Cornell Aeronautical Laboratory. The first wave ended with the 1969 book Perceptrons about limitations of the original perceptron idea, written by Marvin Minsky and Seymour Papert, which contributed to discouraging major funding agencies in the US from investing in connectionist research. With a few noteworthy deviations, most connectionist research entered a period of inactivity until the mid-1980s. The term connectionist model was reintroduced in a 1982 paper in the journal Cognitive Science by Jerome Feldman and Dana Ballard. The second wave blossomed in the late 1980s, following a 1987 book Parallel Distributed Processing by James L. McClelland, David E. Rumelhart, et al., which introduced a couple of improvements to the simple perceptron idea, such as intermediate processors (now known as "hidden layers") alongside input and output units, and used a sigmoid activation function instead of the old "all-or-nothing" function. Their work built upon that of John Hopfield, who was a key figure investigating the mathematical characteristics of sigmoid activation functions. From the late 1980s to the mid-1990s, connectionism took on an almost revolutionary tone when Schneider, Terence Horgan and Tienson posed the question of whether connectionism represented a fundamental shift in psychology and so-called "good old-fashioned AI", or GOFAI. Some advantages of the second wave connectionist approach included its applicability to a broad array of functions, structural approximation to biological neurons, low requirements for innate structure, and capacity for graceful degradation. Its disadvantages included the difficulty in deciphering how ANNs process information or account for the compositionality of mental representations, and a resultant difficulty explaining phenomena at a higher level. The current (third) wave has been marked by advances in deep learning, which have made possible the creation of large language models. The success of deep-learning networks in the past decade has greatly increased the popularity of this approach, but the complexity and scale of such networks has brought with them increased interpretability problems. == Basic principle == The central connectionist principle is that mental phenomena can be described by interconnected networks of simple and often uniform units. The form of the connections and the units can vary from model to model. For example, units in the network could represent neurons and the connections could represent synapses, as in the human brain. This principle has been seen as an alternative to GOFAI and the classical theories of mind based on symbolic computation, but the extent to which the two approaches are compatible has been the subject of much debate since their inception. === Activation function === Internal states of any network change over time due to neurons sending a signal to a succeeding layer of neurons in the case of a feedforward network, or to a previous layer in the case of a recurrent network. Discovery of non-linear activation functions has enabled the second wave of connectionism. === Memory and learning === Neural networks follow two basic principles: Any mental state can be described as a n-dimensional vector of numeric activation values over neural units in a network. Memory and learning are created by modifying the 'weights' of the connections between neural units, generally represented as an n×m matrix. The weights are adjusted according to some learning rule or algorithm, such as Hebbian learning. Most of the variety among the models comes from: Interpretation of units: Units can be interpreted as neurons or groups of neurons. Definition of activation: Activation can be defined in a variety of ways. For example, in a Boltzmann machine, the activation is interpreted as the probability of generating an action potential spike, and is determined via a logistic function on the sum of the inputs to a unit. Learning algorithm: Different networks modify their connections differently. In general, any mathematically defined change in connection weights over time is referred to as the "learning algorithm". === Biological realism === Connectionist work in general does not need to be biologically realistic. One area where connectionist models are thought to be biologically implausible is with respect to error-propagation networks that are needed to support learning, but error propagation can explain some of the biologically-generated electrical activity seen at the scalp in event-related potentials such as the N400 and P600, and this provides some biological support for one of the key assumptions of connectionist learning procedures. Many recurrent connectionist models also incorporate dynamical systems theory. Many researchers, such as the connectionist Paul Smolensky, have argued that connectionist models will evolve toward fully continuous, high-dimensional, non-linear, dynamic systems approaches. == Precursors == Precursors of the connectionist principles can be traced to early work in psychology, such as that of William James. Psychological theories based on knowledge about the human brain were fashionable in the late 19th century. As early as 1869, the neurologist John Hughlings Jackson argued for multi-level, distributed systems. Following from this lead, Herbert Spencer's Principles of Psychology, 3rd edition (1872), and Sigmund Freud's Project for a Scientific Psychology (composed 1895) propounded connectionist or proto-connectionist theories. These tended to be speculative theories. But by the early 20th century, Edward Thorndike was writing about human learning that posited a connectionist type network. Hopfield networks had precursors in the Ising model due to Wilhelm Lenz (1920) and Ernst Ising (1925), though the Ising model conceived by them did not involve time. Monte Carlo simulations of Ising model required the advent of computers in the 1950s. == The first wave == The first wave begun in 1943 with Warren Sturgis McCulloch and Walter Pitts both focusing on comprehending neural circuitry through a formal and mathematical approach. McCulloch and Pitts showed how neural systems could implement first-order logic: Their classic paper "A Logical Calculus of Ideas Immanent in Nervous Activity" (1943) is important in this development here. They were influenced by the work of Nicolas Rashevsky in the 1930s and symbolic logic in the style of Principia Mathematica. Hebb contributed greatly to speculations about neural functioning, and proposed a learning principle, Hebbian learning. Lashley argued for distributed representations as a result of his failure to find anything like a localized engram in years of lesion experiments. Friedrich Hayek independently conceived the model, first in a brief unpublished manuscript in 1920, then expanded into a book in 1952. The Perceptron machines were proposed and built by Frank Rosenblatt, who published the 1958 paper “The Perceptron: A Probabilistic Model For Information Storage and Organization in the Brain” in Psychological Review, while working at the Cornell Aeronautical Laboratory. He cited Hebb, Hayek, Uttley, and Ashby as main influences. Another form of connectionist model was the relational network framework developed by the linguist Sydney Lamb in the 1960s. The research group led by Widrow empirically searched for methods to train two-layered ADALINE networks (MADALINE), with limited success. A method to train multilayered perceptrons with arbitrary levels of trainable weights was published by Alexey Grigorevich Ivakhnenko and Valentin Lapa in 1965, called the Group Method of Data Handling. This method employs incremental layer by layer training based on regression analysis, where useless units in hidden layers are pruned with the help of a validation set. The first multilayered perceptrons trained by stochastic gradient descent was published in 1967 by Shun'ichi Amari. In computer experiments conducted by Amari's student Saito, a five layer MLP with two modifiable layers learned useful internal representations to classify non-linearily separable pattern classes. In 1972, Shun'ichi Amari produced an early example of self-organizing network. == The neural network winter == There was some conflict among artificial intelligence researchers as to what neural networks are useful for. Around late 1960s, there was a widespread lull in research a

    Read more →