Record linkage

Record linkage (also known as data matching, data linkage, entity resolution, and many other terms) is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Record linkage is necessary when joining different data sets based on entities that may or may not share a common identifier (e.g., database key, URI, National identification number), which may be due to differences in record shape, storage location, or curator style or preference. A data set that has undergone RL-oriented reconciliation may be referred to as being cross-linked. == Naming conventions == "Record linkage" is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with another that describe the same entity. However, many other terms are used for this process. Unfortunately, this profusion of terminology has led to few cross-references between these research communities. Computer scientists often refer to it as "data matching" or as the "object identity problem". Commercial mail and database applications refer to it as "merge/purge processing" or "list washing". Other names used to describe the same concept include: "coreference/entity/identity/name/record resolution", "entity disambiguation/linking", "fuzzy matching", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration" and "conflation". While they share similar names, record linkage and linked data are two separate approaches to processing and structuring data. 
Although both involve identifying matching entities across different data sets, record linkage standardly equates "entities" with human individuals; by contrast, Linked Data is based on the possibility of interlinking any web resource across data sets, using a correspondingly broader concept of identifier, namely a URI. == History == The initial idea of record linkage goes back to Halbert L. Dunn in his 1946 article titled "Record Linkage" published in the American Journal of Public Health. Howard Borden Newcombe then laid the probabilistic foundations of modern record linkage theory in a 1959 article in Science. These were formalized in 1969 by Ivan Fellegi and Alan Sunter, in their pioneering work "A Theory For Record Linkage", where they proved that the probabilistic decision rule they described was optimal when the comparison attributes were conditionally independent. In their work they recognized the growing interest in applying advances in computing and automation to large collections of administrative data, and the Fellegi-Sunter theory remains the mathematical foundation for many record linkage applications. Since the late 1990s, various machine learning techniques have been developed that can, under favorable conditions, be used to estimate the conditional probabilities required by the Fellegi-Sunter theory. Several researchers have reported that the conditional independence assumption of the Fellegi-Sunter algorithm is often violated in practice; however, published efforts to explicitly model the conditional dependencies among the comparison attributes have not resulted in an improvement in record linkage quality. On the other hand, machine learning or neural network algorithms that do not rely on these assumptions often provide far higher accuracy, when sufficient labeled training data is available. 
Record linkage can be done entirely without the aid of a computer, but the primary reasons computers are often used to complete record linkages are to reduce or eliminate manual review and to make results more easily reproducible. Computer matching has the advantages of allowing central supervision of processing, better quality control, speed, consistency, and better reproducibility of results. == Methods == === Data preprocessing === Record linkage is highly sensitive to the quality of the data being linked, so all data sets under consideration (particularly their key identifier fields) should ideally undergo a data quality assessment before record linkage. Many key identifiers for the same entity can be presented quite differently between (and even within) data sets, which can greatly complicate record linkage unless understood ahead of time. For example, key identifiers for a man named William J. Smith might appear in three different data sets as follows: In this example, the different formatting styles lead to records that look different but in fact all refer to the same entity with the same logical identifier values. Most, if not all, record linkage strategies would result in more accurate linkage if these values were first normalized or standardized into a consistent format (e.g., all names are "Surname, Given name", and all dates are "YYYY/MM/DD"). Standardization can be accomplished through simple rule-based data transformations or more complex procedures such as lexicon-based tokenization and probabilistic hidden Markov models. Several of the packages listed in the Software Implementations section provide some of these features to simplify the process of data standardization. 
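The rule-based standardization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input formats, the `DATE_FORMATS` list, the month-first reading of slash-separated dates, and the helper names `standardize_name` and `standardize_date` are all hypothetical and are not taken from any of the packages mentioned in the article.

```python
from datetime import datetime

# Hypothetical set of date layouts expected in the inputs (an assumption).
# Ambiguous dates such as 02/01/1973 are read month-first here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")


def standardize_date(raw):
    """Return the date as YYYY/MM/DD, or None if no known layout matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y/%m/%d")
        except ValueError:
            continue
    return None


def standardize_name(raw):
    """Return the name as 'Surname, Given name'.

    Simplistic on purpose: periods are dropped, the last token of an
    uncommaed name is taken as the surname, and str.title() will mangle
    names such as 'McDonald'.
    """
    text = " ".join(raw.replace(".", " ").split())
    if "," in text:
        surname, given = (part.strip() for part in text.split(",", 1))
    else:
        tokens = text.split(" ")
        surname, given = tokens[-1], " ".join(tokens[:-1])
    return f"{surname.title()}, {given.title()}"


# Three differently formatted records that describe the same person.
records = [
    ("William J. Smith", "1973-02-01"),
    ("Smith, William J.", "02/01/1973"),
    ("SMITH, WILLIAM J", "1973/02/01"),
]
normalized = {(standardize_name(n), standardize_date(d)) for n, d in records}
```

After normalization the three records collapse to a single (name, date) pair, so even an exact comparison of the standardized fields would link them.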
=== Entity resolution === Entity resolution is an operational intelligence process, typically powered by an entity resolution engine or middleware, whereby organizations can connect disparate data sources with a view to understand possible entity matches and non-obvious relationships across multiple data silos. It analyzes all of the information relating to individuals and/or entities from multiple sources of data, and then applies likelihood and probability scoring to determine which identities are a match and what, if any, non-obvious relationships exist between those identities. Entity resolution engines are typically used to uncover risk, fraud, and conflicts of interest, but are also useful tools for use within customer data integration (CDI) and master data management (MDM) requirements. Typical uses for entity resolution engines include terrorist screening, insurance fraud detection, USA Patriot Act compliance, organized retail crime ring detection and applicant screening. For example, across different data silos – employee records, vendor data, watch lists, etc. – an organization may have several variations of an entity named ABC, which may or may not be the same individual. These entries may, in fact, appear as ABC1, ABC2, or ABC3 within those data sources. By comparing similarities between underlying attributes such as address, date of birth, or social security number, the user can eliminate some possible matches and confirm others as very likely matches. Entity resolution engines then apply rules, based on common sense logic, to identify hidden relationships across the data. In the example above, perhaps ABC1 and ABC2 are not the same individual, but rather two distinct people who share common attributes such as address or phone number. ==== Data matching ==== While entity resolution solutions include data matching technology, many data matching offerings do not fit the definition of entity resolution. 
Here are four factors that distinguish entity resolution from data matching, according to John Talburt, director of the UALR Center for Advanced Research in Entity Resolution and Information Quality: (1) it works with both structured and unstructured records, and it entails the process of extracting references when the sources are unstructured or semi-structured; (2) it uses elaborate business rules and concept models to deal with missing, conflicting, and corrupted information; (3) it utilizes non-matching, asserted linking (associate) information in addition to direct matching; (4) it uncovers non-obvious relationships and association networks (i.e. who's associated with whom). In contrast to data quality products, more powerful identity resolution engines also include a rules engine and workflow process, which apply business intelligence to the resolved identities and their relationships. These advanced technologies make automated decisions and impact business processes in real time, limiting the need for human intervention. === Deterministic record linkage === The simplest kind of record linkage, called deterministic or rules-based record linkage, generates links based on the number of individual identifiers that match among the available data sets. Two records are said to match via a deterministic record linkage procedure if all or some identifiers (above a certain threshold) are identical. Deterministic record linkage is a good option when the entities in the data sets are identified by a common identifier, or when there are several representative identifiers (e.g., name, date of birth, and sex when identifying a person) whose quality of data is relatively high. As an example, consider two standardized data sets, Set A and Set B, that contain different bits of information about patients in a hospital system. T

MetaMask

MetaMask is a software cryptocurrency wallet developed by ConsenSys for interacting with the Ethereum blockchain and other EVM-compatible networks. It enables users to manage Ethereum accounts and connect to decentralized applications (dApps) via a browser extension or mobile app. As of early 2026, MetaMask reports over 100 million users worldwide. == Overview == MetaMask allows users to store and manage private keys, send and receive Ethereum-based cryptocurrencies and tokens (including ERC-20 and ERC-721 standards), broadcast transactions, and interact with dApps. dApps connect to the wallet via JavaScript interfaces, prompting users to approve signatures or transactions. The wallet features MetaMask Swaps, an in-app token swap aggregator sourcing liquidity from multiple decentralized exchanges (DEXs), with a service fee of 0.875%. In 2025, MetaMask introduced the MetaMask Rewards program (initially mobile-only), where users earn points for activities such as swaps, bridging, and referrals. Season 1 (October 2025 – January 2026) distributed over $30 million in Linea tokens and other perks to participants. == History == MetaMask launched in 2016 as open-source software under the MIT license. It initially supported browser extensions for Chrome and Firefox. Mobile versions were in closed beta from 2019 and publicly released for iOS and Android in September 2020. In August 2020, the license changed to a custom proprietary one. MetaMask Swaps launched on desktop in October 2020 and on mobile in March 2021. The Rewards program launched in late 2025 with Linea integration. == Criticism == MetaMask has faced criticism over privacy, including default analytics settings that share some user data (which can be disabled). Its reliance on Infura (acquired by ConsenSys in 2019) has raised concerns about centralization in Ethereum infrastructure. The wallet regularly issues warnings about phishing scams and fake airdrops impersonating MetaMask.

Vacuum tube characteristics

Vacuum tube characteristics (also called tube curves, valve characteristics or valve curves) describes the electrical relationships between electrode voltages and currents in a vacuum tube. These relationships are commonly presented as characteristic curves in tube manuals and engineering references. The curves typically show plate current versus plate voltage for several fixed control-grid voltages, showing how current varies with electrode potentials under controlled conditions. Designers use them to select operating points, determine voltage gain, estimate output power, and construct graphical load-line analyses. The use of characteristic curves as an engineering tool for analyzing vacuum-tube operation was established in the 1910s, notably in work by Edwin Howard Armstrong. Examples of such curves appear in early tube manuals and textbooks and form the basis of classical vacuum-tube circuit design. Different types of vacuum tubes are characterized using plots appropriate to their electrode structure and intended use. Two-electrode devices such as diodes are described primarily by the relation between plate voltage and plate current. Amplifying tubes containing control grids, such as triodes, tetrodes, pentodes, and beam tetrodes, are represented by families of curves measured for different grid voltages. From these families additional parameters such as amplification factor (μ), transconductance (gm), and plate resistance (rp) may be obtained. Although these plots are used primarily for circuit design, their shapes arise from the underlying physics of electron flow in vacuum tubes. The physical principles responsible for the observed characteristics are discussed in later sections. == 3/2 power law == In high-vacuum thermionic diodes operating under normal conditions, plate current increases nonlinearly with plate voltage. 
Over the space-charge-limited region, the current is well approximated by the three-halves power relation $I_p = P \cdot V_p^{3/2}$, where $P$ is the perveance of the tube. Perveance is determined primarily by electrode geometry, including cathode area and cathode-to-plate spacing. It provides a practical measure of current-producing capability and is often used in tube manuals in place of a complete family of plate-characteristic curves. == Signal diode characterization == For small-signal diodes, tube manuals typically publish a single static anode characteristic showing anode current (Ia) as a function of anode voltage (Va), measured with the heater operating at its rated voltage. Because the diode contains no control grid, only one such I–V curve is required. The low-voltage portion of the curve is particularly important in detector service, where the nonlinear curvature of the current–voltage relation allows a small alternating signal to produce a net direct-current output, resulting in rectification. In addition to the static characteristic, tube manuals specify heater ratings, maximum plate voltage, permissible average current, and interelectrode capacitance. These parameters define the allowable operating region and high-frequency behavior. A typical example is the data sheet for the Philips EB91 double diode, which includes curves of the diode response in use as a detector. The output voltage is non-zero for an input voltage of 0 due to the Edison effect. == Rectifier characterization == Vacuum-tube rectifiers intended for power-supply service are specified differently from signal diodes. Their data emphasize heater requirements, peak inverse voltage, maximum peak plate current, permissible DC output current for various filter configurations, and regulation characteristics. Rectifier tubes exhibit nonlinear voltage drop that increases with current. 
For limited operating ranges this behavior may be represented by an equivalent or effective series resistance corresponding to the local slope of the plate characteristic (dynamic plate resistance, dV/dI). Diode voltages can be determined with the help of a graphical aid. In capacitor-input supplies, conduction occurs in pulses near the peaks of the AC waveform, producing peak currents substantially greater than the average DC load current. Data sheets therefore specify maximum peak plate current and permissible filter capacitance in addition to average DC ratings. Under varying load conditions, the supply voltage changes in accordance with the rectifier's nonlinear characteristic and effective impedance. == Triode characterization == === Early use === The systematic use of characteristic curves to explain and quantify vacuum-tube amplification was introduced by Edwin Howard Armstrong in 1914. Using measured plate voltage-current curves, Armstrong demonstrated the mechanism of triode amplification and clarified the operation of grid-leak detection. ==== Plate and transfer characteristics ==== Triode data sheets present families of plate characteristics showing plate current $I_p$ as a function of plate voltage $E_p$ for several fixed grid voltages $E_g$. From these curves the operating point, voltage gain, and load-line behavior may be determined graphically. In normal operation, plate current depends on both grid and plate voltage. Classical analysis shows that the characteristics for different grid voltages are similar in form and differ primarily by horizontal displacement. 
In triodes, plate current may be approximated by $I_p = k\left(E_g + \frac{E_p}{\mu}\right)^{3/2}$, where $E_g$ is the grid voltage, $E_p$ the plate voltage, $\mu$ the amplification factor, and $k$ a constant determined by the tube geometry. The amplification factor μ represents the relative effectiveness of grid voltage compared with plate voltage in controlling current. It is fundamentally determined by structural dimensions, particularly grid-to-cathode spacing relative to plate-to-cathode spacing. ==== Small-signal parameters ==== Triodes are commonly characterized by three interrelated small-signal parameters: (1) amplification factor ($\mu$), the change in plate voltage divided by the change in grid voltage at constant plate current, $\mu = -\left(\frac{\partial E_p}{\partial E_g}\right)_{I_p}$ (the minus sign makes $\mu$ positive, because at constant plate current a rise in grid voltage must be offset by a fall in plate voltage); (2) transconductance ($g_m$), the change in plate current divided by the change in grid voltage at constant plate voltage, $g_m = \left(\frac{\partial I_p}{\partial E_g}\right)_{E_p}$; (3) plate resistance ($r_p$), the change in plate voltage divided by the change in plate current at constant grid voltage, $r_p = \left(\frac{\partial E_p}{\partial I_p}\right)_{E_g}$. These parameters are related by $\mu = g_m r_p$, as shown in classical tube theory treatments. They are obtained either from slopes of the characteristic curves or from tabulated operating-point data. ==== Comparison of ECC81, ECC82, and ECC83 ==== The ECC81, ECC82, and ECC83 (also known respectively as 12AT7, 12AU7, and 12AX7) are closely related dual triodes widely used in small-signal amplifier stages. 
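The relation between the three small-signal parameters defined earlier in this section can be checked numerically against the three-halves approximation for the triode. The sketch below is illustrative only: the values of `k` and `mu` and the operating point are hypothetical, not data for any real tube, and the slopes are estimated by central finite differences, not read from published curves.

```python
def plate_current(e_g, e_p, k=1.0e-3, mu=100.0):
    """Three-halves approximation I_p = k * (E_g + E_p / mu) ** 1.5.

    k (A / V^1.5) and mu are hypothetical illustration values.
    The tube is treated as cut off when the effective voltage is <= 0.
    """
    drive = e_g + e_p / mu
    return k * drive ** 1.5 if drive > 0 else 0.0


def small_signal(e_g, e_p, k=1.0e-3, mu=100.0, h=1.0e-4):
    """Estimate (g_m, r_p, g_m * r_p) at an operating point."""
    # Transconductance: slope of I_p versus E_g at constant E_p.
    g_m = (plate_current(e_g + h, e_p, k, mu)
           - plate_current(e_g - h, e_p, k, mu)) / (2 * h)
    # Plate conductance: slope of I_p versus E_p at constant E_g.
    g_p = (plate_current(e_g, e_p + h, k, mu)
           - plate_current(e_g, e_p - h, k, mu)) / (2 * h)
    r_p = 1.0 / g_p
    return g_m, r_p, g_m * r_p


# Hypothetical operating point: grid at -1 V, plate at 250 V.
g_m, r_p, mu_estimate = small_signal(-1.0, 250.0)
print(f"g_m = {g_m * 1e3:.3f} mA/V, r_p = {r_p / 1e3:.1f} kOhm, "
      f"g_m * r_p = {mu_estimate:.2f}")
```

Under this model the product `g_m * r_p` recovers the `mu` that was put into the formula, whatever operating point is chosen inside the conducting region, which is the relation μ = g_m r_p stated above.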
Although similar in construction and envelope size, they differ significantly in electrical parameters due to differences in electrode spacing and grid structure. (Data representative of manufacturer specifications.) The ECC83 exhibits high $\mu$ and high plate resistance, producing large voltage gain but relatively low current drive capability. The ECC82 has lower $\mu$ and lower plate resistance, allowing greater current delivery and reduced voltage gain. The ECC81 occupies an intermediate position with comparatively high transconductance and moderate amplification factor. These differences arise primarily from variations in grid pitch, cathode area, and electrode spacing, which determine perveance and amplification factor. Although the external envelope is similar, the internal geometry governs the characteristic curves and small-signal parameters. == Tetrode (screen-grid) characterization == The screen-grid tube (tetrode) was developed primarily to reduce the electrostatic coupling between plate and control grid that limited gain and stability in radio-frequency triode amplifiers. In triodes, the grid–plate capacitance provides feedback from plate to grid, restricting obtainable gain and often requiring neutralization circuits such as those used in neutrodyne receivers. By inserting a positively biased screen grid between control grid and plate, this capacitive coupling is greatly reduced, permitting higher stable gain at radio frequencies. The screen grid, also known as the shield grid or grid 2 (to distinguish it from t

Media Block

A Media Block or Integrated Media Block (IMB) is a component in a digital cinema projection system. Its purpose is to convert the Digital Cinema Package (DCP) content into data that ultimately produces picture and sound in a theater in compliance with DCI anti-piracy encryption requirements. == Terminology == The DCI specification allows for two different security system architectures. In the first, the Media Block is outside of the projector. This design is simply referred to as a "Media Block" and is typically a device attached directly to the motherboard of a Digital Cinema server. The media block is usually connected to the projector by dual-link SDI cables. Such a media block is limited to processing 2K output, downscaling 4K DCPs if necessary. The second architecture describes an "Integrated Media Block". This refers to a device attached and integrated directly into the projector, which receives image data from the server, usually via a Cat 6 Ethernet connection. IMBs can process 2K and 4K output. Some hardware implementations integrate the entire server on a single board and are able to work both as an MB and as an IMB. == Security features == All security functions are contained within a Secure Processing Block (SPB), a tamper-proof physical device. Upon ingestion into a DCP server, Key Delivery Messages (KDM) are stored on flash memory in the media block or IMB. A KDM is written to enable the playback of a specific DCP during a specific time window and on a specific media block or IMB, identified by its serial number during the authoring process. Media blocks and IMBs also contain a secure clock that is set in the factory and cannot be altered by the end user; the DCP servers to which they are attached use this clock to determine showtimes. The secure clock prevents theaters from showing encrypted movies outside the times authorized by the KDM (e.g. after it has expired) by simply changing the date and time in the server's BIOS. 
Media blocks and IMBs also typically include anti-tamper devices, designed to self-destruct the unit if unauthorized modification of its hardware, software or secure clock is attempted.
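The playback rule described under Security features (a KDM authorizes one DCP, on one device, within one time window, judged against the secure clock) can be expressed as a short check. This is only a sketch of the logic: the `Kdm` record, its field names and the sample values are hypothetical, and the encryption, signing and key handling of a real KDM are left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Kdm:
    """Hypothetical, simplified stand-in for a Key Delivery Message."""
    dcp_id: str            # the DCP this key unlocks
    device_serial: str     # the media block or IMB it was authored for
    not_before: datetime   # start of the authorized window
    not_after: datetime    # end of the authorized window


def playback_allowed(kdm, dcp_id, device_serial, secure_now):
    """Return True only if all three conditions from the text hold.

    secure_now must come from the device's secure clock, not from the
    server's adjustable system clock.
    """
    return (
        kdm.dcp_id == dcp_id
        and kdm.device_serial == device_serial
        and kdm.not_before <= secure_now <= kdm.not_after
    )


# Hypothetical example values.
kdm = Kdm(
    dcp_id="FEATURE-001",
    device_serial="IMB-12345",
    not_before=datetime(2024, 5, 1, tzinfo=timezone.utc),
    not_after=datetime(2024, 5, 14, tzinfo=timezone.utc),
)
during_run = datetime(2024, 5, 7, 20, 0, tzinfo=timezone.utc)
after_expiry = datetime(2024, 5, 20, 20, 0, tzinfo=timezone.utc)
```

Because the time argument is supplied by the secure clock, changing the date in the server's BIOS has no effect on the outcome of the check.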

Digital goods

Digital goods or e-goods are intangible goods that exist in digital form. Examples are Wikipedia articles; digital media, such as e-books, downloadable music, internet radio, internet television and streaming media; fonts, logos, photos and graphics; digital subscriptions; online ads (as purchased by the advertiser); internet coupons; electronic tickets; electronically treated documentation in many different fields; downloadable software (digital distribution) and mobile apps; cloud-based applications and online games; virtual goods used within the virtual economies of online games and communities; community access; workbooks; worksheets; planners; e-learning (online courses); webinars, video tutorials, blog posts; cards; patterns; website themes and templates. == Legal concerns about digital goods == Special legal concerns regarding digital goods include copyright infringement and taxation. The question of ownership (versus licensed use or service only) of purely digital goods is also not finally resolved. For instance, the software installers of the digital software distributor gog.com are technically independent of the account but are still subject to the EULA, where a "licensed, not sold" formulation is used. Therefore, it is not clear whether the software can be legally used after a hypothetical loss of the account, a question that had previously been raised in practice for the similar service Steam. In July 2012, the European Court of Justice ruled in the case UsedSoft GmbH v. Oracle International Corp. that the sale of a software product, either through a physical support or download, constituted a transfer of ownership in EU law, thus the first sale doctrine applies; the ruling thereby breaks the "licensed, not sold" legal theory, but leaves open numerous questions. 
Therefore, it is also permissible to resell software licenses even if the digital good has been downloaded directly from the Internet, as the first-sale doctrine applies whenever software was originally sold to a customer for an unlimited amount of time, thus prohibiting any software maker from preventing the resale of their software by any of their legitimate owners. The court requires that the previous owner must no longer be able to use the licensed software after the resale, but finds that the practical difficulties in enforcing this clause should not be an obstacle to authorizing resale, as they are also present for software which can be installed from physical supports, where the first-sale doctrine is in force. In several cases, content providers have faced criticism for revoking access to digital goods due to expired licenses or the discontinuation of a product, such as ebooks (which resulted in a lawsuit against Amazon.com, Inc.), digital video (with Sony Interactive Entertainment revoking access to purchased StudioCanal content from its now-defunct PlayStation video store; a similar move involving Warner Bros. Discovery content was averted by an updated license agreement), and video games (such as Ubisoft discontinuing and revoking access to its game The Crew without providing refunds or the ability to redownload the game). In September 2024, the U.S. state of California implemented a consumer protection law that prohibits the use of terms such as "buy" or "purchase" during transactions involving digital goods if there is no way to obtain the purchases in a manner that cannot be revoked by the seller (such as allowing it to be downloaded for permanent, offline access), and requires a disclaimer to be displayed to the customer at the time of purchase.

Commonsense knowledge (artificial intelligence)

In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address common sense knowledge was Advice Taker in 1959 by John McCarthy. Commonsense knowledge can underpin a commonsense reasoning process, to attempt inferences such as "You might bake a cake because you want people to eat the cake." A natural language processing process can be attached to the commonsense knowledge base to allow the knowledge base to attempt to answer questions about the world. Common sense knowledge also helps to solve problems in the face of incomplete information. Using widely held beliefs about everyday objects, or common sense knowledge, AI systems make common sense assumptions or default assumptions about the unknown similar to the way people do. In an AI system or in English, this is expressed as "Normally P holds", "Usually P" or "Typically P so Assume P". For example, if we know the fact "Tweety is a bird", because we know the commonly held belief about birds, "typically birds fly," without knowing anything else about Tweety, we may reasonably assume the fact that "Tweety can fly." As more knowledge of the world is discovered or learned over time, the AI system can revise its assumptions about Tweety using a truth maintenance process. If we later learn that "Tweety is a penguin" then truth maintenance revises this assumption because we also know "penguins do not fly". == Commonsense reasoning == Commonsense reasoning simulates the human ability to use commonsense knowledge to make presumptions about the type and essence of ordinary situations they encounter every day, and to change their "minds" should new information come to light. This includes time, missing or incomplete information and cause and effect. 
The ability to explain cause and effect is an important aspect of explainable AI. Truth maintenance algorithms automatically provide an explanation facility because they create elaborate records of presumptions. Compared with humans, all existing computer programs that attempt human-level AI perform extremely poorly on modern "commonsense reasoning" benchmark tests such as the Winograd Schema Challenge. The problem of attaining human-level competency at "commonsense knowledge" tasks is considered to probably be "AI complete" (that is, solving it would require the ability to synthesize a fully human-level intelligence), although some oppose this notion and believe compassionate intelligence is also required for human-level AI. Common sense reasoning has been applied successfully in more limited domains such as natural language processing and automated diagnosis or analysis. == Commonsense knowledge base construction == Compiling comprehensive knowledge bases of commonsense assertions (CSKBs) is a long-standing challenge in AI research. From early expert-driven efforts like CYC and WordNet, significant advances were achieved via the crowdsourced OpenMind Commonsense project, which led to the crowdsourced ConceptNet KB. Several approaches have attempted to automate CSKB construction, most notably, via text mining (WebChild, Quasimodo, TransOMCS, Ascent), as well as harvesting these directly from pre-trained language models (AutoTOMIC). These resources are significantly larger than ConceptNet, though the automated construction mostly makes them of moderately lower quality. Challenges also remain on the representation of commonsense knowledge: Most CSKB projects follow a triple data model, which is not necessarily best suited for breaking more complex natural language assertions. A notable exception here is GenericsKB, which applies no further normalization to sentences, but retains them in full. 
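The triple data model mentioned above, combined with the default-reasoning example about Tweety from the introduction, can be sketched as a tiny in-memory store. This is a toy illustration, not how ConceptNet or any truth maintenance system is implemented: the `TripleStore` class is hypothetical, the `NotCapableOf` relation is an assumption added to express exceptions, and "the most specific class wins" is just one simple way to revise a default.

```python
from collections import defaultdict


class TripleStore:
    """Minimal (subject, relation, object) store with default reasoning."""

    def __init__(self):
        self._facts = defaultdict(set)

    def add(self, subject, relation, obj):
        self._facts[subject].add((relation, obj))

    def objects(self, subject, relation):
        return {o for r, o in self._facts.get(subject, ()) if r == relation}

    def ancestors(self, node):
        """All classes reachable from node through IsA links."""
        found, stack = set(), [node]
        while stack:
            for parent in self.objects(stack.pop(), "IsA"):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    def capable_of(self, subject, ability):
        """Return True, False, or None (unknown or conflicting)."""
        opinions = {}
        for node in {subject} | self.ancestors(subject):
            if ability in self.objects(node, "NotCapableOf"):
                opinions[node] = False
            elif ability in self.objects(node, "CapableOf"):
                opinions[node] = True
        # A statement about a class is overridden by a statement about
        # any of its more specific subclasses.
        most_specific = {
            node: verdict
            for node, verdict in opinions.items()
            if not any(node in self.ancestors(other)
                       for other in opinions if other != node)
        }
        verdicts = set(most_specific.values())
        return verdicts.pop() if len(verdicts) == 1 else None


kb = TripleStore()
kb.add("bird", "CapableOf", "fly")       # "typically birds fly"
kb.add("Tweety", "IsA", "bird")
before = kb.capable_of("Tweety", "fly")  # default assumption

kb.add("penguin", "IsA", "bird")
kb.add("penguin", "NotCapableOf", "fly")
kb.add("Tweety", "IsA", "penguin")       # new knowledge arrives
after = kb.capable_of("Tweety", "fly")   # assumption is revised
```

Here `before` is the default conclusion drawn from "Tweety is a bird", and `after` is the revised conclusion once the more specific fact "Tweety is a penguin" is known.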
== Applications == Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl. ConceptNet has also been used by chatbots and by computers that compose original fiction. At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty. == Data == As an example, as of 2012 ConceptNet includes these 21 language-independent relations: IsA (An "RV" is a "vehicle" | X is an instance of a Y); UsedFor (a "cake tin" is used for "making cakes" | X is used for the purpose Y); HasA (A "rabbit" has a "tail" | X possesses Y element or feature); CapableOf (a "cook" is capable of "making baked goods" | X is capable of doing Y); Desires (a "child" desires "the aroma of baking" | X has a desire for Y); CreatedBy ("cake" is created by a "baker" | X is created by Y); PartOf (a "knife" is part of a "knife set" | X is a part of Y); Causes ("Heat" causes "cooking" | X is what causes Y); LocatedNear (the "oven" is located near the "refrigerator" | X is located near Y); AtLocation (Somewhere a "Cook" can be at a "restaurant" | X is at the location of Y); DefinedAs (a "Cupcake" is defined as a "cake" that also has the qualities of being "small", "baked within a wrapper", and "containing only one area of frosting or icing" | X is defined as Y that also has the properties A, B & C); SymbolOf (a "heart" is a symbol of "affection" | X is a symbolic representation of Y); ReceivesAction ("cake" can receive the action of being "eaten" | X is capable of receiving action Y); HasPrerequisite ("baking" has the prerequisite of obtaining the "ingredients" | X cannot do Y unless A does B); MotivatedByGoal ("baking" is motivated by the goal of "consumption"/"eating" | X has the motivation of Y goal); CausesDesire ("baking" makes you want to "follow recipe" | X causes the desire to do Y); MadeOf ("Cake" is made of "flour"/"eggs"/"sugar"/"oil"/etc | X is made of Y); HasFirstSubevent ("baking" has first subevent "make batter" | To do X the first thing that needs to be done is Y); HasSubevent ("eat" has subevent "swallow" | Doing X will lead to Y event following); HasLastSubevent ("sleeping" has last subevent of "waking" | Doing X ends with the event Y). == Commonsense knowledge bases == Cyc; Open Mind Common Sense (data source) and ConceptNet (datastore and NLP engine); Evi; Graphiq

Bluelight (web forum)

Bluelight is a web forum, research portal, online community, and non-profit organisation dedicated to harm reduction in drug use. Its userbase includes current and former substance users, academic researchers, drug policy activists, and mental health advocates. It is believed to be the largest online international drug discussion website in the world. As of November 2025, the website claims over 475,900 registered members, the Discord community claims over 11,900 members, and additional members utilise other platforms such as Telegram.

Bluelight has been utilised by academic researchers as a primary source of data in numerous publications. Researchers also utilise the site to advertise research studies, recruit study participants, and better understand the world of substance use. Research groups and organisations that have partnered with Bluelight to recruit study participants include Imperial College London, Johns Hopkins University, Health Canada, Karlstad University, Curtin University, Macquarie University, Columbia University, University of Pennsylvania, University of Michigan, Toronto Metropolitan University (then known as Ryerson University), and MAPS. Researchers have found that the most common reasons for substance users to visit Bluelight.org and similar online communities are to learn "how to use drugs safely" and "how to help others use drugs safely."

Bluelight neither condemns nor condones drug use. Instead, it advocates for the principle of responsible drug use: educating individuals so that they can make informed decisions regarding their drug use, providing information on local drug misuse services, and providing other drug harm reduction resources and public safety notices.

== History ==
Bluelight.org was originally formed in 1997 as a message board on bluelight.net called the MDMA Clearinghouse. The board was created as a side project by the owner of West Palm Beach design company Bluelight Designs.
Some 200–300 users joined the site between 1998 and 1999, but the site's servers were heavily limited and could only store a few threads at a time; this led to the creation of 'The New Bluelight' forum in May 1999 and the registration of the bluelight.nu domain in June 1999. The site grew rapidly in popularity in the early 2000s with the rise of MDMA in the club scene, amassing nearly 7,000 members by the year 2000 and 59,000 by the start of 2006. The site switched to the bluelight.ru domain in October 2005, and switched again to bluelight.org in January 2014. In early 2024, Bluelight was restructured and the forum became a subsidiary of the newly formed Australian non-profit organisation and registered charity Bluelight Communities Ltd.

== Partnerships ==
In the early 2000s, Bluelight worked with reagent test supplier EZ-Test to promote the sale of drug checking kits. In 2007, Bluelight partnered with the Multidisciplinary Association for Psychedelic Studies (MAPS), a non-profit organisation working to raise awareness and understanding of psychedelic drugs through education, clinical research, and advocacy. MAPS utilised Bluelight to recruit participants for its first MDMA-assisted psychotherapy trial for PTSD. In 2013, the official MAPS forums were migrated to Bluelight.

Bluelight's other partners include Erowid, a non-profit organisation dedicated to education surrounding psychoactive drugs; TripSit, a harm reduction education website; Pill Reports, a web-based database for drug checking results that was initially formed as an offshoot of the site; and the Global Drug Survey, an independent research organisation focused on collecting data about substance use.

== Notable users ==
Alan Woods – funded the site's maintenance costs from 1999 until his death in 2008
Hamilton Morris
John McAfee – created an infamous series of troll posts about the stimulant MDPV