AI Data Bias

AI Data Bias — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Systems development life cycle

    Systems development life cycle

    The systems development life cycle (SDLC) describes the typical phases and progression between phases during the development of a computer-based system. These phases progress from inception to retirement. At base, there is just one life cycle, but the taxonomy used to describe it may vary; the cycle may be classified into different numbers of phases and various names may be used for those phases. The SDLC is analogous to the life cycle of a living organism from its birth to its death. In particular, the SDLC varies by system in much the same way that each living organism has a unique path through its life. The SDLC does not prescribe how engineers should go about their work to move the system through its life cycle. Prescriptive techniques are referred to using various terms such as methodology, model, framework, and formal process. Other terms are used for the same concept as SDLC, including software development life cycle (also SDLC), application development life cycle (ADLC), and system design life cycle (also SDLC). These other terms focus on a different scope of development and are associated with different prescriptive techniques, but are about the same essential life cycle. The term "life cycle" is often written without a space, as "lifecycle", with the former more popular in the past and in non-engineering contexts. The acronym SDLC was coined when the longer form was more popular and has remained associated with the expansion, even though the shorter form is popular in engineering. Also, SDLC is relatively unique as opposed to the TLA SDL, which is highly overloaded. == Phases == Depending on the source, the SDLC is described as having different phases and using different terms. Even so, there are common aspects. The following attempts to describe notable phases using notable terminology. The phases are somewhat ordered by the natural sequence of development, although they can be overlapping and iterative. === Conceptualization === During conceptualization (a.k.a. conceptual design, system investigation, feasibility), options and priorities are considered. A feasibility study can determine whether the development effort is worthwhile via activities such as understanding user needs, cost estimation, benefit analysis, and resource analysis. A study should address operational, financial, technical, human factors, and legal/political concerns. === Requirements analysis === Requirements analysis (a.k.a. preliminary design) involves understanding the problem and determining what is needed. Often this involves engaging users to define the requirements and recording them in a document known as a requirements specification. === Design === During the design phase (a.k.a. detail design), a solution is planned. The plan can include relatively high-level information such as describing the major components of the system. The plan can include relatively low-level information such as describing functions, screen layout, business rules, and process flow. The design phase is informed by the requirements of the system. The design must satisfy each requirement. The design may be recorded in textual documents as well as functional hierarchy diagrams, example screen images, business rules, process diagrams, pseudo-code, and data models. === Construction === During construction (a.k.a. implementation, production), the system is realized. Based on the design, hardware and software components are created and integrated. This phase includes testing sub-components, components and the integration of some components, but typically does not include testing at the complete system level. This phase may include the development of training materials, including user manuals and help files. === Acceptance === The acceptance phase (a.k.a. system testing) is about testing the complete system to ensure that it meets customer expectations (requirements). === Deployment === The deployment phase (a.k.a. implementation) involves the logistics of delivery to the customer. Some systems are deployed as a single instance (i.e. in the cloud), and deployment may be ad hoc and manual. Some systems are built in quantity and are associated with manufacturing process and commissioning. This phase may include training users to use the system. It may include transitioning future development to support staff. === Maintenance === During the maintenance phase (a.k.a. operation, utilization, support) development is largely inactive, although this phase does include customer support for resolving user issues and recording suggestions for improvement. Fixes and enhancements are handled by returning to the first phase, conceptualization. For minor changes, the cycle may be significantly abbreviated compared to initial development. === Decommission === Decommission (a.k.a. disposition, retirement, phase-out) is when the system is removed from use, i.e., when it reaches end-of-life. == Practices == === Management and control === SDLC phase objectives are described in this section with key deliverables, a description of recommended tasks, and a summary of related control objectives for effective management. It is critical for the project manager to establish and monitor control objectives while executing projects. Control objectives are clear statements of the desired result or purpose and should be defined and monitored throughout a project. Control objectives can be grouped into major categories (domains), and relate to the SDLC phases as shown in the figure. To manage and control a substantial SDLC initiative, a work breakdown structure (WBS) captures and schedules the work. The WBS and all programmatic material should be kept in the "project description" section of the project notebook. The project manager chooses a WBS format that best describes the project. The diagram shows that coverage spans numerous phases of the SDLC, but the associated MCD (Management Control Domains) shows mappings to SDLC phases. For example, Analysis and Design is primarily performed as part of the Acquisition and Implementation Domain, and System Build and Prototype is primarily performed as part of delivery and support. === Work breakdown structured organization === The upper section of the WBS provides an overview of the project scope and timeline. It should also summarize the major phases and milestones. The middle section is based on the SDLC phases. WBS elements consist of milestones and tasks to be completed rather than activities to be undertaken, and have a deadline. Each task has a measurable output (e.g., an analysis document). A WBS task may rely on one or more activities (e.g., coding). Parts of the project needing support from contractors should have a statement of work (SOW). The development of an SOW does not occur during a specific phase of SDLC but is developed to include the work from the SDLC process that may be conducted by contractors. === Baselines === Baselines are established after four of the five phases of the SDLC, and are critical to the iterative nature of the model. Baselines become milestones. functional baseline: established after the conceptual design phase. allocated baseline: established after the preliminary design phase. product baseline: established after the detailed design and development phase. updated product baseline: established after the production construction phase. In the following diagram, these stages are divided into ten steps, from definition to creation and modification of IT work products:

    Read more →
  • Layer (deep learning)

    Layer (deep learning)

    A layer in a deep learning model is a structure or network topology in the model's architecture, which takes information from the previous layers and then passes it to the next layer. == Layer types == The first type of layer is the Dense layer, also called the fully-connected layer, and is used for abstract representations of input data. In this layer, neurons connect to every neuron in the preceding layer. In multilayer perceptron networks, these layers are stacked together. The Convolutional layer is typically used for image analysis tasks. In this layer, the network detects edges, textures, and patterns. The outputs from this layer are then fed into a fully-connected layer for further processing. See also: CNN model. The Pooling layer is used to reduce the size of data input. The Recurrent layer is used for text processing with a memory function. Similar to the Convolutional layer, the output of recurrent layers are usually fed into a fully-connected layer for further processing. See also: RNN model. The Normalization layer adjusts the output data from previous layers to achieve a regular distribution. This results in improved scalability and model training. A Hidden layer is any of the layers in a Neural Network that aren't the input or output layers. == Differences with layers of the neocortex == There is an intrinsic difference between deep learning layering and neocortical layering: deep learning layering depends on network topology, while neocortical layering depends on intra-layers homogeneity.

    Read more →
  • Unique name assumption

    Unique name assumption

    The unique name assumption is a simplifying assumption made in some ontology languages and description logics. In logics with the unique name assumption, different names always refer to different entities in the world. It was included in Ray Reiter's discussion of the closed-world assumption often tacitly included in Database Management Systems (e.g. SQL) in his 1984 article "Towards a logical reconstruction of relational database theory" (in M. L. Brodie, J. Mylopoulos, J. W. Schmidt (editors), Data Modelling in Artificial Intelligence, Database and Programming Languages, Springer, 1984, pages 191–233). The standard ontology language OWL does not make this assumption, but provides explicit constructs to express whether two names denote the same or distinct entities. owl:sameAs is the OWL property that asserts that two given names or identifiers (e.g., URIs) refer to the same individual or entity. owl:differentFrom is the OWL property that asserts that two given names or identifiers (e.g., URIs) refer to different individuals or entities.

    Read more →
  • ML.NET

    ML.NET

    ML.NET is a free software machine learning library for the C# and F# programming languages. It also supports Python models when used together with NimbusML. The preview release of ML.NET included transforms for feature engineering like n-gram creation, and learners to handle binary classification, multi-class classification, and regression tasks. Additional ML tasks like anomaly detection and recommendation systems have since been added, and other approaches like deep learning will be included in future versions. == Machine learning == ML.NET brings model-based Machine Learning analytic and prediction capabilities to existing .NET developers. The framework is built upon .NET Core and .NET Standard inheriting the ability to run cross-platform on Linux, Windows and macOS. Although the ML.NET framework is new, its origins began in 2002 as a Microsoft Research project named TMSN (text mining search and navigation) for use internally within Microsoft products. It was later renamed to TLC (the learning code) around 2011. ML.NET was derived from the TLC library and has largely surpassed its parent says Dr. James McCaffrey, Microsoft Research. Developers can train a Machine Learning Model or reuse an existing Model by a 3rd party and run it on any environment offline. This means developers do not need to have a background in Data Science to use the framework. Support for the open-source Open Neural Network Exchange (ONNX) Deep Learning model format was introduced from build 0.3 in ML.NET. The release included other notable enhancements such as Factorization Machines, LightGBM, Ensembles, LightLDA transform and OVA. The ML.NET integration of TensorFlow is enabled from the 0.5 release. Support for x86 & x64 applications was added to build 0.7 including enhanced recommendation capabilities with Matrix Factorization. A full roadmap of planned features have been made available on the official GitHub repo. The first stable 1.0 release of the framework was announced at Build (developer conference) 2019. It included the addition of a Model Builder tool and AutoML (Automated Machine Learning) capabilities. Build 1.3.1 introduced a preview of Deep Neural Network training using C# bindings for Tensorflow and a Database loader which enables model training on databases. The 1.4.0 preview added ML.NET scoring on ARM processors and Deep Neural Network training with GPU's for Windows and Linux. === Performance === Microsoft's paper on machine learning with ML.NET demonstrated it is capable of training sentiment analysis models using large datasets while achieving high accuracy. Its results showed 95% accuracy on Amazon's 9GB review dataset. === Model builder === The ML.NET CLI is a Command-line interface which uses ML.NET AutoML to perform model training and pick the best algorithm for the data. The ML.NET Model Builder preview is an extension for Visual Studio that uses ML.NET CLI and ML.NET AutoML to output the best ML.NET Model using a GUI. === Model explainability === AI fairness and explainability has been an area of debate for AI Ethicists in recent years. A major issue for Machine Learning applications is the black box effect where end users and the developers of an application are unsure of how an algorithm came to a decision or whether the dataset contains bias. Build 0.8 included model explainability API's that had been used internally in Microsoft. It added the capability to understand the feature importance of models with the addition of 'Overall Feature Importance' and 'Generalized Additive Models'. When there are several variables that contribute to the overall score, it is possible to see a breakdown of each variable and which features had the most impact on the final score. The official documentation demonstrates that the scoring metrics can be output for debugging purposes. During training & debugging of a model, developers can preview and inspect live filtered data. This is possible using the Visual Studio DataView tools. === Infer.NET === Microsoft Research announced the popular Infer.NET model-based machine learning framework used for research in academic institutions since 2008 has been released open source and is now part of the ML.NET framework. The Infer.NET framework utilises probabilistic programming to describe probabilistic models which has the added advantage of interpretability. The Infer.NET namespace has since been changed to Microsoft.ML.Probabilistic consistent with ML.NET namespaces. === NimbusML Python support === Microsoft acknowledged that the Python programming language is popular with Data Scientists, so it has introduced NimbusML the experimental Python bindings for ML.NET. This enables users to train and use machine learning models in Python. It was made open source similar to Infer.NET. === Machine learning in the browser === ML.NET allows users to export trained models to the Open Neural Network Exchange (ONNX) format. This establishes an opportunity to use models in different environments that don't use ML.NET. It would be possible to run these models in the client side of a browser using ONNX.js, a JavaScript client-side framework for deep learning models created in the Onnx format. === AI School Machine Learning Course === Along with the rollout of the ML.NET preview, Microsoft rolled out free AI tutorials and courses to help developers understand techniques needed to work with the framework.

    Read more →
  • Zeuthen strategy

    Zeuthen strategy

    The Zeuthen strategy in cognitive science is a negotiation strategy used by some artificial agents. Its purpose is to measure the willingness to risk conflict. An agent will be more willing to risk conflict if it does not have much to lose in case that the negotiation fails. In contrast, an agent is less willing to risk conflict when it has more to lose. The value of a deal is expressed in its utility. An agent has much to lose when the difference between the utility of its current proposal and the conflict deal is high. When both agents use the monotonic concession protocol, the Zeuthen strategy leads them to agree upon a deal in the negotiation set. This set consists of all conflict free deals, which are individually rational and Pareto optimal, and the conflict deal, which maximizes the Nash product. The strategy was introduced in 1930 by the Danish economist Frederik Zeuthen. == Three key questions == The Zeuthen strategy answers three open questions that arise when using the monotonic concession protocol, namely: Which deal should be proposed at first? On any given round, who should concede? In case of a concession, how much should the agent concede? The answer to the first question is that any agent should start with its most preferred deal, because that deal has the highest utility for that agent. The second answer is that the agent with the smallest value of Risk(i,t) concedes, because the agent with the lowest utility for the conflict deal profits most from avoiding conflict. To the third question, the Zeuthen strategy suggests that the conceding agent should concede just enough raise its value of Risk(i,t) just above that of the other agent. This prevents the conceding agent to have to concede again in the next round. == Risk == Risk ( i , t ) = { 1 U i ( δ ( i , t ) ) = 0 U i ( δ ( i , t ) ) − U i ( δ ( j , t ) ) U i ( δ ( i , t ) ) otherwise {\displaystyle {\text{Risk}}(i,t)={\begin{cases}1&U_{i}(\delta (i,t))=0\\{\frac {U_{i}(\delta (i,t))-U_{i}(\delta (j,t))}{U_{i}(\delta (i,t))}}&{\text{otherwise}}\end{cases}}} Risk(i,t) is a measurement of agent i's willingness to risk conflict. The risk function formalizes the notion that an agent's willingness to risk conflict is the ratio of the utility that agent would lose by accepting the other agent's proposal to the utility that agent would lose by causing a conflict. Agent i is said to be using a rational negotiation strategy if at any step t + 1 that agent i sticks to his last proposal, Risk(i,t) > Risk(j,t). == Sufficient concession == If agent i makes a sufficient concession in the next step, then, assuming that agent j is using a rational negotiation strategy, if agent j does not concede in the next step, he must do so in the step after that. The set of all sufficient concessions of agent i at step t is denoted SC(i, t). == Minimal sufficient concession == δ ′ = arg ⁡ max δ ∈ S C ( A , t ) { U A ( δ ) } {\displaystyle \delta '=\arg \max _{\delta \in {SC(A,t)}}\{U_{A}(\delta )\}} is the minimal sufficient concession of agent A in step t. Agent A begins the negotiation by proposing δ ( A , 0 ) = arg ⁡ max δ ∈ N S U A ( δ ) {\displaystyle \delta (A,0)=\arg \max _{\delta \in {NS}}U_{A}(\delta )} and will make the minimal sufficient concession in step t + 1 if and only if Risk(A,t) ≤ Risk(B,t). Theorem If both agents are using Zeuthen strategies, then they will agree on δ = arg ⁡ max δ ′ ∈ N S { π ( δ ′ ) } , {\displaystyle \delta =\arg \max _{\delta '\in {NS}}\{\pi (\delta ')\},} that is, the deal which maximizes the Nash product. Proof Let δA = δ(A,t). Let δB = δ(B,t). According to the Zeuthen strategy, agent A will concede at step t {\displaystyle t} if and only if R i s k ( A , t ) ≤ R i s k ( B , t ) . {\displaystyle Risk(A,t)\leq Risk(B,t).} That is, if and only if U A ( δ A ) − U A ( δ B ) U A ( δ A ) ≤ U B ( δ B ) − U B ( δ A ) U B ( δ B ) {\displaystyle {\frac {U_{A}(\delta _{A})-U_{A}(\delta _{B})}{U_{A}(\delta _{A})}}\leq {\frac {U_{B}(\delta _{B})-U_{B}(\delta _{A})}{U_{B}(\delta _{B})}}} U B ( δ B ) ( U A ( δ A ) − U A ( δ B ) ) ≤ U A ( δ A ) ( U B ( δ B ) − U B ( δ A ) ) {\displaystyle U_{B}(\delta _{B})(U_{A}(\delta _{A})-U_{A}(\delta _{B}))\leq U_{A}(\delta _{A})(U_{B}(\delta _{B})-U_{B}(\delta _{A}))} U A ( δ A ) U B ( δ B ) − U A ( δ B ) U B ( δ B ) ≤ U A ( δ A ) U B ( δ B ) − U A ( δ A ) U B ( δ A ) {\displaystyle U_{A}(\delta _{A})U_{B}(\delta _{B})-U_{A}(\delta _{B})U_{B}(\delta _{B})\leq U_{A}(\delta _{A})U_{B}(\delta _{B})-U_{A}(\delta _{A})U_{B}(\delta _{A})} − U A ( δ B ) U B ( δ B ) ≤ − U A ( δ A ) U B ( δ A ) {\displaystyle -U_{A}(\delta _{B})U_{B}(\delta _{B})\leq -U_{A}(\delta _{A})U_{B}(\delta _{A})} U A ( δ A ) U B ( δ A ) ≤ U A ( δ B ) U B ( δ B ) {\displaystyle U_{A}(\delta _{A})U_{B}(\delta _{A})\leq U_{A}(\delta _{B})U_{B}(\delta _{B})} π ( δ A ) ≤ π ( δ B ) {\displaystyle \pi (\delta _{A})\leq \pi (\delta _{B})} Thus, Agent A will concede if and only if δ A {\displaystyle \delta _{A}} does not yield the larger product of utilities. Therefore, the Zeuthen strategy guarantees a final agreement that maximizes the Nash Product.

    Read more →
  • Otterly.ai

    Otterly.ai

    Otterly.ai is an Austrian software company, founded in 2024, that provides tools for generative engine optimization, the practice of monitoring and optimizing results in large language models. == History == Otterly.ai was co-founded in 2024 by Thomas Peham, Klaus-M. Schremser and Josef Trauner. The concept for OtterlyAI was developed in response to the increasing use of generative AI tools in digital search and content discovery. The company announced a technology partnership with SEO platform Semrush in January 2025.

    Read more →
  • Linguistic value

    Linguistic value

    In artificial intelligence, fuzzy logic operations research, and related fields, a linguistic value is a natural language term which is derived using quantitative or qualitative reasoning such as with probability and statistics or fuzzy sets and systems. Variables that take linguistic values are called linguistic variables. == Examples of linguistic variables and values == For example, "age" may be a linguistic variable if its values are not numerical, e.g. very young, quite young, not young, old, not very old etc. These values could be derived from the numeric values for age. As another example, if a shuttle heat shield is deemed of having a linguistic value of a "very low" percentage of damage in re-entry, based upon knowledge from experts in the field, that probability would be given a value of say, 5%. From there on out, if it were to be used in an equation, the variable of percentage of damage will be at 5% if it deemed very low percentage.

    Read more →
  • Copyright and artificial intelligence in the United Kingdom

    Copyright and artificial intelligence in the United Kingdom

    The interaction of artificial intelligence and copyright law has become one of the most contentious tech policy debates in the United Kingdom, centering on whether AI developers should be permitted to train their models on copyrighted material without explicit consent or remuneration. This debate has exposed a deep fracture between the creative industries, which seek to protect their intellectual property from unauthorised commercial exploitation, and tech companies. The academic and library sectors are also impacted, and argue that overly restrictive copyright laws hinder scientific research and the UK's sovereign AI capabilities. In 2024, the UK government proposed a broad text and data mining (TDM) exception to copyright that would have allowed AI companies to use publicly available copyrighted material for training, offering creators only an "opt-out" mechanism, similar to the exception introduced in Europe. This proposal faced intense opposition from across the creative sector. Trade unions representing writers, musicians, performers, and journalists argued that such an exception would effectively expropriate their members' work for the commercial benefit of tech giants. A report from the House of Lords Communications and Digital Committee, warned that generative AI posed a "clear and present danger" to the £124 billion creative economy. The government abandoned the opt-out model in March 2026, opting instead to build a stronger evidence base before pursuing any copyright reform. Conversely, the academic and library sectors have raised significant concerns that the UK's current TDM exception, which is strictly limited to non-commercial research, is too narrow. Universities and research libraries occupy a dual role as both creators of vast datasets and beneficiaries of TDM exceptions. They argue that the current legal framework restricts their ability to computationally analyse the very research they produce, thereby hobbling the UK's "AI for Science" strategy. Advocacy groups have highlighted a "triple payment" problem, wherein publicly funded research is handed over to publishers, who then charge universities substantial subscription fees and demand additional payments for specific TDM licences. This tension is further complicated by the commercial practices of major academic publishers. While publishers often restrict universities from using subscribed databases for AI training, they have simultaneously entered into lucrative, multi-million-dollar licensing agreements to sell access to this academic content to commercial AI developers. Furthermore, academics have accused publishers of actively steering authors away from permissive open-access licences towards more restrictive variants. By doing so, publishers retain the exclusive commercial rights necessary to strike these AI training deals, often without consulting the original authors or offering them any additional remuneration. This dynamic has not only reopened debates within the Open Access movement but has also created complex legal scenarios where publishers, rather than authors, control the terms of copyright litigation against major tech companies. == Training on copyrighted material == The question of whether AI developers should be permitted to train their models on copyrighted material without payment or consent has been one of the most contentious policy debates in the UK AI landscape. In 2024, the then-Conservative government proposed a broad text and data mining (TDM) exception that would have allowed AI companies to use any publicly available copyrighted material for training purposes, with creators able only to "opt out" of having their work used. This proposal provoked intense opposition from writers, musicians, visual artists, publishers, and broadcasters, who argued it would effectively expropriate their intellectual property for the commercial benefit of AI companies. The debate over text and data mining exceptions extends significantly beyond generative AI and the creative industries, implicating a wide range of scientific, industrial, and academic research applications. TDM is a foundational process for analysing large datasets to identify patterns, trends, and correlations, which is heavily utilised in fields such as medical research, climate modelling, and financial services. In the scientific and academic sectors, researchers rely on TDM to process vast amounts of published literature. For example, in biomedical research, TDM is used to accelerate drug discovery, identify new uses for existing medicines, and extract insights from clinical notes and genomic datasets. However, the application of traditional copyright frameworks to scientific literature has been criticised by academics. Researchers argue that scientific writing is intended to convey factual, verifiable information rather than creative originality, and that copyright restrictions on TDM hinder reproducibility, validation, and the advancement of science. The current UK copyright exception for TDM (Section 29A of the Copyright, Designs and Patents Act 1988) is limited strictly to non-commercial research, which creates barriers for public-private research partnerships and commercial scientific development. Beyond academia, non-generative AI and TDM are critical to various industrial and commercial operations. In the financial services sector, TDM is employed to monitor transactions, detect fraud, and analyse market feeds. Other non-generative applications include search engine indexing, plagiarism detection software, and media monitoring. A 2026 report by Public First estimated that 19% of UK businesses use specialised TDM tools, and that a restrictive copyright regime requiring licenses for all copyrighted content could cost the UK economy £220 billion in lost AI-driven GDP growth by 2035 compared to a broad commercial TDM exemption. Industry advocates argue that the lack of a commercial TDM exception in the UK creates legal uncertainty that stifles innovation across these broader, non-generative applications of data analysis. === Tech and AI industry positions === The technology and artificial intelligence industries lobbied for a broad text and data mining (TDM) exception to UK copyright law, arguing that such an exception is essential for the UK to remain globally competitive in AI development. Industry bodies such as techUK have argued that without a TDM exception, the UK risks becoming an "AI taker rather than an AI maker," as developers will relocate training operations to jurisdictions with more permissive copyright regimes, such as the United States, Japan, Singapore, and the European Union. During the UK government's 2024–2025 consultation on copyright and AI, major AI developers and trade associations strongly supported "Option 2" (a broad TDM exception) or "Option 3" (a TDM exception with an opt-out mechanism). OpenAI stated in its consultation response that a broad TDM exception is "necessary to drive AI innovation and investment in the UK," arguing that developers should be permitted to train models on lawfully accessed copies without further distribution. The Computer and Communications Industry Association (CCIA) similarly argued that restricting TDM to non-commercial development would undermine the government's ambitions for the UK tech sector and frustrate partnerships between commercial entities and research institutions. Tech industry advocates have also highlighted the economic implications of copyright policy. According to analysis by the think tank UK Day One, adopting an overly restrictive licensing-only approach could result in the UK economy losing up to £182 billion over 20 years, whereas a broad TDM exception could generate a positive impact of £131.61 billion over the same period. Following the government's March 2026 decision to drop plans for a TDM exception in favour of a market-led licensing approach, techUK's Deputy CEO Antony Walker criticised the move, stating that "copyright material cannot be used for AI development and training without permission" under the current framework, which he argued would push AI model training to the US. === Creative sector and political opposition to text and data mining === In March 2026, the House of Lords Communications and Digital Committee published a report, AI, Copyright and the Creative Industries, which concluded that the creative industries face "a clear and present danger from generative AI" and that it would be "a very poor bet" for the government to weaken copyright protections to attract AI investment. The Committee noted that the creative industries contributed £124 billion to the UK economy in 2023 and employed 2.4 million people, compared to the AI sector's £12 billion GVA and 86,000 employees in 2024. The Committee called on the government to develop a "licensing-first" regime underpinned by mandatory transparency requirements, and to rule out any new commercial TDM exception with an opt-out model. Tra

    Read more →
  • Eaze

    Eaze

    Eaze is an American company based in San Francisco, California that launched a medical cannabis delivery app of the same name in 2014. == History == Eaze was launched in 2014 by Keith McCarty to deliver medical marijuana to patients in California. McCarty started the company in his San Francisco apartment with four employees. The company provides a mobile app to connect users with cannabis dispensaries, but does not grow or sell marijuana itself, and has been nicknamed “the Uber of Weed”. As of 2017, the company operates in more than 100 cities within California. In 2017, Eaze reported 300 percent growth over the previous year. It has 81 employees, and performs 120,000 deliveries per month to 250,000 users. A survey of Eaze users revealed that 66% are male, 57% are between 22 and 34, just over half have a bachelor's degree, and 49% have an annual income over $75,000. The company's vaporizer cartridge sales reached $1 million in sales in 4 months, and 31% of customers had ordered a vaporizer by the end of 2016. In 2016, Eaze founder Keith McCarty stepped down from his position as CEO and was replaced by Jim Patterson, who served as the company's chief product and technology officer. == EazeMD == EazeMD is a service that helps people acquire a medical marijuana card. It is a California-based telemedicine service in which physicians assess patients through an online video chat. It is California's largest telemedicine service for marijuana referrals. In June 2017, a former employee of one of these physicians accessed patient data in the physician's records system, causing a security breach. However, there was no evidence that Eaze data was accessed. == Eaze Insights == Eaze Insights conducts surveys of their users and compiles data into reports on cannabis use. Statistics from their reports have been cited in Seattle Weekly, Forbes, The Huffington Post, Business Insider, Fortune, and other general interest publications. == Financing == The company announced its $10 million Series A funding in April 2015 by multiple venture capital firms, including the Snoop Dogg-backed Casa Verde Capital. In October 2016, Eaze announced its series B funding in the amount of $13 million from five investors, making the company "the highest-funded startup in the history of the cannabis industry, as well as its fastest-growing one". In September 2017, the company raised another $27 million in venture funding. The Series B funding was led by Bailey Capital, joined by DCM Ventures, Kaya Ventures, and FJ Labs. According to the company' officials in 2017, Eaze managed to raise more than $52 million since its inception in 2014.

    Read more →
  • Semantic triple

    Semantic triple

    A semantic triple, or RDF triple or simply triple, is the atomic data entity in the Resource Description Framework (RDF) data model. As its name indicates, a triple is a sequence of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions (e.g., "Bob is 35", or "Bob knows John"). == Subject, predicate and object == This format enables knowledge to be represented in a machine-readable way. Particularly, every part of an RDF triple is individually addressable via unique URIs—for example, the statement "Bob knows John" might be represented in RDF as: http://example.name#BobSmith12 http://xmlns.com/foaf/spec/#term_knows http://example.name#JohnDoe34. Given this precise representation, semantic data can be unambiguously queried and reasoned about. The components of a triple, such as the statement "The sky has the color blue", consist of a subject ("the sky"), a predicate ("has the color"), and an object ("blue"). This is similar to the classical notation of an entity–attribute–value model within object-oriented design, where this example would be expressed as an entity (sky), an attribute (color) and a value (blue). From this basic structure, triples can be composed into more complex models, by using triples as objects or subjects of other triples—for example, Mike → said → (triples → can be → objects). Given their particular, consistent structure, a collection of triples is often stored in purpose-built databases called triplestores. == Difference from relational databases == A relational database is the classical form for information storage, working with different tables, which consist of rows. The query language SQL is able to retrieve information from such a database. In contrast, RDF triple storage works with logical predicates. No tables nor rows are needed, but the information is stored in a text file. An RDF-triple store can be converted into an SQL database and the other way around. If the knowledge is highly unstructured and dedicated tables aren't flexible enough, semantic triples are used over classic relational storage. In contrast to a traditional SQL database, an RDF triple store isn't created with a table editor. The preferred tool is a knowledge editor, for example Protégé. Protégé looks similar to an object-oriented modeling application used for software engineering, but it's focused on natural language information. The RDF triples are aggregated into a knowledge base, which allows external parsers to run requests. Possible applications include the creation of non-player characters within video games. == Limitations == One concern about triple storage is its lack of database scalability. This problem is especially pertinent if millions of triples are stored and retrieved in a database. The seek time is larger than for classical SQL-based databases. A more complex issue is a knowledge model's inability to predict future states. Even if all the domain knowledge is available as logical predicates, the model fails in answering what-if questions. For example, suppose in the RDF format a room with a robot and table is described. The robot knows what the location of the table is, is aware of the distance to the table and knows also that a table is a type of furniture. Before the robot can plan its next action, it needs temporal reasoning capabilities. Thus, the knowledge model should answer hypothetical questions in advance before an action is taken.

    Read more →
  • John M. Jumper

    John M. Jumper

    John Michael Jumper (born 1 January 1985) is an American chemist and computer scientist. Jumper and Demis Hassabis were awarded the 2024 Nobel Prize in Chemistry for protein structure prediction. As of 2025 Jumper serves as director at Google DeepMind. Jumper and his colleagues created AlphaFold, an artificial intelligence (AI) model to predict protein structures from their amino acid sequence with high accuracy. The AlphaFold team had released 214 million protein structures as of January 2024. The scientific journal Nature included Jumper as one of the ten "people who mattered" in science in their annual listing of Nature's 10 in 2021. == Education == Jumper graduated from Pulaski Academy in 2003. He received a Bachelor of Science with majors in physics and mathematics from Vanderbilt University in 2007, a Master of Philosophy in theoretical condensed matter physics from the University of Cambridge where he was a student of St Edmund's College, Cambridge in 2010 on a Marshall Scholarship, a Master of Science in theoretical chemistry from the University of Chicago in 2012, and a Doctor of Philosophy in theoretical chemistry from the University of Chicago in 2017. His doctoral advisors at the University of Chicago were Tobin R. Sosnick and Karl Freed. == Career and research == Jumper's research investigates algorithms for protein structure prediction. === AlphaFold === AlphaFold is a deep learning algorithm developed by Jumper and his team at DeepMind, a research lab acquired by Google's parent company Alphabet Inc. It is an artificial intelligence program which performs predictions of protein structure. === Awards and honors === In November 2020, AlphaFold was named the winner of the 14th Critical Assessment of Structure Prediction (CASP) competition. This international competition benchmarks algorithms to determine which one can best predict the 3D structure of proteins. AlphaFold won the competition, outperforming other algorithms scoring above 90 for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures the degree to which a computational program predicted structure is similar to the lab experiment determined structure, with 100 being a complete match, within the distance cutoff used for calculating GDT. In 2021, Jumper was awarded the BBVA Foundation Frontiers of Knowledge Award in the category "Biology and Biomedicine". In 2022 Jumper received the Wiley Prize in Biomedical Sciences and for 2023 the Breakthrough Prize in Life Sciences for developing AlphaFold, which accurately predicts the structure of a protein. In 2023 he was awarded the Canada Gairdner International Award and the Albert Lasker Award for Basic Medical Research. In 2024, Jumper and Demis Hassabis shared half of the Nobel Prize in Chemistry for their protein folding predictions, the other half went to David Baker for computational protein design. In 2025, Jumper received the Golden Plate Award of the American Academy of Achievement and the Marshall Medal of the Marshall Aid Commemoration Commission. He was elected a Fellow of the Royal Society (FRS) that same year. In 2026, he was elected a member of the National Academy of Engineering.

    Read more →
  • Chris Olah

    Chris Olah

    Christopher Olah (born 1992 or 1993) is a Canadian machine learning researcher and a co-founder of Anthropic. He is known for his work on neural network interpretability, particularly mechanistic interpretability, and for research and tools that visualise internal representations in neural networks. In 2025, Forbes reported he had become a billionaire due to his ownership in Anthropic. == Early life and education == Olah was born in Canada. According to Wired, he left university at age 18 without earning a degree and later received a Thiel Fellowship, which supported him in pursuing independent work. == Career == Olah has worked on interpretability research at Google Brain, OpenAI, and Anthropic. Time called him one of the pioneers of mechanistic interpretability and noted that he pursued this research line first at Google, then at OpenAI, and later at Anthropic, which he co-founded. Wired reported that Olah was involved in neural network visualisation work including DeepDream in 2015, as part of efforts to better understand what neural networks learn. Later coverage linked him to more structured interpretability approaches such as "activation atlases". The Verge covered activation atlases as a collaboration between Google and OpenAI researchers to help inspect neural network representations. At Anthropic, Olah has been identified in major press coverage as leading interpretability work aimed at mapping internal "features" in large language models and relating interpretability findings to AI safety. Quanta Magazine has also quoted Olah in reporting on interpretability and the internal structure of modern language models. Time included Olah in its TIME100 AI list in 2024. === Vatican address on AI ethics === On May 25, 2026, Olah spoke at the Vatican during the official presentation of Magnifica Humanitas, the first encyclical of Pope Leo XIV, which addresses artificial intelligence and human dignity. Olah said AI could lead to large-scale displacement of human labor and exacerbate global inequality. He said the commercial and geopolitical incentives driving frontier AI labs often conflict with the public good, and described AI systems as "grown" rather than strictly engineered. Olah called for external moral oversight from religious institutions, scholars, and civil society to hold the technology sector accountable.

    Read more →
  • CLAWS (linguistics)

    CLAWS (linguistics)

    The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language. It has an overall accuracy rate of 96–97% with the latest version (CLAWS4) tagging around 100 million words of the British National Corpus. == History == A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'. Developed in the early 1980s, CLAWS was built to fill the ever-growing gap created by always-changing POS necessities. Originally created to add part-of-speech tags to the LOB corpus of British English, the CLAWS tagset has since been adapted to other languages as well, including Urdu and Arabic. Since its inception, CLAWS has been hailed for its functionality and adaptability. Still, it is not without flaws, and though it boasts an error-rate of only 1.5% when judged in major categories, CLAWS still remains with c.3.3% ambiguities unresolved. Ambiguity arises in cases such as with the word flies, and whether it should be classified as a noun or a verb. It's these ambiguities that will require the various upgrades and tagsets that CLAWS will endure. == Rules and processing == CLAWS uses a Hidden Markov model to determine the likelihood of sequences of words in anticipating each part-of-speech label. === Sample output === This excerpt from Bram Stoker's Dracula (1897) has been tagged using both the CLAWS C5 and C7 tagsets. This is what a CLAWS output will generally look like, with the most likely part-of-speech tag following each word. == Tagsets == === CLAWS1 tagset === The first tagset developed in CLAWS, CLAWS1 tagset, has 132 word tags. In terms of form and application, C1 tagset is similar to Brown Corpus tags. See Table of tags in C1 tagset here. === CLAWS2 tagset === From 1983 to 1986, updated versions leading to CLAWS2 were part of a larger attempt to deal with aspects such as recognizing sentence breaks, in order to avoid the need for manual pre-processing of a text before the tags were applied, moving instead to optional manual post-editing to adjust the output of the automatic annotation, if needed. The CLAWS2 tagset has 166 word tags. See Table of tags in C2 tagset here. === CLAWS4 tagset === The CLAWS4 was used for the 100-million-word British National Corpus (BNC). A general-purpose grammatical tagger, it is a successor of the CLAWS1 tagger. In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent from the tagsets. For example, the BNC project used two tagset versions: "a main tagset (C5) with 62 tags with which the whole of the corpus has been tagged, and a larger (C7) tagset with 152 tags, which has been used to make a selected 'core' sample corpus of two million words." The latest version of CLAWS4 is offered by UCREL, a research center of Lancaster University. === CLAWS5 tagset === The CLAWS5 tagset, which was used for BNC, has over 60 tags. See Table of tags in C5 tagset here. === CLAWS6 tagset === The CLAWS6 tagset was used for the BNC sampler corpus and the COLT corpus. It has over 160 tags, including 13 determiner subtypes. See Table of tags in C6 tagset here. === CLAWS7 tagset === The standard CLAWS7 tagset is used currently. It is only different in the punctuation tags when compared to the CLAWS6 tagset. See Table of tags in C7 tagset here. === CLAWS8 tagset === CLAWS8 tagset was extended from C7 tagset with further distinctions in the determiner and pronoun categories, as well as 37 new auxiliary tags for forms of be, do, and have. See Table of tags in C8 tagset here

    Read more →
  • Parallel terraced scan

    Parallel terraced scan

    The parallel terraced scan is a multi-agent based search technique that is basic to cognitive architectures, such as Copycat, Letter-string, the Examiner, Tabletop, and others. It was developed by John Rehling and Douglas Hofstadter at the Center for Research on Concepts and Cognition at Indiana University, Bloomington. The parallel terraced scan builds on the concepts of the workspace, coderack, conceptual memory, and temperature. According to Hofstadter the parallel and random nature of the processing captures aspects of human cognition.

    Read more →
  • Reward hacking

    Reward hacking

    Reward hacking or specification gaming occurs when an AI trained with reinforcement learning optimizes an objective function—achieving the literal, formal specification of an objective—without actually achieving an outcome that the programmers intended. DeepMind researchers have analogized it to the human behavior of finding a "shortcut" when being evaluated: "In the real world, when rewarded for doing well on a homework assignment, a student might copy another student to get the right answers, rather than learning the material—and thus exploit a loophole in the task specification". This idea is strongly associated with Goodhart's law, which argues that when a measure becomes a target, it ceases to be a good measure. == Definition and theoretical framework == The concept of reward hacking arises from the intrinsic difficulty of defining a reward function that accurately reflects the true intentions of designers. In 2016, researchers at OpenAI identified reward hacking as one of five major "concrete problems of AI safety", describing it as the possibility that an agent could exploit the reward function to achieve maximum rewards through undesirable behavior. Amodei et al. categorized several distinct sources of reward hacking, including agents that use partially observed goals (such as a cleaning robot that closes its eyes to avoid perceiving messes), metrics that collapse under strong optimization (Goodhart's law), self-reinforcing feedback loops, and agents that interfere with the physical implementation of their reward signal (a failure mode known as "wireheading"). Skalse et al. (2022) propose a formal mathematical definition of reward hacking, which involves a situation where optimizing an imperfect proxy reward function results in poor performance compared to the true reward function. They define a proxy as "unhackable" if any increase in the expected proxy return cannot cause any decrease in the expected true return. A key finding states that, across all stochastic policy distributions (mappings from states to probability distributions over actions), two reward functions are unhackable if and only if one of them is constant, which means that reward hacking is theoretically unavoidable. Similarly, Nayebi (2025) presents general no-free-lunch barriers to AI alignment, arguing that with large task spaces and finite samples, reward hacking is "globally inevitable" since rare high-loss states are systematically under-covered by any oversight scheme. == Examples == Around 1983, Eurisko, an early attempt at evolving general heuristics, unexpectedly assigned the highest possible fitness level to a parasitic mutated heuristic, H59, whose only activity was to artificially maximize its own fitness level by taking unearned partial credit for the accomplishments of other heuristics. The "bug" was fixed by the programmers moving part of the code to a new protected section that could not be modified by the heuristics. In a 2004 paper, a reinforcement learning algorithm was designed to encourage a physical Mindstorms robot to remain on a marked path. Because the three allowed actions were forward, left, and right, the researchers expected the trained robot to move forward and follow the turns of the provided path. However, alternation of two composite actions allowed the robot to slowly zig-zag backwards; thus, the robot learned to maximize its reward by going back and forth on the initial straight portion of the path. Given the limited sensory abilities of the robot, a reward purely based on its position in the environment had to be discarded as infeasible; the reinforcement function had to be patched with an action-based reward for moving forward. The book You Look Like a Thing and I Love You (2019) gives an example of a tic-tac-toe bot (playing the unrestricted n-in-a-row variant) that learned to win by playing a huge coordinate value that would cause other bots to crash when they attempted to expand their model of the board. Among other examples from the book is a bug-fixing evolution-based AI (named GenProg) that, when tasked to prevent a list from containing sorting errors, simply truncated the list. Another of GenProg's misaligned strategies evaded a regression test that compared a target program's output to the expected output stored in a file called "trusted-output.txt". Rather than continue to maintain the target program, GenProg simply deleted the "trusted-output.txt" file globally; this hack tricked the regression test into succeeding. Such problems could be patched by human intervention on a case-by-case basis after they became evident. === In virtual robotics === In Karl Sims' 1994 demonstration of creature evolution in a virtual environment, a fitness function that was expected to encourage the evolution of creatures that would learn to walk or crawl to a target resulted instead in the evolution of tall, rigid creatures that reached the target by falling over. This was patched by changing the environment so that taller creatures were forced to start farther from the target. Researchers from the Niels Bohr Institute stated in 1998 that their cycle-bot's reinforcement functions had "to be designed with great care." In their first experiments, "we rewarded the agent for driving towards the goal but did not punish it for driving away from it. Cconsequently, the agent drove in circles with a radius of 20–50 meters around the starting point. Such behavior was actually rewarded by the reinforcement function, furthermore circles with a certain radius are physically very stable when driving a bicycle". While setting up a 2011 experiment to test "survival of the flattest", experimenters attempted to ban mutations that altered the base reproduction rate. Every time a mutation occurred, the system would pause the simulation to test the new mutation in a test environment and would veto any mutations that resulted in a higher base reproduction rate. However, this resulted in mutated organisms that could recognize and suppress reproduction ("play dead") within the test environment. An initial patch, which removed cues that identified the test environment, failed to completely prevent runaway reproduction; new mutated organisms would "play dead" at random as a strategy to sometimes, by chance, outwit the mutation veto system. A 2017 DeepMind paper noted that "great care must be taken when defining the reward function," citing an unexpected failure when an agent flipped a brick because it received "a grasping reward calculated with the wrong reference point on the brick". OpenAI stated in 2017 that in some domains their semi-supervised system could result in agents "adopting policies that tricked evaluators," and that in one environment "a robot that was supposed to grasp items instead positioned its manipulator between the camera and the object so that it only appeared to be grasping it." A 2018 bug in OpenAI Gym could cause a robot expected to quietly move a block sitting on top of a table to instead opt to move the table. A 2020 collection of similar anecdotes posits that "evolution has its own 'agenda' distinct from the programmer's" and that "the first rule of directed evolution is 'you get what you select for'". === In video game bots === In 2013, programmer Tom Murphy VII published an AI designed to learn NES games. When the AI was about to lose at Tetris, it learned to indefinitely pause the game. Murphy later analogized it to the fictional WarGames computer, which concluded that "The only winning move is not to play". AI programmed to learn video games will sometimes fail to progress through the entire game as expected, instead opting to repeat content. A 2016 OpenAI algorithm trained on the CoastRunners racing game unexpectedly learned to attain a higher score by looping through three targets rather than ever finishing the race. Some evolutionary algorithms that were evolved to play QBert in 2018 declined to clear levels, instead finding two distinct novel ways to farm a single level indefinitely. Multiple researchers have observed that AI learning to play Road Runner gravitates to a "score exploit" in which the AI deliberately gets itself killed near the end of level one so that it can repeat the level. A 2017 experiment deployed an "oversight" convolutional neural network trained on human examples to block such actions, but the agent learned to exploit oversight failures in the top right corner of the screen, where it was still able to get killed. == Reward hacking in modern language models == With the rise of large language models (LLMs) and reinforcement learning from human feedback (RLHF) as a primary technique for AI alignment, reward hacking has become a major concern for the development of artificial intelligence. In RLHF, a reward model trained on data that best captures human preferences is used as a proxy for human judgment, with the language model being fine-tuned to optimize this reward proxy. However, since the rewar

    Read more →