AI Face Look

AI Face Look — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • IT operations analytics

    IT operations analytics

    In the fields of information technology (IT) and systems management, IT operations analytics (ITOA) is an approach or method to retrieve, analyze, and report data for IT operations. ITOA may apply big data analytics to large datasets to produce business insights. In 2014, Gartner predicted its use might increase revenue or reduce costs. By 2017, it predicted that 15% of enterprises will use IT operations analytics technologies. == Definition == IT operations analytics (ITOA) (also known as advanced operational analytics, or IT data analytics) technologies are primarily used to discover complex patterns in high volumes of often "noisy" IT system availability and performance data. Forrester Research defined IT analytics as "The use of mathematical algorithms and other innovations to extract meaningful information from the sea of raw data collected by management and monitoring technologies." Note, ITOA is different than AIOps, which focuses on applying artificial intelligence and machine learning to the applications of ITOA. == History == Operations research as a discipline emerged from the Second World War to improve military efficiency and decision-making on the battlefield. However, only with the emergence of machine learning tech in the early 2000s could an artificially intelligent operational analytics platform actually begin to engage in the high-level pattern recognition that could adequately serve business needs. A critical catalyst towards ITOA development was the rise of Google, which pioneered a predictive analytics model that represented the first attempt to read into patterns of human behavior on the Internet. IT specialists then applied predictive analytics to the IT Industry, coming forward with platforms that can sift through data to generate insights without the need for human intervention. Due to the mainstream embrace of cloud computing and the increasing desire for businesses to adopt more big data practices, the ITOA industry has grown significantly since 2010. A 2016 ExtraHop survey of large and mid-size corporations indicates that 65 percent of the businesses surveyed will seek to integrate their data silos either this year or the next. The current goals of ITOA platforms are to improve the accuracy of their APM services, facilitate better integration with the data, and to enhance their predictive analytics capabilities. == Applications == ITOA systems tend to be used by IT operations teams, and Gartner describes seven applications of ITOA systems: Root cause analysis: The models, structures and pattern descriptions of IT infrastructure or application stack being monitored can help users pinpoint fine-grained and previously unknown root causes of overall system behavior pathologies. Proactive control of service performance and availability: Predicts future system states and the impact of those states on performance. Problem assignment: Determines how problems may be resolved or, at least, direct the results of inferences to the most appropriate individuals, or communities in the enterprise for problem resolution. Service impact analysis: When multiple root causes are known, the analytics system's output is used to determine and rank the relative impact, so that resources can be devoted to correcting the fault in the most timely and cost-effective way possible. Complement best-of-breed technology: The models, structures and pattern descriptions of IT infrastructure or application stack being monitored are used to correct or extend the outputs of other discovery-oriented tools to improve the fidelity of information used in operational tasks (e.g., service dependency maps, application runtime architecture topologies, network topologies). Real time application behavior learning: Learns & correlates the behavior of Application based on user pattern and underlying Infrastructure on various application patterns, create metrics of such correlated patterns and store it for further analysis. Dynamically baselines threshold: Learns behavior of Infrastructure on various application user patterns and determines the Optimal behavior of the Infra and technological components, bench marks and baselines the low and high water mark for the specific environments and dynamically changes the bench mark baselines with the changing infra and user patterns without any manual intervention. == Types == In their Data Growth Demands a Single, Architected IT Operations Analytics Platform, Gartner Research describes five types of analytics technologies: Log analysis Unstructured text indexing, search and inference (UTISI) Topological analysis (TA) Multidimensional database search and analysis (MDSA) Complex operations event processing (COEP) Statistical pattern discovery and recognition (SPDR) == Tools and ITOA platforms == A number of vendors operate in the ITOA space:

    Read more →
  • Semantic knowledge management

    Semantic knowledge management

    In computer science, semantic knowledge management is a set of practices that seeks to classify content so that the knowledge it contains may be immediately accessed and transformed for delivery to the desired audience, in the required format. This classification of content is semantic in its nature – identifying content by its type or meaning within the content itself and via external, descriptive metadata – and is achieved by employing XML technologies. The specific outcomes of these practices are: Maintain content for multiple audiences together in a single document Transform content into various delivery formats without re-authoring Search for content more effectively Involve more subject-matter experts in the creation of content without reducing quality Reduce production costs for delivery formats Reduce the manual administration of getting the right knowledge to the right people Reduce the cost and time to localize content == Notable semantic knowledge management systems == Learn eXact Thinking Cap LCMS Thinking Cap LMS Xyleme LCMS iMapping

    Read more →
  • ITools Resourceome

    ITools Resourceome

    iTools is a distributed infrastructure for managing, discovery, comparison and integration of computational biology resources. iTools employs Biositemap technology to retrieve and service meta-data about diverse bioinformatics data services, tools, and web-services. iTools is developed by the National Centers for Biomedical Computing as part of the NIH Road Map Initiative.

    Read more →
  • MultiNet

    MultiNet

    Multilayered extended semantic networks (MultiNets) are both a knowledge representation paradigm and a language for meaning representation of natural language expressions that has been developed by Prof. Dr. Hermann Helbig on the basis of earlier Semantic Networks. It is used in a question-answering application for German called InSicht. It is also used to create a tutoring application developed by the university of University of Hagen to teach MultiNet to knowledge engineers. MultiNet is claimed to be one of the most comprehensive and thoroughly described knowledge representation systems. It specifies conceptual structures by means of about 140 predefined relations and functions, which are systematically characterized and underpinned by a formal axiomatic apparatus. Apart from their relational connections, the concepts are embedded in a multidimensional space of layered attributes and their values. Another characteristic of MultiNet distinguishing it from simple semantic networks is the possibility to encapsulate whole partial networks and represent the resulting conceptual capsule as a node of higher order, which itself can be an argument of relations and functions. MultiNet has been used in practical NLP applications such as natural language interfaces to the Internet or question answering systems over large semantically annotated corpora with millions of sentences. MultiNet is also a cornerstone of the commercially available search engine SEMPRIA-Search, where it is used for the description of the computational lexicon and the background knowledge, for the syntactic-semantic analysis, for logical answer finding, as well as for the generation of natural language answers. MultiNet is supported by a set of software tools and has been used to build large semantically based computational lexicons. The tools include a semantic interpreter WOCADI, which translates natural language expressions (phrases, sentences, texts) into formal MultiNet expressions, a workbench MWR+ for the knowledge engineer (comprising modules for automatic knowledge acquisition and reasoning), and a workbench LIA+ for the computer lexicographer supporting the creation of large semantically based computational lexica.

    Read more →
  • Materialized view

    Materialized view

    In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function. The process of setting up a materialized view is sometimes called materialization. This is a form of caching the results of a query, similar to memoization of the value of a function in functional languages, and it is sometimes described as a form of precomputation. As with other forms of precomputation, database users typically use materialized views for performance reasons, i.e. as a form of optimization. Materialized views that store data based on remote tables were also known as snapshots (deprecated Oracle terminology). In any database management system following the relational model, a view is a virtual table representing the result of a database query. Whenever a query or an update addresses an ordinary view's virtual table, the DBMS converts these into queries or updates against the underlying base tables. A materialized view takes a different approach: the query result is cached as a concrete ("materialized") table (rather than a view as such) that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of extra storage and of some data being potentially out-of-date. Materialized views find use especially in data warehousing scenarios, where frequent queries of the actual base tables can be expensive. In a materialized view, indexes can be built on any column. In contrast, in a normal view, it's typically only possible to exploit indexes on columns that come directly from (or have a mapping to) indexed columns in the base tables; often this functionality is not offered at all. == Implementations == === Oracle === Materialized views were implemented first by the Oracle Database: the Query rewrite feature was added from version 8i. Example syntax to create a materialized view in Oracle: === PostgreSQL === In PostgreSQL, version 9.3 and newer natively support materialized views. In version 9.3, a materialized view is not auto-refreshed, and is populated only at time of creation (unless WITH NO DATA is used). It may be refreshed later manually using REFRESH MATERIALIZED VIEW. In version 9.4, the refresh may be concurrent with selects on the materialized view if CONCURRENTLY is used. Example syntax to create a materialized view in PostgreSQL: === SQL Server === Microsoft SQL Server differs from other RDBMS by the way of implementing materialized view via a concept known as "Indexed Views". The main difference is that such views do not require a refresh because they are in fact always synchronized to the original data of the tables that compound the view. To achieve this, it is necessary that the lines of origin and destination are "deterministic" in their mapping, which limits the types of possible queries to do this. This mechanism has been realised since the 2000 version of SQL Server. Example syntax to create a materialized view in SQL Server: === Stream processing frameworks === Apache Kafka (since v0.10.2), Apache Spark (since v2.0), Apache Flink, Kinetica DB, Materialize, RisingWave, and Epsio all support materialized views on streams of data. === Others === Materialized views are also supported in Sybase SQL Anywhere. In IBM Db2, they are called "materialized query tables". ClickHouse supports materialized views that automatically refresh on merges. MySQL doesn't support materialized views natively, but workarounds can be implemented by using triggers or stored procedures or by using the open-source application Flexviews. Materialized views can be implemented in Amazon DynamoDB using data modification events captured by DynamoDB Streams. Google announced in 8 April 2020 the availability of materialized views for BigQuery as a beta release.

    Read more →
  • Common Crawl

    Common Crawl

    The Common Crawl Foundation (Common Crawl) is a nonprofit 501(c)(3) organization that crawls the web and freely provides its archives and datasets to the public. Common Crawl was founded by Gil Elbaz. The data had mostly been primarily used by researchers and some startups until the 2020s, when AI companies started training large language models using the data. In November 2025, an investigation by The Atlantic revealed that Common Crawl misled publishers when it claimed it respected paywalls in its scraping and it was not honoring requests from publishers to have their content removed from its databases. == History == Common Crawl was founded in 2007 in San Francisco. It began publishing its crawls in 2011. By 2013, sites like TinEye were building their products off of Common Crawl. The crawl reduces the reliance of companies and researchers on Google, which has the biggest dataset. Common Crawl was designed to have more and fresher data that was more efficient to analyze and utilize than the Wayback Machine created by the Internet Archive. By 2015, 1.8 billion webpages were on the Common Crawl, which started by crawling a list of URLs donated by the search engine Blekko. They use Amazon Web Services, which provides some of its services for free, allowing computing costs to average $2-4000/month. The Common Crawl website listed 30 studies based on Common Crawl data. Before 2023, Common Crawl was not very well known outside of academic researchers who utilize the data. Common Crawl received its first requests to redact information in 2023 and increasingly started seeing its crawler, CCBot, blocked. In 2023, it began receiving significant financial support from AI companies, including Anthropic and OpenAI, each of which donated $250,000. It was also used to train Google DeepMind's large language model Gemini. By April 2023, Common Crawl was capturing 3.1 billion webpages, with an estimated 5% of pages before 2021 containing hate speech or slurs. As of 2024, Common Crawl had been cited in more than 10,000 academic studies. By 2024, The Pile and Common Crawl had been the two main training datasets being used to train AI models. In November 2025, an investigation by technology journalist Alex Reisner for The Atlantic revealed that Common Crawl misled publishers when it claimed it respected paywalls in its scraping and when it said that it was honoring requests from publishers to have their content removed from its databases. It included misleading results in the public search function on its website that showed no entries for websites that had requested their archives be removed, when in fact those sites were still included in its scrapes used by AI companies. As of 2025, Reisner found that CCBot was the most widely-blocked bot by the top 1000 websites. A 2026 article in LWN.net discussed an advantage to services like Common Crawl being that it can limit the scraping costs to websites by allowing companies and researchers to download the data from Common Crawl instead of scraping it themselves. In April 2026, Common Crawl experimentally began to distribute its data through Hugging Face Storage Bucket, in addition to its standard storage on Amazon S3. == Organization == Peter Norvig and Joi Ito have served on the advisory board. Rich Skrenta is the executive director. It has received funding almost exclusively from the Elbaz Family Foundation Trust until 2023 when it started receiving donations from the AI industry. == Refined versions == A number of organizations take raw Common Crawl data and refine it into datasets that exclude edgy content or are otherwise higher-quality for their purposes, such as FineWeb, DCLM and C4. === Colossal Clean Crawled Corpus === Google version of the Common Crawl is called the Colossal Clean Crawled Corpus, or C4 for short. It was constructed for the training of the T5 language model series in 2019. As of 2023, there were some concerns over copyrighted content in the C4 as well as racist content. A 2024 study found that 45% of content was explicitly restricted by websites' terms of service to be used for purposes like AI training by for-profit companies.

    Read more →
  • Illia Polosukhin

    Illia Polosukhin

    Illia Polosukhin is a Ukrainian-born computer scientist and entrepreneur known for his work on the transformer architecture in machine learning and for co-founding the NEAR blockchain. == Early life and education == Polosukhin studied at the Kharkiv Polytechnic Institute, later relocating to San Diego and then moving to Silicon Valley. == Career == === Google and transformer research === Polosukhin worked at Google and was part of the team associated with research on self-attention that culminated in the 2017 paper Attention Is All You Need, widely credited with introducing the transformer architecture used in modern large language models. === NEAR Protocol === After his work in machine learning, Polosukhin became a co-founder of NEAR Protocol and later associated with the NEAR Foundation ecosystem. In 2023, Polosukhin publicly argued that increasingly capable A.I. systems should be more transparent and user-controlled, and expressed skepticism that conventional regulation alone would solve problems created by closed, corporate models, warning about risks such as regulatory capture. He has promoted “user-owned AI” concepts that combine open approaches with decentralized infrastructure aligned with the blockchain technology. In 2024, Polosukhin downplayed scenarios of A.I. independently causing human extinction, arguing that conflicts are driven by people and that misuse of AI would reflect human intent and incentives. Later this year, Polosukhin said the NEAR Foundation would reduce its workforce by about 40%. == Publications == Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Lukasz Kaiser, Illia Polosukhin; et al. (2017). "Attention Is All You Need". arXiv.{{cite journal}}: CS1 maint: multiple names: authors list (link)

    Read more →
  • Andrej Karpathy

    Andrej Karpathy

    Andrej Karpathy (born 23 October 1986) is a Slovak-Canadian AI researcher, who co-founded and formerly worked at OpenAI, where he specialized in deep learning and computer vision. He also worked as the director of artificial intelligence and Autopilot Vision at Tesla, and in 2024 he founded Eureka Labs, an AI education platform. In 2026 he joined Anthropic as part of the pretraining team. == Education and early life == Karpathy was born in Bratislava, Czechoslovakia (now Slovakia), and moved with his family to Toronto when he was 15. He completed his Computer Science and Physics bachelor's degrees at University of Toronto in 2009 and his master's degree at University of British Columbia in 2011, where he worked on physically simulated figures (for example, a simulated runner or a simulated person in a crowd) with his adviser Michiel van de Panne. In 2006, Karpathy began posting videos on YouTube on his channel, badmephisto. He garnered fame by posting Rubik's cube tutorials which have been used by famous speedcubers such as Feliks Zemdegs. The channel has over 9 million views as of June 2025. Karpathy received a PhD from Stanford University in 2015 under the supervision of Fei-Fei Li, focusing on the intersection of natural language processing and computer vision, and deep learning models suited for this task. == Career and research == He authored and was the primary instructor of the first deep learning course at Stanford, CS 231n: Convolutional Neural Networks for Visual Recognition. The course became one of the largest classes at Stanford, growing from 150 students in 2015 to 750 in 2017. Karpathy is a founding member of the artificial intelligence research group OpenAI, where he was a research scientist from 2015 to 2017. In June 2017 he became Tesla's director of artificial intelligence and reported to Elon Musk. He was named one of MIT Technology Review's Innovators Under 35 for 2020. After taking a several-months-long sabbatical from Tesla, he announced he was leaving the company in July 2022. As of February 2023, he makes YouTube videos on how to create artificial neural networks. On February 9, 2023, Karpathy announced he was returning to OpenAI. A year later on February 13, 2024, an OpenAI spokesperson confirmed that Karpathy had left OpenAI. In the same year, he was named one of Time Magazine's 100 Most Influential People in AI. On July 16, 2024, Karpathy announced on his X account that he started a new AI education company called Eureka Labs. Their first product was the AI course, LLM101n. He also has a broader educational effort, the "Zero to Hero" series on LLM fundamentals. The company also advocates for AI teaching assistants, a concept which has been criticized due to data privacy concerns and the removal of personal connection between teacher and student. In February 2025, Karpathy coined the term vibe coding to describe how AI tools allow hobbyists to construct apps and websites just by typing prompts. On May 19, 2026, he announced that he joined Anthropic via a statement on X, while the company stated that he will be leading a team for research in pretraining.

    Read more →
  • Starlight Information Visualization System

    Starlight Information Visualization System

    Starlight is a software product originally developed at Pacific Northwest National Laboratory and now by Future Point Systems. It is an advanced visual analysis environment. In addition to using information visualization to show the importance of individual pieces of data by showing how they relate to one another, it also contains a small suite of tools useful for collaboration and data sharing, as well as data conversion, processing, augmentation and loading. The software, originally developed for the intelligence community, allows users to load data from XML files, databases, RSS feeds, web services, HTML files, Microsoft Word, PowerPoint, Excel, CSV, Adobe PDF, TXT files, etc. and analyze it with a variety of visualizations and tools. The system integrates structured, unstructured, geospatial, and multimedia data, offering comparisons of information at multiple levels of abstraction, simultaneously and in near real-time. In addition Starlight allows users to build their own named entity-extractors using a combination of algorithms, targeted normalization lists and regular expressions in the Starlight Data Engineer (SDE). As an example, Starlight might be used to look for correlations in a database containing records about chemical spills. An analyst could begin by grouping records according to the cause of the spill to reveal general trends. Sorting the data a second time, they could apply different colors based on related details such as the company responsible, age of equipment or geographic location. Maps and photographs could be integrated into the display, making it even easier to recognize connections among multiple variables. Starlight has been deployed to both the Iraq and Afghanistan wars and used on a number of large-scale projects. PNNL began developing Starlight in the mid-1990s, with funding from the Land Information Warfare Agency, a part of the Army Intelligence and Security Command and continued developed at the laboratory with funding from the NSA and the CIA. Starlight integrates visual representations of reports, radio transcripts, radar signals, maps and other information. The software system was recently honored with an R&D 100 Award for technical innovation. In 2006 Future Point Systems, a Silicon Valley startup, acquired rights to jointly develop and distribute the Starlight product in cooperation with the Pacific Northwest National Laboratory. The software is now also used outside of the military/intelligence communities in a number of commercial environments.

    Read more →
  • National Security Memorandum on Artificial Intelligence

    National Security Memorandum on Artificial Intelligence

    The Memorandum on Advancing the United States' Leadership in Artificial Intelligence; Harnessing Artificial Intelligence to Fulfill National Security Objectives; and Fostering the Safety, Security, and Trustworthiness of Artificial Intelligence is a memorandum signed by U.S. president Joe Biden. The memorandum is described as seeking to advance U.S. leadership in the development of safe, secure, and trustworthy artificial intelligence (AI); enable the U.S. government to use AI for national security; and contribute to international AI governance.

    Read more →
  • Interactive activation and competition networks

    Interactive activation and competition networks

    Interactive activation and competition (IAC) networks are artificial neural networks used to model memory and intuitive generalizations. They are made up of nodes or artificial neurons which are arrayed and activated in ways that emulate the behaviors of human memory. The IAC model is used by the parallel distributed processing (PDP) Group and is associated with James L. McClelland and David E. Rumelhart; it is described in detail in their book Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises. This model does not contradict any currently known biological data or theories, and its performance is close enough to human performance as to warrant further investigation.

    Read more →
  • Project Maven

    Project Maven

    Project Maven (officially Algorithmic Warfare Cross Functional Team) is a United States Department of Defense initiative launched in 2017 to accelerate the adoption of machine learning and data integration across U.S. military intelligence workflows, specifically in intelligence, surveillance, target acquisition, and reconnaissance as well as in geospatial intelligence. It initially focused on applying computer vision for processing images and videos for intelligence purposes. Currently, the program operates under the National Geospatial-Intelligence Agency (NGA) and encompasses multiple applications across the Department of Defense spanning military operation targeting support, data integration and visualization for analysts, and training machine learning models on labeled datasets of military assets and infrastructure. It integrates data from drones, satellites, and other sensors to flag potential targets, present findings to human analysts, and relay their decisions to operational systems. The program originated under Deputy Secretary Robert O. Work after he raised concerns about China's advances in defense applications of artificial intelligence. Project leaders, Colonel Drew Cukor, USMC, and Lt. Gen. Jack Shanahan, framed the program as human-in-the-loop decision support inside the Department of Defense rather than as an autonomous weapons platform. Contractors supporting Maven have included Google, which withdrew in 2018 after internal protests, and follow-on integrators such as Palantir, Anduril, Amazon Web Services, and Anthropic (withdrew in 2026). The Pentagon credits Maven with providing 2024 targeting support for U.S. airstrikes in Iraq, Syria, and Yemen, along with locating hostile maritime assets in the Red Sea. == Administrative history == Initially, the effort was led by Robert O. Work who was concerned about China's military use of the emerging technology. Reportedly, Pentagon development stops short of acting as an AI weapons system capable of firing on self-designated targets. The project was established in a memo by the U.S. Deputy Secretary of Defense on 26 April 2017 proposing an "Algorithmic Warfare Cross-Functional Team". With the help of Defense Innovation Unit, the project obtained the support of top talents in AI outside of the traditional defense contracting base. It was initially funded for $70 million. Jack Shanahan was the director of the project during April 2017 to December 2018. At the second Defense One Tech Summit in July 2017, Cukor said that the investment in a "deliberate workflow process" was funded by the Department [of Defense] through its "rapid acquisition authorities" for about "the next 36 months". In the defense industry, the standard procedure for the military to acquire hardware is by way of research, development, test, and evaluation (RDT&E), followed by production and sustainment. In 2017, acquiring software was done in the same way as hardware. This created a problem, since software is constantly updated. Project Maven procured software using Broad Agency Announcements, a flexible contracting vehicle that categorized software as consistently RDT&E, allowing constant updating. Another issue was that the government usually acquired the intellectual property (IP) for procured software, and with the project, only parts of the IP of the software was acquired. Cukor used the principle of "platform IP belongs to the vendor, configurations on top are the customer's". For example, Palantir retained IP to their core platform, while the government obtained the IP to Maven-specific logic configured on top of it. According to US Air Force Lt. Gen. Jack Shanahan in November 2017, it is "designed to be that pilot project, that pathfinder, that spark that kindles the flame front of artificial intelligence across the rest of the [Defense] Department". Its chief, U.S. Marine Corps Col. Drew Cukor, said: "People and computers will work symbiotically to increase the ability of weapon systems to detect objects." Project Maven has been noted by allies, such as Australia's Ian Langford, for the ability to identify adversaries by harvesting data from sensors on UAVs and satellites. As of 2017 December, 150,000 images had been manually labelled to establish the first training data sets, and it was projected to reach one million by January 2018. Project Maven was funded for $221 million in fiscal 2020. In 2020, the House and Senate conferees on the National Defense Authorization Act for Fiscal Year 2021, agreed to the Senate's recommendation to fund the Pentagon's $250 million request for Project Maven. At the GEOINT Symposium of 2022, it was announced that Project Maven was transferred from the Office of the Under Secretary of Defense for Intelligence and Security to the NGA, under President Biden’s proposed budget for Fiscal Year 2023. It became a Program of Record on 2023 November 7. Frank "Trey" Whitworth, vice admiral, was the director of NGA from June 2022 to November 2025. Whitworth was initially skeptical of the program, suspecting it was incautious about the targeting principles, but later regarded it as "important work". As of 2024, the project is jointly administered by the NGA and the CDAO, and its director is Rachel Martin. Before 2025, Biden appointees within CDAO had held back AI development for safety and reliability concerns, though as of 2025, this has stopped. As of 2024, Maven provided the cloud infrastructure, software capabilities, and AI for CDAO's Combined Joint All-Domain Command and Control initiatives. As of summer 2025, there were eight Maven initiatives. Of these, five were in the NGA, including analyzing drone feeds and satellite imagery. On 18 September 2025, the UK government announced a new partnership with Palantir to develop AI-powered military capabilities for decision-making and targeting, identifying opportunities worth up to £750 million over five years. On 25 March 2025, the NATO Communications and Information Agency and Palantir finalized the acquisition of the Palantir Maven Smart System NATO (MSS NATO) for employment within NATO's Allied Command Operations. It was planned to be used within 30 days of acquisition. In a letter to Pentagon on 9 March 2026, Steve Feinberg stated that Project Maven will become an official program of record by September 2026, the close of the current fiscal year. The project would transfer from the NGA to the CDAO within 30 days. Future contracting with Palantir would be handled by the US Army. In 2026-03, it was announced that the US Army Combined Arms Command would integrate Maven into its training. == Technology == Project Maven uses machine learning algorithms to analyze and fuse vast amounts of surveillance data from multiple sources made possible through data integration using Palantir Technologies. The data sources include photographs, satellite imagery, geolocation data (IP address, geotag, metadata, etc) from communications intercepts, infrared sensors, synthetic-aperture radar, and more. The system is mainly used for assisting analysts in intelligence, surveillance, target acquisition, and reconnaissance. Machine learning systems, including object recognition systems, process the data and identify potential targets, such as enemy tanks or location of new military facility. The training dataset included at least 4 million images of military objects such as warships, labelled by humans. The user interface is called Maven Smart System. It could display information such as aircraft movements, logistics, locations of key personnel, locations on the no-strike list, ships, etc. Yellow-outlined boxes show potential targets. Blue-outlined boxes show friendly forces or no-strike zones. It could also transmit, directly to weapons, a human decision to fire weapons. Internal documentation referred to "Maven ATR: automatic target recognition". Initially the project focused on applications of computer vision. The project's leaders were particularly impressed by model performance on ImageNet. As of 2018, the purpose of the system was AI-enabled analysis of full-motion video. In 2022 it expanded to combatant commands under the AI and Data Acceleration Initiative. In 2022, it was reported that the project expanded to non-image data, including captured enemy material, maritime intelligence, and publicly available information. In 2024, it was stated that Maven's key technical contribution was data management: Maven standardizes heterogeneous data through an ontology layer so data can be fused, exchanged across cloud and edge systems, and used by multiple applications. The system was presented as a broader data-centric warfighting system that feeds apps for planning, preparing, and executing operations. In 2024, the Broad Area Surveillance-Targeting (BAS-T) is a part of Maven. The system detects objects in images and uses data fusion to produce a common operational picture containing "priority based, in-depth assessment of the enemy systems pre

    Read more →
  • Brill tagger

    Brill tagger

    The Brill tagger is an inductive method for part-of-speech tagging. It was described and invented by Eric Brill in his 1993 PhD thesis. It can be summarized as an "error-driven transformation-based tagger". It is: a form of supervised learning, which aims to minimize error; and, a transformation-based process, in the sense that a tag is assigned to each word and changed using a set of predefined rules. In the transformation process, if the word is known, it first assigns the most frequent tag, or if the word is unknown, it naively assigns the tag "noun" to it. High accuracy is eventually achieved by applying these rules iteratively and changing the incorrect tags. This approach ensures that valuable information such as the morphosyntactic construction of words is employed in an automatic tagging process. == Algorithm == The algorithm starts with initialization, which is the assignment of tags based on their probability for each word (for example, "dog" is more often a noun than a verb). Then "patches" are determined via rules that correct (probable) tagging errors made in the initialization phase: Initialization: Known words (in vocabulary): assigning the most frequent tag associated to a form of the word Unknown word == Rules and processing == The input text is first tokenized, or broken into words. Typically in natural language processing, contractions such as "'s", "n't", and the like are considered separate word tokens, as are punctuation marks. A dictionary and some morphological rules then provide an initial tag for each word token. For example, a simple lookup would reveal that "dog" may be a noun or a verb (the most frequent tag is simply chosen), while an unknown word will be assigned some tag(s) based on capitalization, various prefix or suffix strings, etc. (such morphological analyses, which Brill calls Lexical Rules, may vary between implementations). After all word tokens have (provisional) tags, contextual rules apply iteratively, to correct the tags by examining small amounts of context. This is where the Brill method differs from other part of speech tagging methods such as those using Hidden Markov Models. Rules are reapplied repeatedly, until a threshold is reached, or no more rules can apply. Brill rules are of the general form: tag1 → tag2 IF Condition where the Condition tests the preceding and/or following word tokens, or their tags (the notation for such rules differs between implementations). For example, in Brill's notation: IN NN WDPREVTAG DT while would change the tag of a word from IN (preposition) to NN (common noun), if the preceding word's tag is DT (determiner) and the word itself is "while". This covers cases like "all the while" or "in a while", where "while" should be tagged as a noun rather than its more common use as a conjunction (many rules are more general). Rules should only operate if the tag being changed is also known to be permissible, for the word in question or in principle (for example, most adjectives in English can also be used as nouns). Rules of this kind can be implemented by simple Finite-state machines. See Part of speech tagging for more general information including descriptions of the Penn Treebank and other sets of tags. Typical Brill taggers use a few hundred rules, which may be developed by linguistic intuition or by machine learning on a pre-tagged corpus. == Code == Brill's code pages at Johns Hopkins University are no longer on the web. An archived version of a mirror of the Brill tagger at its latest version as it was available at Plymouth Tech can be found on Archive.org. The software uses the MIT License.

    Read more →
  • Mira Murati

    Mira Murati

    Ermira "Mira" Murati (born 16 December 1988) is an Albanian-American business executive. She launched an AI startup called Thinking Machines Lab in February 2025. Previously she was the chief technology officer of OpenAI, and a senior product manager at Tesla. == Early life and education == Murati was born on 16 December 1988 in Vlorë, Albania. She is fluent in Italian. At age 16, she won a United World Colleges (UWC) scholarship to study at Pearson College on Vancouver Island in Canada, from which she graduated in 2007 with an International Baccalaureate. After Pearson, she went to the United States to pursue further studies through a dual-degree program, earning a Bachelor of Arts from Colby College in 2011, and a Bachelor of Engineering degree from Dartmouth College's Thayer School of Engineering in 2012. == Career == === Early career === Murati interned in 2011 as a summer analyst at Goldman Sachs in Tokyo, Japan. She then briefly worked for Zodiac Aerospace as an intern before joining the electric car company Tesla in 2013 as a product manager on the Model X. From 2016 to 2018, she worked for the augmented reality start-up Leap Motion (now Ultraleap). === OpenAI === In 2018, she joined OpenAI as the VP of Applied AI and partnerships. She became chief technology officer (CTO) in May 2022. She led OpenAI's work on ChatGPT, Dall-E, Codex and Sora, while overseeing its research, product and safety teams. She oversaw technical advancements and direction of OpenAI's various projects, including the development of advanced AI models and tools. Murati worked on several of OpenAI's notable products, such as the Generative Pretrained Transformer (GPT) series of language models. Commenting about the potential loss of creative jobs to AI, Murati said that "maybe [the jobs] shouldn’t have been there in the first place". In October 2023, Murati was ranked 57th on Fortune's list of "The 100 Most Powerful Women in Business of 2023". In November 2023, Murati became interim chief executive officer of OpenAI following the removal of Sam Altman from the job. She had collaborated with Ilya Sutskever, whose 52-page memo outlining concerns about Altman relied heavily on screenshots and information she provided, which contributed to the board's decision to oust him. Murati was replaced by Emmett Shear three days later, who left when Altman was reinstated five days later. Following these events, Murati returned to her role as CTO. In June 2024, Dartmouth College awarded Murati an honorary Doctor of Science for having "democratized technology and advanced a better, safer world for us all". In September 2024, Murati announced that she was stepping down as CTO to allow her the opportunity to "do my own exploration". This move came amid a wider executive exodus as OpenAI chief research officer Bob McGrew and a vice president of research, Barret Zoph, also announced their departures soon after. === Thinking Machines Lab === In February 2025, Murati launched Thinking Machines Lab, a new public benefit corporation aiming "to make AI systems more widely understood, customizable, and generally capable". She was reported to have hired "a team of about 30 leading researchers and engineers from competitors including Meta, Mistral, and OpenAI." People involved with the startup include OpenAI cofounder John Schulman, and advisors Alec Radford and Bob McGrew. The following month, Bloomberg reported that the company had reached an estimated valuation of $9 billion, with an "average founder stake value" of $1.4 billion. In April 2025, Thinking Machines Lab reportedly aimed for a $2 billion seed round (requiring a minimum investment of $50 million). The round was led by Andreessen Horowitz and included participation from the government of Albania, valuing the company at $12 billion. Thinking Machines Lab follows a governance structure wherein Mira Murati holds a deciding vote on board matters, weighted to provide her with a majority decision-making capability. In October 2025, Thinking Machines Lab announced its first product, Tinker, a tool used to create custom frontier AI models. == Publications == Murati, Ermira (Spring 2022). "Language & Coding Creativity". Daedalus. 151 (2). Cambridge, MA: American Academy of Arts and Sciences (AAAS): 156–167. doi:10.1162/daed_a_01907. Retrieved 25 September 2024.

    Read more →
  • Dr.Fill

    Dr.Fill

    Dr.Fill is a computer program that solves American-style crossword puzzles. It was developed by Matt Ginsberg and described by Ginsberg in an article in the Journal of Artificial Intelligence Research. Ginsberg claims in that article that Dr.Fill is among the top fifty crossword solvers in the world. == History == Dr.Fill participated in the 2012 American Crossword Puzzle Tournament, finishing 141st of approximately 650 entrants with a total score of just over 10,000 points. The appearance led to a variety of descriptions of Dr.Fill in the popular press, including The Economist, the San Francisco Chronicle and Gizmodo. A description of Dr.Fill appeared on the front page of the March 17, 2012 New York Times. Dr.Fill's score in 2013 improved to 10,550, which would have earned it 92nd place. Videos of the program solving the problems from the tournament are available on YouTube. The score in 2014 improved further to 10,790, which would have tied for 67th place. A video of the program solving the first six puzzles from that tournament, together with a talk given by Ginsberg describing its performance, can be found on YouTube. Dr.Fill has largely continued to improve since the 2014 event. In 2015, it scored 10,920 points and finished in 55th place. In 2016, it scored 11,205 points and finished in 41st place. In 2017, it scored 11,795 and finished in 11th place. In 2018, it scored 10,740 points, dropping to 78th place. Dr.Fill returned to "form" in 2019, once again scoring 11,795 and finishing in 14th place. The 2020 ACPT was cancelled due to COVID-19, and Dr.Fill participated as a non-competitor in the Boswords tournament instead. The program outperformed the humans, scoring 11,218 points (fast solves with a total of one mistake) while the best scoring human scored 10,994 points (slower solves but no mistakes). The 2021 ACPT was virtual, again due to COVID-19. The Dr.Fill effort was joined by the Berkeley NLP Group, creating a hybrid system named Berkeley Crossword Solver, and Dr.Fill won the main event, scoring 12,825 points with Erik Agard, the highest scoring human, scoring 12,810 points. The tournament was won by Tyler Hinman (12,760 points), who completed the championship puzzle perfectly in three minutes. Dr.Fill also completed that puzzle perfectly, but in 49 seconds. After winning the tournament, Ginsberg announced on August 8, 2021, that both he and Dr.Fill would be retiring from crosswords. == Algorithm == As described by Ginsberg, Dr.Fill works by converting a crossword to a weighted constraint satisfaction problem and then attempting to maximize the probability that the fill is correct. Probabilities for individual words or phrases in the puzzle are computed using relatively simple statistical techniques based on features such as previous appearances of the clue, number of Google hits for the fill, and so on. In doing this, Dr.Fill is attempting to solve a problem similar to that tackled by the Jeopardy!-playing program Watson; Dr.Fill runs on a laptop instead of a supercomputer and Ginsberg remarks that Watson is far more effective than Dr.Fill at solving this portion of the problem. Instead of computational horsepower, Dr.Fill relies on the constraints provided by crossing words to refine its answers. A variety of techniques from artificial intelligence are applied to attempt to find the most likely fill. These include a small amount of lookahead, limited discrepancy search, and postprocessing. Ginsberg remarks that postprocessing was chosen over branch and bound because the two techniques are mutually incompatible and postprocessing was found to be more effective in this domain.

    Read more →