In software engineering, containerization is operating-system-level virtualization or application-level virtualization over multiple resources so that software applications can run in isolated user spaces called containers in any cloud or non-cloud environment, regardless of type or vendor. The term "container" has different meanings in different contexts, and it is important to ensure that the intended definition aligns with the audience's understanding. == Usage == Each container is basically a fully functional and portable cloud or non-cloud computing environment surrounding the application and keeping it independent of other environments running in parallel. Individually, each container simulates a different software application and runs isolated processes by bundling related configuration files, libraries and dependencies. But, collectively, multiple containers share a common operating system kernel (OS). In recent times, containerization technology has been widely adopted by cloud computing platforms like Amazon Web Services, Microsoft Azure, Google Cloud Platform, and IBM Cloud. Containerization has also been pursued by the U.S. Department of Defense as a way of more rapidly developing and fielding software updates, with first application in its F-22 air superiority fighter. == History == The concept of containerization in computing originated from early operating system–level isolation mechanisms. One of the earliest implementations was the chroot system call introduced in Version 7 Unix in 1979, which changed the apparent root directory for a process and its children, providing a basic form of filesystem isolation. In the early 2000s, more advanced forms of operating system–level virtualization were developed. FreeBSD introduced "jails" in 2000, which extended isolation by restricting processes to a subset of system resources. Around the same time, Solaris introduced "zones" (also known as Solaris Containers), providing similar capabilities with resource management and isolation features. Linux later incorporated comparable functionality through kernel features such as namespaces and control groups (cgroups), which enabled isolation of process IDs, network stacks, filesystems, and resource allocation. These features formed the foundation for Linux Containers (LXC), which provided a userspace interface for managing containers. The widespread adoption of containerization accelerated with the release of Docker in 2013, which introduced a standardized format for packaging applications and their dependencies, along with tooling for image distribution and container management. == Types of containers == OS containers Application containers == Security issues == Because of the shared OS, security threats can affect the whole containerized system. In containerized environments, security scanners generally protect the OS, but not the application containers, which adds unwanted vulnerability. == Container management, orchestration, clustering == Container orchestration or container management is mostly used in the context of application containers. Implementations providing such orchestration include Kubernetes and Docker swarm. == Container cluster management == Container clusters need to be managed. This includes functionality to create a cluster, to upgrade the software or repair it, balance the load between existing instances, scale by starting or stopping instances to adapt to the number of users, to log activities and monitor produced logs or the application itself by querying sensors. Open-source implementations of such software include OKD and Rancher. Quite a number of companies provide container cluster management as a managed service, like Alibaba, Amazon, Google, and Microsoft.
Token maxxing
Token Maxxing or Token Maxing is a metric used in an attempt to track productivity in the workplace especially for those using Artificial Intelligence (AI) based services. AI services charge for each token which represent units of effort expended by an AI service to solve a problem. Some believe that token consumption equates to productivity and thus can be used as a metric to monitor an employee's work. Supporters believe that higher token usage indicates higher productivity and higher utilization of powerful AI services. This also suggests that those not consuming enough tokens may be less productive and underutilizing powerful AI services. This belief might lead to an environment that incentivizes higher token usage to predict increased productivity. Critics of token maxxing as a metric claim that prudent workers will maximize any metric that management wants increased to gain a workplace advantage. For example: Engineers in the tech industries pressed to consume as many tokens as possible might run several AI agents in tandem, enter longer input prompts, or automate their tasks to maximize their token consumption. To management, this higher token usage may indicate potential productivity, but in reality may cause additional token costs, worker burnout, or actually create more bloated code of lower quality. Another claim is AI service companies potentially benefit from such an emphasis on token consumption and actively encourage the trend. Some developers have publicly advocated the practice. Developer Sigrid Jin, who said he used 50 billion tokens in a single year, has argued that maximizing token consumption is the best way to understand the value of AI, advising others to spend as much on AI usage as they pay in rent to obtain a return on investment. == See Also == Goodhart's law Perverse incentive Jevons Paradox
Hive (artificial intelligence company)
Hive is an American artificial intelligence company offering machine learning models via APIs to enterprise customers. Hive uses around 700,000 gig workers to train data for its models through its Hive Work app. One of Hive's major offerings is to provide automated content moderation services. == Products == Hive is reported to have been engaged to provide content moderation services to social news aggregator Reddit, Giphy, BeReal, Donald Trump-affiliated social network Truth Social, and on online chat website Chatroulette. Parler, after its shutdown by content service providers in early 2021 due to a lack of content moderation, integrated with Hive and was allowed back in the App Store. Hive's content moderation models have been leveraged widely in the livestreaming industry, where the cost of human moderation is high. Hive's models have also been used in events such as the Super Bowl and March Madness, and its contextual advertising models used by NBC Universal and Vevo. Hive provides APIs to detect deepfakes and AI-generated artwork. In early 2023, Hive released a free demo text classifier intended to detect AI-generated text. Mark Hachman at PC World rated Hive's classifier favorably and found it more reliable than OpenAI's AI text classifier. == History == Hive was founded by Kevin Guo and Dmitriy Karpman, and in April 2021, announced $85M in new capital at a valuation of $2 billion.
Eline Van der Velden
Eline van der Velden is a Dutch comedian, writer, actress and producer based in London, England. She is best known for her work creating Tilly Norwood, an AI-generated "actress". == Early life == Van der Velden was born on the Dutch island of Curaçao, Netherlands Antilles to Dutch businessman Steven van der Velden and physiotherapist Quirine van der Velden. She moved to the United Kingdom at age 14 to study drama and musical theatre at Tring Park School for the Performing Arts. She graduated with an MSc in physics from Imperial College London in 2008. == Career == She was nominated by the International Academy of Digital Arts and Sciences for the Lovie Awards and won Best Online Comedy in 2013 for two of her submitted entries. She has created multiple online shows such as Sketch My Life with London Hughes and Emily Hartridge and Match.com Parody. She became managing director of Makers Channel (makerschannel.co.uk), the first curated video platform in Europe in 2015. Makers Channel has been recently acquired by a Belgian media company De Persgroep, due to its success in the Netherlands. In 2016, she appeared in adverts for the Dutch shampoo brand Andrelon. Miss Holland, a comedy character created by Van der Velden, made headlines in 2016 as she asked the British public to teach her the national anthem. As an actress, she has starred in Dutch TV series De Troon, Beatrix and the Golden Calf-winning series Overspel. In Belgium, she appeared opposite Jamie Dornan in Flying Home. Van der Velden starred in the BBC Three series Putting It Out There, in which she challenges social perceptions of body hair, heels, spit, personal space, and authority figures. In 2018, she starred in the BBC One comedy series Soft Border Patrol and the BBC Three comedy series Miss Holland. In 2025, Particle6 Group, which Van der Velden founded in 2016, introduced Tilly Norwood, an AI-generated "actress" at the Zurich Film Festival. The announcement was met with outrage and a condemnation by the American actors' union SAG-AFTRA. == Awards and recognition == Miss Holland won the Best Online Comedy at the 2013 Lovie Awards, judged by Stephen Fry. The Match.com Parody video won Best Online Comedy People's Lovie Award, the people's vote. Miss Holland and Match.com Parody Date 1 were also featured in the 2013 Google Lovie Letters.
Lobsang Monlam
Geshe Lobsang Monlam (Tibetan: དགེ་བཤེས་བློ་བཟང་སྨོན་ལམ, Wylie: dge bshes blo bzang smon lam), born in 1976 in Ngawa eastern Tibet, is a Tibetan Buddhist scholar and programmer who uses digital technologies to preserve the Tibetan language and culture. He is best known for developing Tibetan typefaces and for the multi-volume Great Monlam Tibetan Dictionary. In 2025, he received the Snow Lion Award for Human Rights from the International Campaign for Tibet. He is also working on developing a "Dalai Lama AI," a specialized language model. == Biography == Lobsang Monlam was born in 1976 in Ngawa, eastern Tibet, anciently Tibetan Amdo, where he became a monk at the age of 12.. At the age of 17, in 1993, Lobsang Monlam fled Tibet by crossing the Himalayas to reach southern India and discovered computer science in a monastery. In 1993, he was ordained monk in the Sera Mey College in Bylakuppe, Karnataka, India, where he obtained a Geshe title in 2013.. By the early 2000s, Lobsang Monlam had already learned to paint thangkas and to compose plans and drawings. He used this knowledge to design a new assembly hall for Sera Mey, which the monks needed. Thanks to his work, Lobsang Monlam received donations from patrons of the monastery, which he was able to use to buy his first computer. He bought his first laptop in 2002 and largely taught himself how to use the hardware and software with the help of manuals. As a Buddhist scholar, he combines meditation practice with his digital work. In 2012, he founded and directs the Monlam Tibetan Information Technology Research Center in Dharamsala, which specializes in Tibetan language and software projects. Since then, he is its director, researching Tibetan language-related software. In 2019, advised by the 14th Dalai Lama, he founded Monlam IT and Research (OPC) Private Limited. Since the 2000s, Monlam has been developing Tibetan typefaces; the first Monlam Tibetan font was created in 2005. Under his direction, the Monlam Great Tibetan Dictionary was created, comprising 223 printed volumes and over 300,000 entries; approximately 150 people worked on this project for over nine years. On May 27, 2022, the Dalai Lama inaugurated the Monlam Tibetan Dictionary, produced by the Monlam Tibetan Information Technology Research Center, at Namgyal Monastery in McLeod Ganj. According to Penpa Tsering, this is the world's largest dictionary, created with guidance from the Dalai Lama, based on proposals from Lobsang Monlam and his team under the direction of Samdhong Rinpoche, and other lamas from all schools of Tibetan Buddhism and Yungdrung Bön. On December 5, 2024, Lobsang Monlam testified at a hearing of the US Congressional-Executive Commission on China in Washington, chaired by Christopher Smith, on the difficulties of preserving the Tibetan language and culture in Tibet and the Tibetan diaspora, and on the interest of the Monlam Tibetan Informatics Research Center in developing technologies for the preservation of the Tibetan language. On December 12, 2024, the work was presented to the Library of Congress in Washington, D.C., and launched at an event. The free Monlam Great Tibetan Dictionary app is available in several languages; the German version was created in collaboration with the Tibet Institute Rikon and has been downloaded millions of times. In total, Monlam has created over 37 apps related to the Tibetan language and translation; In 2023, its center launched the Monlam artificial intelligence platform, equipped with modules for machine translation, optical character recognition, speech transcription and speech synthesis.. For their efforts, he and Sophie Richardson received the Snow Lion Award in 2025, which was presented by Richard Gere and came with a prize of €3,000. In 2019, he started a PhD at Bangalore University on Library Science. He obtained his doctorate on November 30, 2023. Currently, he spearheads Monlam AI. Lobsang Monlam is developing "Dalai Lama AI" to digitally preserve the teachings of the 14th Dalai Lama, now 90 years old, for future generations. Lobsang Monlam states, "If we succeed in preserving the Dalai Lama, we also preserve the movement."
Retrieval-augmented generation
Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources. RAG improves LLMs by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have caused chatbots to describe policies that don't exist, or recommend nonexistent legal cases to lawyers that are looking for citations to support their arguments. RAG also reduces the need to retrain LLMs with new data, saving on computational and financial costs. Beyond efficiency gains, RAG also allows LLMs to include sources in their responses, so users can verify the cited sources. This provides greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance. The term retrieval-augmented generation (RAG) was introduced in a 2020 paper that described combining a parametric language model with a non-parametric external memory accessed through retrieval at inference time. == RAG and LLM limitations == LLMs can provide incorrect information. For example, when Google first demonstrated its LLM tool "Google Bard" (later re-branded to Gemini), the LLM provided incorrect information about the James Webb Space Telescope. This error contributed to a $100 billion decline in Google's stock value. RAG is used to prevent these errors, but it does not solve all the problems. For example, LLMs can generate misinformation even when pulling from factually correct sources if they misinterpret the context. MIT Technology Review gives the example of an AI-generated response stating, "The United States has had one Muslim president, Barack Hussein Obama." The model retrieved this from an academic book rhetorically titled Barack Hussein Obama: America's First Muslim President? The LLM did not "know" or "understand" the context of the title, generating a false statement. LLMs with RAG are programmed to prioritize new information. This technique has been called "prompt stuffing." Without prompt stuffing, the LLM's input is generated by a user; with prompt stuffing, additional relevant context is added to this input to guide the model's response. This approach provides the LLM with key information early in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge. == Process == Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set. Ars Technica notes that "when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information" ("augmentation"). IBM states that "in the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize" an answer. === RAG key stages === Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of a large vector space. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval. Given a user query, a document retriever is first called to select the most relevant documents that will be used to augment the query. This comparison can be done using a variety of methods, which depend in part on the type of indexing used. The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query. Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals. Finally, the LLM can generate output based on both the query and the retrieved documents. Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning. == Applications == Retrieval-augmented generation is used in applications where generated responses need to be grounded in external or frequently updated information. Commonly cited use cases include search engines, question-answering systems, customer support chatbots, enterprise knowledge assistants, content generation, recommendation systems, retail and e-commerce, and industrial or manufacturing workflows. In healthcare, RAG has been studied as a way to ground large language model outputs in external medical knowledge sources, although reviews have noted continuing challenges around evaluation, ethics, and clinical reliability. == Improvements == Improvements to the basic process above can be applied at different stages in the RAG flow. === Encoder === These methods focus on the encoding of text as either dense or sparse vectors. Sparse vectors, which encode the identity of a word, are typically dictionary-length and contain mostly zeros. Dense vectors, which encode meaning, are more compact and contain fewer zeros. Various enhancements can improve the way similarities are calculated in the vector stores (databases). Performance improves by optimizing how vector similarities are calculated. Dot products enhance similarity scoring, while approximate nearest neighbor (ANN) searches improve retrieval efficiency over K-nearest neighbors (KNN) searches. Accuracy may be improved with Late Interactions, which allow the system to compare words more precisely after retrieval. This helps refine document ranking and improve search relevance. Hybrid vector approaches may be used to combine dense vector representations with sparse one-hot vectors, taking advantage of the computational efficiency of sparse dot products over dense vector operations. Other retrieval techniques focus on improving accuracy by refining how documents are selected. Some retrieval methods combine sparse representations, such as SPLADE, with query expansion strategies to improve search accuracy and recall. === Retriever-centric methods === These methods aim to enhance the quality of document retrieval in vector databases: Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text within documents. Supervised retriever optimization aligns retrieval probabilities with the generator model's likelihood distribution. This involves retrieving the top-k vectors for a given prompt, scoring the generated response's perplexity, and minimizing KL divergence between the retriever's selections and the model's likelihoods to refine retrieval. Reranking techniques can refine retriever performance by prioritizing the most relevant retrieved documents during training. === Language model === By redesigning the language model with the retriever in mind, a 25-time smaller network can get comparable perplexity as its much larger counterparts. Because it is trained from scratch, this method (Retro) incurs the high cost of training runs that the original RAG scheme avoided. The hypothesis is that by giving domain knowledge during training, Retro needs less focus on the domain and can devote its smaller weight resources only to language semantics. The redesigned language model is shown here. It has been reported that Retro is not reproducible, so modifications were made to make it so. The more reproducible version is called Retro++ and includes in-context RAG. === Chunking === Chunking involves various strategies for breaking up the data into vectors so the retriever can find details in it. Three types of chunking strategies are: Fixed length with overlap. This is fast and easy. Overlapping consecutive chunks helps to maintain semantic context across chunks. Syntax-based chunks can break the document up into sentences. Libraries such as spaCy or NLTK can also help. File format-based chunking. Certain file types have natural chunks built in, and it's best to respect them. For example, code files are best chunked and vectorized as whole functions or classes. HTML files should leave