AI For Students Copilot


  • Shaded Picture System

    Shaded Picture System

    The Shaded Picture System was a 3D raster computer display processor introduced by Evans & Sutherland in October 1973. It was the first general-purpose, commercially available raster computer graphics display processor capable of real-time, shaded 3D graphics. It could only display black-and-white graphics at a resolution of 256 by 256. It was extremely expensive, and very few units were ever sold.

    == History ==

    The principles of shaded, hidden-line true 3D graphics were pioneered at the University of Utah in 1967. However, this algorithm was slow and would take several minutes to produce an image. In 1970, Gary Watkins developed a FORTRAN simulator of a faster algorithm that would theoretically generate shaded 3D images in real-time, "if implemented in suitable hardware". The simulator itself was still not capable of real-time shaded 3D image rendering. Evans & Sutherland developed a functional prototype of this "suitable hardware", which was later sold as the Shaded Picture System in 1973.

    About a year earlier, in 1972, Evans & Sutherland sold the first and only CT1 to Case Western Reserve University. The CT1, or Continuous Tone 1, was a specialized image generator, not meant as a marketable or mass-produced product. At the time, the CT1, along with G.E./NASA's upgraded Electronic Scene Generator from 1971, would have been the only real-time raster graphics systems sold to customers that were comparable to the Shaded Picture System, although both the CT1 and the Electronic Scene Generator were intentionally produced as one-off products and specialized for the needs of their customers. The Shaded Picture System, in contrast, was intentionally marketed.

    In early 1975, Evans & Sutherland demonstrated a random-access video frame buffer using relatively low-cost semiconductor memory, which was much more capable than the Shaded Picture System. When interfaced with a (non-shaded) E&S Picture System, the frame buffer had a resolution of 512 by 512 in grayscale and partial color capabilities. By the end of 1975, this frame buffer was commercially available.

  • Time Warp Edit Distance

    Time Warp Edit Distance

    In the data analysis of time series, Time Warp Edit Distance (TWED) is a measure of similarity (or dissimilarity) between pairs of discrete time series, controlling the relative distortion of the time units of the two series using the physical notion of elasticity. In contrast to other distance measures (e.g. DTW (dynamic time warping) or LCS (longest common subsequence problem)), TWED is a metric. Its computational time complexity is $O(n^{2})$, but this can be drastically reduced in some specific situations by using a corridor to reduce the search space. Its memory space complexity can be reduced to $O(n)$. It was first proposed in 2009 by P.-F. Marteau.

    == Definition ==

    $$
    \delta_{\lambda,\nu}(A_{1}^{p},B_{1}^{q}) = \min
    \begin{cases}
    \delta_{\lambda,\nu}(A_{1}^{p-1},B_{1}^{q}) + \Gamma(a'_{p} \to \Lambda) & \text{delete in } A \\
    \delta_{\lambda,\nu}(A_{1}^{p-1},B_{1}^{q-1}) + \Gamma(a'_{p} \to b'_{q}) & \text{match or substitution} \\
    \delta_{\lambda,\nu}(A_{1}^{p},B_{1}^{q-1}) + \Gamma(\Lambda \to b'_{q}) & \text{delete in } B
    \end{cases}
    $$

    where

    $$\Gamma(a'_{p} \to \Lambda) = d_{LP}(a'_{p},a'_{p-1}) + \nu \cdot (t_{a_{p}} - t_{a_{p-1}}) + \lambda$$

    $$\Gamma(a'_{p} \to b'_{q}) = d_{LP}(a'_{p},b'_{q}) + d_{LP}(a'_{p-1},b'_{q-1}) + \nu \cdot \left(|t_{a_{p}} - t_{b_{q}}| + |t_{a_{p-1}} - t_{b_{q-1}}|\right)$$

    $$\Gamma(\Lambda \to b'_{q}) = d_{LP}(b'_{q},b'_{q-1}) + \nu \cdot (t_{b_{q}} - t_{b_{q-1}}) + \lambda$$

    The recursion $\delta_{\lambda,\nu}$ is initialized as:

    $$\delta_{\lambda,\nu}(A_{1}^{0},B_{1}^{0}) = 0,$$

    $$\delta_{\lambda,\nu}(A_{1}^{0},B_{1}^{j}) = \infty \quad \text{for } j \geq 1,$$

    $$\delta_{\lambda,\nu}(A_{1}^{i},B_{1}^{0}) = \infty \quad \text{for } i \geq 1,$$

    with $a'_{0} = b'_{0} = 0$.

    === Implementations ===

    An implementation of the TWED algorithm in C with a Python wrapper is available. TWED is also implemented in the Time Series Subsequence Search Python package (TSSEARCH for short), available at [1]. An R implementation of TWED has been integrated into TraMineR, an R package for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data.

    Additionally, cuTWED is a CUDA-accelerated implementation of TWED which uses an improved algorithm due to G. Wright (2020). This method is linear in memory and massively parallelized. cuTWED is written in CUDA C/C++, comes with Python bindings, and also includes Python bindings for Marteau's reference C implementation.
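The TWED recursion and its initialization can be sketched as a plain dynamic program. This is a minimal illustration rather than any of the implementations named above: the function name `twed`, the restriction to scalar samples (so that $d_{LP}$ reduces to an absolute difference), and the choice of a padding time stamp of 0 are all assumptions made for the example.

```python
import math


def twed(a, ta, b, tb, nu, lam):
    """Time Warp Edit Distance between two scalar time series (sketch).

    a, b   : sample values; ta, tb : their time stamps.
    nu     : stiffness (weight of time-stamp differences).
    lam    : constant penalty for each delete operation.
    """
    # Pad both series so that a'_0 = b'_0 = 0, as in the definition.
    # Using 0 for the padded time stamp as well is an assumption.
    a = [0.0] + list(a)
    ta = [0.0] + list(ta)
    b = [0.0] + list(b)
    tb = [0.0] + list(tb)
    n, m = len(a), len(b)

    # dp[i][j] holds delta(A_1^i, B_1^j); first row and column are infinite
    # except for dp[0][0] = 0.
    dp = [[math.inf] * m for _ in range(n)]
    dp[0][0] = 0.0

    for i in range(1, n):
        for j in range(1, m):
            # Delete in A.
            delete_a = (dp[i - 1][j] + abs(a[i] - a[i - 1])
                        + nu * (ta[i] - ta[i - 1]) + lam)
            # Delete in B.
            delete_b = (dp[i][j - 1] + abs(b[j] - b[j - 1])
                        + nu * (tb[j] - tb[j - 1]) + lam)
            # Match or substitution.
            match = (dp[i - 1][j - 1]
                     + abs(a[i] - b[j]) + abs(a[i - 1] - b[j - 1])
                     + nu * (abs(ta[i] - tb[j]) + abs(ta[i - 1] - tb[j - 1])))
            dp[i][j] = min(delete_a, delete_b, match)

    return dp[n - 1][m - 1]
```

Calling `twed(x, t, x, t, nu, lam)` on two identical series returns 0, and swapping the two series leaves the result unchanged, as expected of a metric. The full table costs $O(nm)$ time and memory; keeping only two rows would give the linear memory bound mentioned above.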

  • Knuth–Plass line-breaking algorithm

    Knuth–Plass line-breaking algorithm

    The Knuth–Plass algorithm is a line-breaking algorithm designed for use in Donald Knuth's typesetting program TeX. It integrates the problems of text justification and hyphenation into a single algorithm by using a discrete dynamic programming method to minimize a loss function that attempts to quantify the aesthetic qualities desired in the finished output.

    The algorithm works by dividing the text into a stream of three kinds of objects: boxes, which are non-resizable chunks of content; glue, which are flexible, resizeable elements; and penalties, which represent places where breaking is undesirable (or, if negative, desirable). The loss function, known as "badness", is defined in terms of the deformation of the glue elements, and any extra penalties incurred through line breaking.

    Making hyphenation decisions follows naturally from the algorithm, but the choice of possible hyphenation points within words, and optionally their preference weighting, must be performed first, and that information inserted into the text stream in advance.

    Knuth and Plass' original algorithm does not include page breaking, but may be modified to interface with a pagination algorithm, such as the algorithm designed by Plass in his PhD thesis.

    Typically, the cost function for this technique should be modified so that it does not count the space left on the final line of a paragraph; this modification allows a paragraph to end in the middle of a line without penalty. The same technique can also be extended to take into account other factors such as the number of lines or costs for hyphenating long words.

    == Computational complexity ==

    A naive brute-force exhaustive search for the minimum badness by trying every possible combination of breakpoints would take an impractical $O(2^{n})$ time. The classic Knuth–Plass dynamic programming approach to solving the minimization problem is a worst-case $O(n^{2})$ algorithm but usually runs much faster, in close to linear time. Solving for the Knuth–Plass optimum can be shown to be a special case of the convex least-weight subsequence problem, which can be solved in $O(n)$ time. Methods to do this include the SMAWK algorithm.

    == Simple example of minimum raggedness metric ==

    For the input text AAA BB CC DDDDD with line width 6, a greedy algorithm that puts as many words on a line as possible while preserving order before moving to the next line would produce:

        ------    Line width: 6
        AAA BB    Remaining space: 0
        CC        Remaining space: 4
        DDDDD     Remaining space: 1

    The sum of squared space left over by this method is $0^{2}+4^{2}+1^{2}=17$. However, the optimal solution achieves the smaller sum $3^{2}+1^{2}+1^{2}=11$:

        ------    Line width: 6
        AAA       Remaining space: 3
        BB CC     Remaining space: 1
        DDDDD     Remaining space: 1

    The difference here is that the first line is broken before BB instead of after it, yielding a better right margin and a lower cost of 11.
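The greedy and optimal results in this example can be reproduced with a short sketch. It is not the full Knuth–Plass algorithm: there are no boxes, glue or penalties, only the simplified minimum-raggedness cost (sum of squared leftover space, last line included, as in the example). The function names are hypothetical, and the code assumes every word fits on a line by itself.

```python
import math


def squared_slack(lines, width):
    """Sum of squared unused space over all lines (last line included)."""
    return sum((width - len(line)) ** 2 for line in lines)


def greedy_lines(words, width):
    """Fill each line with as many words as fit, preserving order."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if current and len(candidate) > width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines, squared_slack(lines, width)


def optimal_lines(words, width):
    """O(n^2) dynamic program minimizing the sum of squared slack."""
    n = len(words)
    best = [0] + [math.inf] * n   # best[j]: minimum cost of the first j words
    prev = [0] * (n + 1)          # prev[j]: start index of the last line
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):        # candidate line = words[i:j]
            length = sum(len(w) for w in words[i:j]) + (j - i - 1)
            if length > width:
                break                          # longer lines cannot fit either
            cost = best[i] + (width - length) ** 2
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk the stored breakpoints backwards to recover the lines.
    lines, j = [], n
    while j > 0:
        i = prev[j]
        lines.append(" ".join(words[i:j]))
        j = i
    return lines[::-1], best[n]


words = "AAA BB CC DDDDD".split()
print(greedy_lines(words, 6))
print(optimal_lines(words, 6))
```

For the input above, the greedy function yields the three lines AAA BB / CC / DDDDD with cost 17, while the dynamic program yields AAA / BB CC / DDDDD with cost 11.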

  • Knowledge spillover

    Knowledge spillover

    Knowledge spillover is an exchange of ideas among individuals. The term is often replaced by technology spillover, R&D spillover, or simply spillover (economics) when the concept is specific to technology management and innovation economics. In knowledge management economics, knowledge spillovers are non-rival knowledge market costs incurred by a party that has not agreed to assume them, and they have the spillover effect of stimulating technological improvements in a neighbor through one's own innovation. Such innovations often come from specialization within an industry.

    There are two kinds of knowledge spillovers: internal and external. Internal knowledge spillover occurs if there is a positive impact of knowledge between individuals within an organization that produces goods and/or services. An external knowledge spillover occurs when the positive impact of knowledge is between individuals outside of a production organization. Marshall–Arrow–Romer (MAR) spillovers, Porter spillovers and Jacobs spillovers are three types of spillovers.

    == Conceptualizations ==

    === Marshall–Arrow–Romer ===

    Marshall–Arrow–Romer (MAR) spillover has its origins in 1890, when the English economist Alfred Marshall developed a theory of knowledge spillovers. The theory was later extended by economists Kenneth Arrow (1962) and Paul Romer (1986). In 1992, Edward Glaeser, Hedi Kallal, José Scheinkman, and Andrei Shleifer pulled together the Marshall–Arrow–Romer views on knowledge spillovers and accordingly named the view MAR spillover.

    Under the MAR spillover view, the proximity of firms within a common industry often affects how well knowledge travels among firms to facilitate innovation and growth. The closer the firms are to one another, the greater the MAR spillover. The exchange of ideas is largely from employee to employee, in that employees from different firms in an industry exchange ideas about new products and new ways to produce goods. This opportunity to exchange ideas leads to innovations that are key to new products and improved production methods.

    Research on the Cambridge IT Cluster (UK) suggests that technological knowledge spillovers might only happen rarely and are less important than other cluster benefits such as labour market pooling.

    === Porter ===

    Porter (1990), like MAR, argues that knowledge spillovers in specialized, geographically concentrated industries stimulate growth. He insists, however, that local competition, as opposed to local monopoly, fosters the pursuit and rapid adoption of innovation. He gives examples of Italian ceramics and gold jewellery industries, in which hundreds of firms are located together and fiercely compete to innovate since the alternative to innovation is demise. Porter's externalities are maximized in cities with geographically specialized, competitive industries.

    === Jacobs ===

    Under the Jacobs spillover view, the proximity of firms from different industries affects how well knowledge travels among firms to facilitate innovation and growth. This is in contrast to MAR spillovers, which focus on firms in a common industry. The diverse proximity of a Jacobs spillover brings together ideas among individuals with different perspectives to encourage an exchange of ideas and foster innovation in an industrially diverse environment.

    The view was developed in 1969 by urbanist Jane Jacobs and John Jackson around the concept that Detroit’s shipbuilding industry from the 1830s was the critical antecedent leading to the 1890s development of the auto industry in Detroit, since the gasoline engine firms easily transitioned from building gasoline engines for ships to building them for automobiles.

    == Incoming and outgoing spillovers ==

    Knowledge spillover has asymmetric directions. The focal entity both receives know-how from others and passes know-how on to them, creating incoming and outgoing spillovers. Cassiman and Veugelers (2002) use survey data to estimate incoming and outgoing spillovers and study their economic impacts. Incoming spillover increases the growth opportunities and productivity improvements of receivers, while outgoing spillover leads to a free-rider problem in technology competition. Chen et al. (2013) use an econometric method to gauge incoming spillover, an approach that can be applied to all companies without a survey. They find that incoming spillover explains the R&D profits of industrial firms.

    == Policy implications ==

    As information is largely non-rival in nature, certain measures must be taken to ensure that, for the originator, the information remains a private asset. As the market cannot do this efficiently, public regulations have been implemented to facilitate a more appropriate equilibrium. As a result, the concept of intellectual property rights has developed, ensuring the ability of entrepreneurs to temporarily hold on to the profitability of their ideas through patents, copyrights, trade secrets, and other governmental safeguards. Conversely, such barriers to entry prevent the exploitation of informational developments by rival firms within an industry. For example, Wang (2023) indicates that technology spillovers are reduced by 27% to 51% when trade secrets laws are implemented by the Uniform Trade Secrets Act in the US.

    On the other hand, when the research and development of a private firm results in a social benefit that is unaccounted for in the market price, and is often greater than the private return on the firm's research, a subsidy to offset the underproduction of that benefit might be offered to the firm in return for its continued output of that benefit.
Government subsidies are often controversial, and while they might often result in a more appropriate social equilibrium, they could also lead to undesirable political repercussions as such a subsidy must come from taxpayers, some of whom may not directly benefit from the researching firm's subsidized knowledge spillover. The concept of knowledge spillover is also used to justify subsidies to foreign direct investment, as foreign investors help diffuse technology among local firms. == Examples == Business parks are a good specific example of concentrated businesses that may benefit from MAR spillover. Many semiconductor firms intentionally located their research and development facilities in Silicon Valley to take advantage of MAR spillover. In addition, the film industry in Los Angeles, California, and elsewhere relies on a geographic concentration of specialists (directors, producers, scriptwriters, and set designers) to bring together narrow aspects of movie-making into a final product. A general example of a knowledge spillover could be the collective growth associated with the research and development of online social networking tools like Facebook, YouTube, and Twitter. Such tools have not only created a positive feedback loop, and a host of originally unintended benefits for their users, but have also created an explosion of new software, programming platforms, and conceptual breakthroughs that have perpetuated the development of the industry as a whole. The advent of online marketplaces, the utilization of user profiles, the widespread democratization of information, and the interconnectivity between tools within the industry have all been products of each tool's individual developments. These developments have since spread outside the industry into the mainstream media as news and entertainment firms have developed their own market feedback applications within the tools themselves, and their own versions of online networking tools (e.g. CNN’s iReport).

  • Webull

    Webull

    Webull Corporation, often stylized as simply Webull, is a U.S.-based financial services holding company headquartered in St. Petersburg, Florida. It owns and operates the Webull electronic trading platform for self-directed retail investors. Depending on jurisdiction, the Webull platform offers trading in stocks, exchange-traded funds (ETFs), options, margin, bonds, cryptocurrency and futures, as well as market-data tools. Webull began operations in 2016 under Hunan Fumi Information Technology, a China-based financial technology company founded by Wang Anquan. It launched U.S. brokerage services through Webull Financial LLC in 2018 and expanded during the retail-trading boom of 2020 and 2021. In April 2025, Webull became a publicly traded company on the Nasdaq through a merger with special-purpose acquisition company SK Growth Opportunities Corporation. The company's U.S. brokerage revenue relies substantially on payment for order flow, with options trading accounting for the larger share of its order-flow rebates in 2025. Webull has faced regulatory actions related to options customer approvals, complaint handling, suspicious activity reporting, social-media marketing and customer disclosures. It has also faced scrutiny from U.S. lawmakers and state officials over its historical and operational ties to China and the handling of U.S. customer data. == History == === Founding === Webull was founded in 2016 under Hunan Fumi Information Technology, a China-based financial technology company, by Wang Anquan, a former employee of Alibaba Group and Xiaomi. Hunan Fumi Information Technology received backing from Xiaomi, Shunwei Capital, and other investors in China. Fumi Technology was a Hunan-based fintech start-up incubated by Xiaomi and raised about CNY200 million (approximately US$30 million) in a Series B financing round in 2018. On May 24, 2017, Webull Financial LLC was established as a Delaware limited liability company. 
It began offering brokerage services in the United States in May 2018. Wang hired Anthony Denier as CEO of the U.S. brokerage that year and the two mapped out their strategy on napkins at a Mexican restaurant in New York City. Webull Corporation was incorporated in the Cayman Islands in September 2019 as the group's holding company. === Retail trading boom === In May 2020, the company received SEC approval to launch a robo-advisor on its platform. By August 2020, the platform had over 11 million registered users, and in October 2020, it had 750,000 daily active users. Webull introduced options trading in 2020 and later added cryptocurrency trading through a separate digital-asset business. In November 2020, Webull began supporting cryptocurrency transactions. In December 2020, Webull launched trading services in Hong Kong. During the GameStop short squeeze in January 2021, Webull gained attention as some retail traders looked for alternatives to Robinhood. On January 27, 2021, Webull recorded its highest-ever number of active daily users, at 952,000, and the Webull app was downloaded across the Apple App and Google Play stores an estimated 100,000 times. That week, approximately 1.2 million people downloaded the Webull mobile app, which the company reported as a 1,548% week-over-week increase. On January 28, 2021, Webull was directed by its clearing house to temporarily halt buy orders for stocks affected by the GameStop short squeeze. In June 2021, Webull was reported to be considering a U.S. initial public offering that could raise up to $400 million. === Restructuring and expansion === Webull restructured its China-related corporate arrangements in 2022 and later stated that Hunan Fumi was no longer affiliated with the group. In 2022 and 2023, Webull expanded in several non-U.S. markets, including Singapore, Australia, South Africa, Japan, the United Kingdom and Indonesia. In June 2023, Webull moved cryptocurrency trading to a separate app called Webull Pay. 
By the end of 2023, Webull had 4.3 million funded accounts and US$8.2 billion in customer assets. In January 2024, Anthony Denier was promoted to group president of Webull Corporation. In November 2024, Webull launched overnight, or extended-hours, trading, expanding the trading window of U.S. stocks for users inside and outside the United States. === SPAC merger and Nasdaq listing === On February 28, 2024, Webull agreed to go public through a business combination with SK Growth Opportunities Corporation (NASDAQ: SKGR), a special-purpose acquisition company, in a deal that valued the company at approximately US$7.3 billion. The proposed valuation drew scrutiny because of Webull's limited financial disclosure at announcement, reliance on payment for order flow and small expected public float. SK Growth shareholders approved the business combination on March 30, 2025, and the transaction closed on April 10, 2025. Webull's Class A ordinary shares and warrants began trading on the Nasdaq on April 11, 2025 under the ticker symbols BULL and BULLW (incentive warrants traded under BULLZ until their redemption in June 2025). The merger brought Webull to the public market but generated little cash for the company: after shareholder redemptions, Webull disclosed net proceeds of US$430,066 from the transaction. After the listing, Webull's shares experienced extreme volatility, rising as much as 500% to US$79.56 on April 14, 2025, after closing at US$13.25 on the prior trading day. The initial post-listing surge increased the value of Webull holdings owned by earlier investors, including RIT Capital Partners, which had first invested in Webull in 2021. In April 2026, after Webull's shares had fallen about 70% over the previous year, the company authorized a US$100 million share repurchase program. == Business model and financials == Webull provides a self-directed electronic trading platform available through mobile, desktop and web applications. 
Depending on jurisdiction, the platform offers trading in stocks, exchange-traded funds, options, margin, futures, fixed income products, cryptocurrency, cash management features and market data tools. In the United States, Webull Financial LLC is a registered broker-dealer and member of FINRA and the Securities Investor Protection Corporation, while Webull operates in other markets through locally licensed brokerage subsidiaries. Webull operates a commission-free or low-cost brokerage model for self-directed retail investors. In the United States, a substantial part of its trading-related revenue comes from payment for order flow, while in some non-U.S. markets the company more commonly charges commissions directly to customers. The platform is aimed at more active retail investors, including users seeking options tools, extended-hours trading and real-time market data. For 2025, Webull reported total revenue of US$571.0 million, up from US$390.2 million in 2024. Equity and option order-flow rebates accounted for US$304.1 million, or 53.3% of revenue, making order-flow rebates the company's largest reported revenue category. Interest-related income accounted for US$154.3 million, handling charge income for US$87.3 million and other revenue for US$25.3 million. Options were the larger component of the company's order-flow rebates in 2025, generating US$210.0 million compared with US$94.2 million from equities. Webull also generates revenue from interest-related activities, including margin financing, customer bank deposits, stock lending and corporate bank deposits. The company has stated that its interest-related income is affected by interest rates, customer cash balances, margin balances and demand for stock lending. The company had approximately 20 million registered users worldwide as of February 2024. As of December 31, 2025, it reported 26.8 million registered users, 5.0 million funded accounts and US$24.6 billion in customer assets. 
As of March 2025, Webull operated in Hong Kong, Singapore, Australia, South Africa, Japan, the United Kingdom, the United States, Indonesia, Canada, Brazil, Thailand, Malaysia and Mexico. == Marketing and sponsorships == Webull has used paid digital advertising, referral incentives, free-stock promotions, affiliate marketing and sports sponsorships to acquire customers and promote its brand. In its 2025 annual filing, the company reported marketing and branding expenses of US$152.3 million in 2023, US$138.7 million in 2024 and US$135.9 million in 2025. Webull said most of its advertising and promotion costs were related to paid search and paid social advertising, and that it had reduced free-stock promotions while shifting toward deposit- and asset-transfer-based incentives. In September 2021, BSE Global, the parent company of the Brooklyn Nets and New York Liberty, entered into a global multi-year agreement with Webull. Under the agreement, Webull became an official sponsor and online brokerage partner of the teams, with branding that included a jersey patch on Brooklyn Nets uniforms. Spo

  • AI notetaker

    AI notetaker

    An AI notetaker is a tool that uses artificial intelligence to take notes during meetings. Such tools are created by tech companies such as Microsoft and Google, by AI transcription services such as Otter.ai, and by smaller firms such as Cluely and Krisp. Some business executives send AI notetakers to attend meetings not only to take notes, but also to answer questions on their behalf.

    The use of AI notetakers raises ethical questions, including recording meetings without the consent of all participants and the possibility that the notetaker will hallucinate and misrepresent what was said during meetings. There are also concerns about the privacy and security of meeting data and the sensitive information discussed in meetings. Further controversies have developed from the use of AI notetakers such as Cluely to cheat in technical job interviews.

    == Technology ==

    Large technology companies have integrated transcription capabilities into broader productivity and accessibility tools, including real-time captioning, dictation, and meeting documentation features embedded in operating systems and office platforms. Standalone transcription platforms, such as Transkriptor, focus specifically on automated transcription workflows and apply AI-based speech recognition to convert audio and video recordings into text. The software supports transcription in multiple languages and processes recordings uploaded via a web interface as well as through mobile and browser extensions. Tools of this type typically provide editable, time-aligned transcripts and export options for text and subtitle formats, cloud-based processing, multilingual support, and automation in transcription technology.

  • Savepoint

    Savepoint

    A savepoint is a way of implementing subtransactions (also known as nested transactions) within a relational database management system by indicating a point within a transaction that can be "rolled back to" without affecting any work done in the transaction before the savepoint was created. Multiple savepoints can exist within a single transaction. Savepoints are useful for implementing complex error recovery in database applications. If an error occurs in the midst of a multiple-statement transaction, the application may be able to recover from the error (by rolling back to a savepoint) without needing to abort the entire transaction. A savepoint can be declared by issuing a SAVEPOINT name statement. All changes made after a savepoint has been declared can be undone by issuing a ROLLBACK TO SAVEPOINT name command. Issuing RELEASE SAVEPOINT name will cause the named savepoint to be discarded, but will not otherwise affect anything. Issuing the commands ROLLBACK or COMMIT will also discard any savepoints created since the start of the main transaction. Savepoints are defined in the SQL standard and are supported by all established SQL relational databases, including PostgreSQL, Oracle Database, Microsoft SQL Server, MySQL, IBM Db2, SQLite (since 3.6.8), Firebird, H2 Database Engine, and Informix (since version 11.50xC3).
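The SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT statements described above can be tried with Python's built-in `sqlite3` module. This is a minimal sketch, assuming an in-memory SQLite database; the `accounts` table and the savepoint name `before_bob` are hypothetical names chosen for the example. The connection is opened with `isolation_level=None` so that the module does not open transactions implicitly and the statements run exactly as written.

```python
import sqlite3

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so transaction control is entirely in the SQL below.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO accounts VALUES ('alice', 100)")

# Mark a point inside the transaction that can be rolled back to.
cur.execute("SAVEPOINT before_bob")
cur.execute("INSERT INTO accounts VALUES ('bob', -50)")

# Undo only the work done after the savepoint; alice's row is untouched.
cur.execute("ROLLBACK TO SAVEPOINT before_bob")

# Discard the savepoint itself; this does not change any data.
cur.execute("RELEASE SAVEPOINT before_bob")
cur.execute("COMMIT")

rows = cur.execute("SELECT name, balance FROM accounts").fetchall()
print(rows)
conn.close()
```

After the commit, only the row inserted before the savepoint remains, which shows how an application can recover from a failed step without aborting the whole transaction.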

  • Secure Electronic Delivery

    Secure Electronic Delivery

    Secure Electronic Delivery (SED) is a service created in 2003 and provided by the British Library Document Supply Service (BLDSS). Its purpose is to enable faster delivery of digital materials as encrypted, copyright-compliant PDF documents to a personal e-mail address. These documents are supplied from the British Library via its On Demand service. When the British Library supplies articles electronically, it sends them securely in order to ensure that their usage is permitted (research purposes) and that copyright law is observed.

    == Methods ==

    As the publishing industry, authors and creators become highly protective of their assets and intellectual property, they impose strict rules on delivery methods to prevent copyright infringement. DRM-enabled secure delivery appears to be the most widely used solution to the issues libraries face in supplying ebooks and digital materials to their users. SED, one of these solutions, uses Adobe LiveCycle Digital Rights Management (LCDRM) as an encryption method to deliver documents.

    == Advantages ==

    SED offers convenience, quality and speed, as documents are delivered upon request at any location and on any device. Requested articles are scanned for high-quality reproduction and can be opened on any machine, including mobile devices.

    == Restrictions ==

    The following restrictions apply in a SED service implementation. The digital material is accessible for only 14 days via a link sent in a personal message. For copyright reasons, the material can be opened only once and saved for 14 days, and it does not allow copy-and-paste. Upon display, the material must be printed from the same device and can be reprinted only once. The On Demand encryption technology works best on the default Safari browser, although other browsers may accommodate it.

  • DryvIQ

    DryvIQ

    DryvIQ is a software application that enables businesses to migrate on-site system files and associated data across storage and content management platforms, as well as create synchronized hybrid storage systems.

    == History ==

    The software was originally released as SkySync in 2013 by Portal Architects, Inc., a company based in Ann Arbor, Michigan. The company created SkySync, a back-end administrative application designed to transfer content across storage platforms, after abandoning 18 months of development on a desktop application called SkyBrary in 2011. Between 2014 and 2015, Portal Architects established partnerships with the following companies: Autodesk, Box, Dropbox, Egnyte, EMC, Google, Syncplicity, Huddle, IBM, Microsoft, OpenText, Oracle, Citrix ShareFile, Hightail and Internet2. SkySync (currently DryvIQ) was named a "Cool Vendor in Content Management" by Gartner in 2015. In 2022, SkySync changed its name to DryvIQ, the name by which the company is currently known.

    == Overview ==

    DryvIQ is a software application that syncs, migrates or backs up files, including their associated properties, metadata, versions, user accounts and permissions, across on-premises and cloud-based storage platforms. The software deploys on a server, on a virtual machine, or within Microsoft Azure, Amazon Web Services or other cloud computing services.

    Read more →
  • Metadata

    Metadata

    Metadata (or metainformation) is data (or information) that defines and describes the characteristics of other data. It often helps to describe, explain, locate, or otherwise make data easier to retrieve, use, or manage. For example, the title, author, and publication date of a book are metadata about the book. But, while a data asset is finite, its metadata is infinite. As such, efforts to define, classify types, or structure metadata are expressed as examples in the context of its use. The term "metadata" has a history dating to the 1960s where it occurred in computer science and in popular culture. Different types of metadata serve different functions. For example, descriptive metadata for a document might include the author, creation date, file size and keywords. Metadata has various purposes. It can help users find relevant information and discover resources. It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information". Metadata of telecommunication activities including Internet traffic is very widely collected by various national governmental organizations. This data is used for the purposes of traffic analysis and can be used for mass surveillance. Unique metadata standards exist for different disciplines (e.g., museum collections, digital audio files, websites, etc.). Describing the contents and context of data or data files increases its usefulness. For example, a web page may include metadata specifying what software language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about the subject. 
This metadata can automatically improve the reader's experience and make it easier for users to find the web page online. A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc. In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations. == Types == There are many distinct types of metadata, including: Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords. Structural metadata – metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials. Administrative metadata – the information to help manage a resource, like resource type, and permissions, and when and how it was created. Reference metadata – the information about the contents and quality of statistical data. Statistical metadata – also called process data, may describe processes that collect, process, or produce statistical data. Legal metadata – provides information about the creator, copyright holder, and public licensing, if provided. Metadata is not strictly bound to one of these categories, as it can describe a piece of data in many other ways. While the metadata application is manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata. Structural metadata describes the structure of database objects such as tables, columns, keys and indexes. Guide metadata helps humans find specific items and is usually expressed as a set of keywords in a natural language. 
According to Ralph Kimball, metadata can be divided into three categories: technical metadata (or internal metadata), business metadata (or external metadata), and process metadata. Dan Linstedt, creator of the data vault methodology, says business metadata "...provide[s] definition of the functionality, definition of the data, definition of the elements, and definition of how the data is used within business...business metadata includes business requirements, time-lines, business metrics, business process flows, and business terminology." Business metadata is important because it can greatly facilitate the usefulness of the data to business people. A simple example of business metadata is a glossary entry. Hover functionality in an application or web form can enable a glossary definition to be shown when cursor is on a field or term. Other examples of business metadata include annotation ability within applications. For example, a business user may be viewing a business intelligence (BI) report and notice a trend in the data. The user may have background knowledge as to why this trend occurs. Some business intelligence tools enable the user to create an annotation within the report that explains the trend. Such an annotation can enhance other users' understanding of the data. This example is especially powerful because it is created by a business user for the use of other business people. NISO distinguishes three types of metadata: descriptive, structural, and administrative. Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. Structural metadata describes how the components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, administrative metadata gives information to help manage the source. 
Administrative metadata refers to the technical information, such as file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. Rights management metadata explains intellectual property rights, while preservation metadata contains information to preserve and save a resource. Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production. An additional type of metadata beginning to be more developed is accessibility metadata. Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile. Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users and information that fits those needs as a major gap in providing universal access solutions. Those types of information are accessibility metadata. The Schema.org website has incorporated several accessibility properties based on IMS Global Access for All Information Model Data Element Specification. While the efforts to describe and standardize the varied accessibility needs of information seekers are beginning to become more robust, their adoption into established metadata schemas has not been as developed. For example, while Dublin Core (DC)'s "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia and DC's "format" could be used to identify resources available in braille, audio, or large print formats, there is more work to be done. 
== History == Metadata was traditionally used in the card catalogs of libraries until the 1980s when libraries converted their catalog data to digital databases. In the 2000s, as data and information were increasingly stored digitally, this digital data was described using metadata standards. An early description of "meta data" for computer systems was written by David Griffel and Stuart McIntosh at the MIT Center for International Studies in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data." == Definition == Metadata means "data about data". Metadata is defined as the data providing information about one or more aspects of the data; it is used to summarize basic information about data that can make tracking and working with specific data easier. Some examples include: means of creation of the data; source of the data; time and date of creation; creator or author of the data; location on a computer network where the data was created; standards used; data quality. For example, a digital image may include metadata that describes the size of the image, its color depth, resolution,

    Read more →
  • Reservoir sampling

    Reservoir sampling

    Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far. == Motivation == Suppose we see a sequence of items, one at a time. We want to keep 10 items in memory, and we want them to be selected at random from the sequence. If we know the total number of items n and can access the items arbitrarily, then the solution is easy: select 10 distinct indices i between 1 and n with equal probability, and keep the i-th elements. The problem is that we do not always know the exact n in advance. == Simple: Algorithm R == A simple and popular but slow algorithm, Algorithm R, was created by Jeffrey Vitter. Initialize an array $R$ indexed from $1$ to $k$, containing the first k items of the input $x_1, \ldots, x_k$. This is the reservoir. For each new input $x_i$, generate a random number j uniformly in $\{1, \ldots, i\}$. If $j \in \{1, \ldots, k\}$, then set $R[j] := x_i$. Otherwise, discard $x_i$. Return $R$ after all inputs are processed. This algorithm works by induction on $i \geq k$. While conceptually simple and easy to understand, this algorithm needs to generate a random number for each item of the input, including the items that are discarded.
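The steps of Algorithm R described above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `reservoir_sample` and the optional `rng` parameter are assumptions made for this example, and if the stream holds fewer than k items the sketch simply returns all of them.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return k items chosen uniformly without replacement from `stream`.

    Hypothetical helper for illustration; follows the Algorithm R steps above.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream, start=1):  # i is the 1-based position
        if i <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Draw j uniformly from {1, ..., i}; randint is inclusive on both ends.
            j = rng.randint(1, i)
            if j <= k:
                # Replace slot j (converted to a 0-based index) with the new item.
                reservoir[j - 1] = item
            # Otherwise the item is discarded.
    return reservoir


# Example: sample 10 values from a stream whose length is not used by the algorithm.
sample = reservoir_sample(iter(range(1000)), 10, random.Random(0))
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when testing.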
Because it draws one random number per input item, the algorithm's asymptotic running time is $O(n)$. Generating this amount of randomness, together with the linear run time, makes the algorithm unnecessarily slow if the input population is large. == Optimal: Algorithm L == If we generate $n$ random numbers $u_1, \ldots, u_n \sim U[0,1]$ independently, then the indices of the smallest $k$ of them are a uniform sample of the $k$-subsets of $\{1, \ldots, n\}$. The process can be done without knowing $n$: keep the smallest $k$ of $u_1, \ldots, u_i$ that have been seen so far, as well as $w_i$, the index of the largest among them. For each new $u_{i+1}$, compare it with $u_{w_i}$. If $u_{i+1} < u_{w_i}$

    Read more →

  • World Congress of Universal Documentation

    World Congress of Universal Documentation

    The World Congress of Universal Documentation was held from 16 to 21 August 1937 in Paris, France. Delegates from 45 countries met to discuss means by which all of the world's information, in print, in manuscript, and in other forms, could be efficiently organized and made accessible. == The Congress in the history of information science == The Congress, held at the Trocadéro under "the auspices" of the Institut International de Bibliographie, was "the apotheosis" of a general movement in the 1930s towards the classification of the growing mass of information and the improvement of access to that information. For the first time in the history of information science, technological means were beginning to catch up with theoretical ends, and the discussions at the conference reflected that fact. Its participation in the Congress was one of the first projects of the American Documentation Institute (ADI). Participants in the conference discussed what has been more recently called "a continuously updated hypertext encyclopedia." Joseph Reagle sees many of the ideas considered at the conference as forerunners of some of the key goals and norms of Wikipedia. == Microfilm == The main resolution adopted by the congress proposed that microfilm be used to make information universally available. Watson Davis, chairman of the American delegation and president of the ADI, stated that the volume of information being produced created difficult problems of access and preservation, but that these could be solved by the use of microfilm. In his address to the Congress, Davis said: Most immediate and practical to put into operation is the microfilming of material in libraries upon demand. It will become fashionable and economical to send a potential book borrower a little strip of microfilm for his permanent possession instead of the book and then badgering him to return it before he has had a chance to use it effectively. 
I believe that reading machines for microfilm will become as common as typewriters in studies and laboratories. If the principal libraries and information centers of the world will cooperate in such "bibliofilm services," as they are called, if they exchange orders and have essentially uniform methods, forms for ordering, standard microfilm format and production methods and comparable if not uniform prices, the resources of any library will be placed at the disposal of any scholar or scientist anywhere in the world. All the libraries cooperating will merge into one world library without loss of identity or individuality. The world's documentation will become available to even the most isolated and individualistic scholar. The Congress included two separate exhibits on microfilm. One was of the equipment used at the Bibliothèque nationale de France and the other, coordinated by Herman H. Fussler of the University of Chicago, consisting of "an entire microfilm laboratory," complete with cameras, a darkroom, and various kinds of reading machines. Emanuel Goldberg presented a paper on an early copying camera he had invented. Other resolutions passed by the Congress concerned uniform standards for the preparation of articles, for classifying books and other documents, for indexing newspapers and periodicals, and for cooperation between libraries. == H. G. Wells == In his address to the Congress, H. G. Wells said that he thought that his idea of the "world brain" was a precursor to the ideas other delegates were proposing, and explicitly linked the projects being discussed to the work of the encyclopédistes: I am speaking of a process of mental organization throughout the world which I believe to be as inevitable as anything can be in human affairs. All the distresses and horrors of the present time are fundamentally intellectual. The world has to pull its mind together, and this [Congress] is the beginning of its efforts. Civilization is a Phoenix. 
It perishes in flames and even as it dies it is born again. This synthesis of knowledge upon which you are working is the necessary beginning of a new world. It is good to be meeting here in Paris where the first encyclopedia of power was made. It would be impossible to overrate our debt to Diderot and his associates. == Other participants == Participants in the Congress included authors, librarians, scholars, archivists, scientists, and editors. Some of the notable people in attendance not mentioned above were:

    Read more →
  • Text Retrieval Conference

    Text Retrieval Conference

    The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or tracks. It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity (part of the Office of the Director of National Intelligence), and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies and to increase the speed of lab-to-product transfer of technology. TREC's evaluation protocols have improved many search technologies. A 2010 study estimated that "without TREC, U.S. Internet users would have spent up to 3.15 billion additional hours using web search engines between 1999 and 2009." Hal Varian, the Chief Economist at Google, wrote that "The TREC data revitalized research on information retrieval. Having a standard, widely available, and carefully constructed set of data laid the groundwork for further innovation in this field." Each track has a challenge wherein NIST provides participating groups with data sets and test problems. Depending on the track, test problems might be questions, topics, or target extractable features. Uniform scoring is performed so the systems can be fairly evaluated. After evaluation of the results, a workshop provides a place for participants to gather their thoughts and ideas and present current and future research work. The Text Retrieval Conference started in 1992, funded by DARPA (the US Defense Advanced Research Projects Agency) and run by NIST. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies.
== Goals == Encourage retrieval research based on large text collections. Increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas. Speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements in retrieval methodologies on real-world problems. Increase the availability of appropriate evaluation techniques for use by industry and academia, including development of new evaluation techniques more applicable to current systems. TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provides a set of documents and questions. Participants run their own retrieval systems on the data and return to NIST a list of the retrieved top-ranked documents. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences. == Relevance judgments in TREC == TREC defines relevance as: "If you were writing a report on the subject of the topic and would use the information contained in the document in the report, then the document is relevant." Most TREC retrieval tasks use binary relevance: a document is either relevant or not relevant. Some TREC tasks use graded relevance, capturing multiple degrees of relevance. Most TREC collections are too large to perform complete relevance assessment; for these collections it is impossible to calculate the absolute recall for each query. To decide which documents to assess, TREC usually uses a method called pooling. In this method, the top-ranked n documents from each contributing run are aggregated, and the resulting document set is judged completely. == Various TRECs == In 1992, TREC-1 was held at NIST. The first conference attracted 28 groups of researchers from academia and industry.
It demonstrated a wide range of different approaches to the retrieval of text from large document collections. Finally, TREC-1 revealed that automatic construction of queries from natural language query statements seems to work. Techniques based on natural language processing were no better and no worse than those based on vector or probabilistic approaches. TREC-2 took place in August 1993. 31 groups of researchers participated. Two types of retrieval were examined: retrieval using an 'ad hoc' query and retrieval using a 'routing' query. In TREC-3, a small group of experiments worked with a Spanish-language collection, and others dealt with interactive query formulation in multiple databases. In TREC-4, the topics were made even shorter to investigate the problems with very short user statements. TREC-5 included both short and long versions of the topics, with the goal of carrying out a deeper investigation into which types of techniques work well on various lengths of topics. In TREC-6, three new tracks were introduced: speech, cross-language, and high-precision information retrieval. The goal of cross-language information retrieval is to facilitate research on systems that are able to retrieve relevant documents regardless of the language of the source document. TREC-7 contained seven tracks, of which two were new: the query track and the very large corpus track. The goal of the query track was to create a large query collection. TREC-8 contained seven tracks, of which two, the question answering and web tracks, were new. The objective of the QA track is to explore the possibilities of providing answers to specific natural language queries. TREC-9 included seven tracks. In TREC-10, the video track was introduced, designed to promote research in content-based retrieval from digital video. In TREC-11, the novelty track was introduced.
The goal of the novelty track is to investigate systems' abilities to locate relevant and new information within the ranked set of documents returned by a traditional document retrieval system. TREC-12, held in 2003, added three new tracks: the genome track, the robust retrieval track, and HARD (Highly Accurate Retrieval from Documents). == Tracks == === Current tracks === New tracks are added as new research needs are identified; this list is current for TREC 2018. CENTRE Track – Goal: run in parallel at CLEF 2018, NTCIR-14, and TREC 2018 to develop and tune an IR reproducibility evaluation protocol (new track for 2018). Common Core Track – Goal: an ad hoc search task over news documents. Complex Answer Retrieval (CAR) – Goal: to develop systems capable of answering complex information needs by collating information from an entire corpus. Incident Streams Track – Goal: to research technologies to automatically process social media streams during emergency situations (new track for TREC 2018). The News Track – Goal: partnership with The Washington Post to develop test collections in a news environment (new for 2018). Precision Medicine Track – Goal: a specialization of the Clinical Decision Support track to focus on linking oncology patient data to clinical trials. Real-Time Summarization Track (RTS) – Goal: to explore techniques for real-time update summaries from social media streams. === Past tracks === Chemical Track – Goal: to develop and evaluate technology for large-scale search in chemistry-related documents, including academic papers and patents, to better meet the needs of professional searchers, and specifically patent searchers and chemists. Clinical Decision Support Track – Goal: to investigate techniques for linking medical cases to information relevant for patient care. Contextual Suggestion Track – Goal: to investigate search techniques for complex information needs that are highly dependent on context and user interests.
Crowdsourcing Track – Goal: to provide a collaborative venue for exploring crowdsourcing methods both for evaluating search and for performing search tasks. Genomics Track – Goal: to study the retrieval of genomic data, not just gene sequences but also supporting documentation such as research papers, lab reports, etc. Last ran on TREC 2007. Dynamic Domain Track – Goal: to investigate domain-specific search algorithms that adapt to the dynamic information needs of professional users as they explore in complex domains. Enterprise Track – Goal: to study search over the data of an organization to complete some task. Last ran on TREC 2008. Entity Track – Goal: to perform entity-related search on Web data. These search tasks (such as finding entities and properties of entities) address common information needs that are not that well modeled as ad hoc document search. Cross-Language Track – Goal: to investigate the ability of retrieval systems to find documents topically regardless of source language. After 1999, this track spun off into CLEF. FedWeb Track – Goal: to select best resources to forward a query to, and merge the results so that most relevant are on the top. Federated Web Search Track – Goal: to investigate techniques for the selection and combination of search results from a large number of real on-line web search services. Filtering Track – Goal: to binarily decide retrieval of new

    Read more →
  • Enterprise bus matrix

    Enterprise bus matrix

    The enterprise bus matrix is a data warehouse planning tool and model created by Ralph Kimball, and is part of the data warehouse bus architecture. The matrix is the logical definition of one of the core concepts of Kimball's approach to dimensional modeling: the conformed dimension. The bus matrix defines part of the data warehouse bus architecture and is an output of the business requirements phase in the Kimball lifecycle. It is applied in the subsequent phases of dimensional modeling and development of the data warehouse. The matrix can be categorized as a hybrid model, being part technical design tool, part project management tool and part communication tool. == Background == The need for an enterprise bus matrix stems from the way one goes about creating the overall data warehouse environment. Historically there have been two approaches: a structured, centralized and planned approach, and a more loosely defined, department-specific approach, in which solutions are developed in a more independent manner. Autonomous projects can result in a range of isolated stovepipe data marts. Naturally, each approach has its issues: the visionary approach often struggles with long delivery cycles and a lack of reaction time as needs emerge and scope issues arise. On the other hand, the development of isolated data marts leads to stovepipe systems that lack synergy in development. Over time this approach will lead to a so-called data-mart-in-a-box architecture where a lack of interoperability and cohesion is apparent, and can hinder the realization of an overall enterprise data warehouse. As an attempt to handle this issue, Ralph Kimball introduced the enterprise bus. == Description == The purpose of the bus matrix is one of high abstraction and visionary planning at the data warehouse architectural level.
By dictating coherency in the development and implementation of an overall data warehouse, the bus architecture approach enables an overall vision of broader enterprise integration and consistency while at the same time dividing the problem into more manageable parts, all in a technology- and software-independent manner. The bus matrix and architecture build upon the concept of conformed dimensions, creating a structure of common dimensions that ideally can be used across the enterprise by all business processes related to the data warehouse and the corresponding fact tables from which they derive their context. According to Kimball and Margy Ross's article "Differences of Opinion", the enterprise data warehouse built on the bus architecture "identifies and enforces the relationship between business process metrics (facts) and descriptive attributes (dimensions)". The concept of a bus is well known in the language of information technology, and is what reflects the conformed dimension concept in the data warehouse, creating the skeletal structure where all parts of a system connect, ensuring interoperability and consistency of data while also allowing for future expansion. This makes the conformed dimensions act as the integration 'glue', creating a robust backbone for the enterprise data warehouse.
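To make the idea of dimensions shared across business processes concrete, here is a small Python sketch of a bus matrix. It assumes the commonly used layout in which each row is a business process and each column is a conformed dimension, which the text above does not spell out. The process names, dimension names and both helper functions are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical bus matrix: business process -> conformed dimensions it uses.
BUS_MATRIX: Dict[str, Set[str]] = {
    "Sales": {"Date", "Product", "Customer", "Store"},
    "Inventory": {"Date", "Product", "Store"},
    "Shipments": {"Date", "Product", "Customer"},
}


def all_dimensions(matrix: Dict[str, Set[str]]) -> List[str]:
    """Every dimension that appears in the matrix, in alphabetical order."""
    return sorted(set().union(*matrix.values()))


def shared_dimensions(matrix: Dict[str, Set[str]]) -> List[str]:
    """Dimensions used by more than one business process."""
    return [
        d
        for d in all_dimensions(matrix)
        if sum(d in dims for dims in matrix.values()) > 1
    ]


def render_matrix(matrix: Dict[str, Set[str]]) -> str:
    """Render the matrix as text, marking each used dimension with an X."""
    dimensions = all_dimensions(matrix)
    width = max(len(process) for process in matrix)
    lines = [" " * width + " | " + " | ".join(dimensions)]
    for process, dims in matrix.items():
        cells = [("X" if d in dims else "").center(len(d)) for d in dimensions]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


print(render_matrix(BUS_MATRIX))
```

In this toy matrix every dimension is used by at least two processes, so each one is a candidate for being defined once and reused, which is the role the text assigns to conformed dimensions.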

    Read more →
  • Best arm identification

    Best arm identification

    Best arm identification (BAI) is a sequential one-player game where the player has to find the best action (arm) among a list of actions (arms) by collecting information in the most efficient way. It is a multi-armed bandit game as a player only gets information about an arm by playing it. The most common objective in multi-armed bandit games is to minimize the regret (i.e., play the best action as much as possible), but in BAI, the goal is to find the best arm as efficiently as possible. This problem naturally arises in scenarios such as adaptive clinical trials where the number of patients is limited and the quantification of the confidence in a treatment is important. It also arises in hyperparameter optimization where the goal is to find the optimal choice of hyperparameters for an algorithm with the smallest possible number of experiments, as it can be costly in terms of time, energy, or money. == Stochastic multi-armed bandit == The stochastic multi-armed bandit (MAB) is a sequential game with one player and $K$ actions (arms). Each arm has an unknown probability distribution associated with it. At each turn, the player has to choose one action and receives an observation from the probability distribution associated with the arm. The more you play an arm, the more information you get on its probability distribution. === Best arm identification === In BAI the goal is to find the arm that has the probability distribution with the highest mean. BAI may be either fixed confidence or fixed horizon. In a fixed-confidence game, a confidence level $\delta$ is fixed at the beginning of the game and the goal is to find the best arm with this confidence level in as few turns as possible. In a fixed-horizon game, the number of turns $T$ is fixed, and the goal is to find the best arm with the highest possible confidence in $T$ turns.
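As a rough illustration of the fixed-horizon game, the Python sketch below simulates Bernoulli arms and spends the T turns in round-robin order, then returns the arm with the highest empirical mean. Uniform allocation is only a simple baseline chosen for this example, not an algorithm taken from the text. The function name and the `means` parameter are assumptions: `means` stands in for the unknown distributions and is used only by the simulator, never by the decision rule.

```python
import random
from typing import List, Optional, Sequence


def uniform_allocation_bai(
    means: Sequence[float], horizon: int, rng: Optional[random.Random] = None
) -> int:
    """Fixed-horizon BAI baseline: play arms round-robin, return the empirical best.

    `means` holds the true Bernoulli parameters, hidden from the player in a real
    game; here they are only used to simulate observations.
    """
    rng = rng or random.Random()
    num_arms = len(means)
    pulls: List[int] = [0] * num_arms
    successes: List[int] = [0] * num_arms

    for t in range(horizon):
        arm = t % num_arms  # uniform (round-robin) sampling rule
        reward = 1 if rng.random() < means[arm] else 0  # Bernoulli observation
        pulls[arm] += 1
        successes[arm] += reward

    # Recommend the arm with the highest empirical mean (unplayed arms score 0).
    estimates = [
        successes[a] / pulls[a] if pulls[a] > 0 else 0.0 for a in range(num_arms)
    ]
    return max(range(num_arms), key=lambda a: estimates[a])


# Example: three simulated treatments and a budget of 300 turns.
guess = uniform_allocation_bai([0.3, 0.5, 0.7], horizon=300, rng=random.Random(1))
```

Adaptive sampling rules can spend fewer turns on clearly inferior arms; this sketch only fixes ideas about what a player observes and what it must output.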
=== Math formalisation === We have one player and $K$ actions (arms). Behind each arm $k \in \{1, \ldots, K\}$ lies an unknown distribution $\nu_k$ with mean $\mu_k$. Each distribution $\nu_k$ belongs to a known family $\mathcal{D}$ (such as the set of Gaussian distributions or Bernoulli distributions). At each time step $t$, the player selects an arm $a_t$ and observes an independent sample $X_t \sim \nu_{a_t}$ from the corresponding distribution. We denote by $\mu^{*} := \max_{a} \mu_a$ the highest mean. An arm $a$ that satisfies $\mu_a = \mu^{*}$ is called an optimal arm; otherwise it is called a suboptimal arm. In best arm identification (BAI) the objective is to identify an optimal arm. Two main settings for BAI appear in the literature: Fixed confidence: In this setting, one typically assumes that there exists a unique optimal arm $a^{*}$. A confidence level $\delta \in (0,1)$ is specified at the beginning. The algorithm must stop at some finite stopping time $\tau_\delta < +\infty$ and return an arm $\hat{a}_{\tau_\delta}$ such that the probability of error is bounded: $\mathbb{P}(\hat{a}_{\tau_\delta} \neq a^{*}) \leq \delta$. The objective is to minimize the expected sample complexity $\mathbb{E}[\tau_\delta]$. Such a setting appears, for example, when a constraint on the confidence is required (for example, if we require a confidence level of 95%, so $\delta = 1 - 0.95 = 0.05$). Fixed horizon: In this setting, the number of samples $T$ is fixed in advance.
The goal is to design an algorithm that minimizes the probability of misidentifying the optimal arm: $\mathbb{P}(\hat{a}_T \neq a^{\star})$. This setting applies when the number of experiments is limited (for example, in a drug trial the number of patients can be fixed in advance).

=== Example of simple modelling ===

Suppose we have $K$ treatments and want to determine, with a confidence level of 95%, which treatment is best at curing a specific disease. Each treatment $k$ cures the disease with probability $\mu_k$ and fails otherwise, so each distribution is a Bernoulli distribution and $\mathcal{D}$ is the set of Bernoulli distributions. We can use a BAI algorithm to minimize $\mathbb{E}[\tau_{0.05}]$, the expected number of patients required to find the best treatment with probability at least 95%.

== Applications ==

Best arm identification arises naturally in several practical domains:

Adaptive clinical trials: The objective is to identify the most effective treatment based on sequentially collected patient data. Each treatment can be modeled as having an underlying distribution of outcomes. The goal is to identify the treatment with the highest expected outcome with high confidence (fixed-confidence setting with level $\delta$) while minimizing the expected number of trial patients (minimizing $\mathbb{E}[\tau_\delta]$), since enrolling patients is costly and as few patients as possible should receive the less effective treatments.

Hyperparameter tuning: Selecting the best configuration for machine learning models efficiently by treating each hyperparameter setting as an arm.
The goal is to find the best hyperparameter setting with as few experiments as possible, since experiments are costly in time and energy.

== Fixed confidence level ==

In the fixed-confidence setting, the goal is to design an algorithm that identifies the best arm with a prescribed confidence level $\delta$ while minimizing the expected number of samples. Any such algorithm requires two key components:

Stopping rule: A decision criterion that determines when to stop sampling. Formally, this defines a stopping time $\tau_\delta$ at which the algorithm returns an arm $\hat{a}_{\tau_\delta}$ such that $\mathbb{P}(\hat{a}_{\tau_\delta} \neq a^{\star}) \leq \delta$ and $\mathbb{P}(\tau_\delta < +\infty) = 1$.

Sampling rule: A policy $\pi$ that, at each round $t$, selects the next arm to sample $a_t$ based on all previous observations $(a_s, X_s)_{s<t}$.
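One way to make the stopping rule and the sampling rule concrete is successive elimination with Hoeffding confidence intervals, a standard baseline that the text above does not itself describe. The sketch below is a minimal version under stated assumptions: rewards are Bernoulli (hence bounded in $[0,1]$), the best arm is unique, and the function name `successive_elimination`, the `max_rounds` safeguard, and the specific confidence radius are choices made for this example.

```python
import math
import random


def successive_elimination(means, delta, seed=0, max_rounds=100_000):
    """Fixed-confidence sketch (illustrative assumption, not the article's method).

    Sampling rule: in each round, pull every arm that is still active once.
    Stopping rule: stop as soon as a single arm remains active.
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k  # every active arm has exactly r observations after round r
    total_pulls = 0

    for r in range(1, max_rounds + 1):
        for a in active:
            sums[a] += 1.0 if rng.random() < means[a] else 0.0
            total_pulls += 1

        # Hoeffding radius chosen so that a union bound over all arms and all
        # rounds keeps the total failure probability below delta.
        radius = math.sqrt(math.log(4 * k * r * r / delta) / (2 * r))

        # Keep an arm only if its upper bound reaches the leader's lower bound.
        leader = max(sums[a] / r for a in active)
        active = [a for a in active if sums[a] / r + radius >= leader - radius]

        if len(active) == 1:
            return active[0], total_pulls

    raise RuntimeError("no decision reached within max_rounds")


if __name__ == "__main__":
    # Hypothetical true means, unknown to the algorithm.
    arm, pulls = successive_elimination([0.2, 0.5, 0.8], delta=0.05)
    print("returned arm:", arm, "after", pulls, "samples")
```

Here the returned number of samples is one realization of $\tau_\delta$. The confidence radius is conservative, so this sketch typically uses more samples than algorithms designed to minimize $\mathbb{E}[\tau_\delta]$.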