AI Detector Zero

AI Detector Zero — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

List of publications in data science

This is a list of publications in data science, generally organized by order of use in a data analysis workflow. See the list of publications in statistics for more research-based and fundamental publications; this list is more applied, business-oriented, and cross-disciplinary. General article inclusion criteria are: papers from notable practitioners or notable professors, either with a Wikipedia page or a reference to their notability; common knowledge all data professionals should know, with references validating this claim; highly cited applied statistics and machine learning publications; and discussion-facilitating papers on the field of data science as a whole (for example, the Attention Is All You Need paper is arguably a landmark paper that could be added here, but it is specific to generative artificial intelligence and not relevant to all data practitioners). Some reasons why a particular publication might be regarded as important: Topic creator – a publication that created a new topic; Breakthrough – a publication that changed scientific knowledge significantly; Influence – a publication which has significantly influenced the world or has had a massive impact on the teaching of data science. When possible, a reference is used to validate the inclusion of the publication in this list. == History == Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author) Author: Leo Breiman Publication data: Online version: https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.pdf Description: Describes two cultures of statistics, one using a parsimonious and generative stochastic model, while the other uses an algorithmic model with no known mechanism for how the data is generated. Breiman argues that while statistics has traditionally favored the stochastic model, there is value in expanding the methods that statisticians can use to study phenomena.
Importance: Influenced the philosophies of statisticians right before the increased use of machine learning and deep learning methods. A 20-year retrospective on this article states that "Breiman's words are perhaps more relevant than ever". Notable statisticians at the time wrote opinion pieces about the publication. Although critical of the publication overall, David Cox writes that it "contains enough truth and exposes enough weaknesses to be thought-provoking." Bradley Efron called it a "stimulating paper". Emanuel Parzen also comments that "Breiman alerts us to systematic blunders (leading to wrong conclusions) that have been committed applying current statistical practice of data modeling". Data Scientist: The Sexiest Job of the 21st Century Author: Thomas H. Davenport and DJ Patil Publication data: Online version: hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century Description: Describes the then-new role at companies termed "data scientist": what data scientists do, how an organization might recruit one, and how to work with one effectively. Importance: This publication has influenced the data community, as noted near the time it was published in 2012 by institutions like IEEE Spectrum, and again nearly a decade later by pieces revisiting the claim in its title. In a retrospective response to their own publication 10 years earlier, authors Davenport and Patil reflected that the role of a data scientist has "become better institutionalized, the scope of the job has been redefined, the technology it relies on has made huge strides, and the importance of non-technical expertise, such as ethics and change management, has grown".
50 Years of Data Science Author: David Donoho Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734 Description: Retrospective discussion paper on the history and origins of data science, with commentary from a number of notable statisticians. Importance: This has been described as "the first in the field to present such a comprehensive and in-depth survey and overview", and helps to define a field that has many definitions. The Composable Data Management System Manifesto Author: Pedro Pedreira, Orri Erling, Konstantinos Karanasos, Scott Schneider, Wes McKinney, Satya R Valluri, Mohamed Zait, Jacques Nadeau Publication data: Online version: https://www.vldb.org/pvldb/vol16/p2679-pedreira.pdf Description: A vision paper advocating a paradigm shift in how data management systems are designed, using standard, composable, interoperable tools rather than siloed software tools. Importance: A paradigm-shifting view on how future data science software tools should be designed for more efficient workflows, the principles of which "will be especially crucial for addressing fragmentation, improving interoperability, and promoting user-centricity as data ecosystems grow increasingly complex". == Data collection and organization == Tidy Data Author: Hadley Wickham Publication data: Online version: https://www.jstatsoft.org/article/view/v059i10/ https://vita.had.co.nz/papers/tidy-data.pdf Description: Describes a framework for data cleaning that is summarized in the quote, "each variable is a column, each observation is a row, and each type of observational unit is a table". This provides a standard data structure around which data analysis tools can be consistently built. Importance: Cited over 1,500 times, this effort for tidy data has been described by David Donoho as having "more impact on today's practice of data analysis than many highly regarded theoretical statistics articles".
In the context of data visualization, this publication is said to support "efficient exploration and prototyping because variables can be assigned different roles in the plot without modifying anything about the original dataset". Data Organization in Spreadsheets Author: Karl W. Broman and Kara H. Woo Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989 Description: This article offers practical recommendations for organizing data in spreadsheets, like Microsoft Excel and Google Sheets, to reduce errors and lower the barrier for later analyses due to limitations in spreadsheets or quirks in the software. Importance: Influences the teaching of both data and non-data practitioners to create more analysis-friendly spreadsheets, and has been described as outlining "spreadsheet best practices". == Data visualizations == Quantitative Graphics in Statistics: A Brief History Author: James R. Beniger and Dorothy L. Robyn Publication data: Online version: https://www.jstor.org/stable/2683467 Description: Outlines the history and evolution of quantitative graphics in statistics, going through spatial organization (17th and 18th centuries), discrete comparison (18th and 19th centuries), continuous distribution (19th century), and multivariate distribution and correlation (late 19th and 20th centuries). Importance: Helps data practitioners in training appreciate how recent the graphics in common use are. In the later publication "Graphical Methods in Statistics" (1979), Stephen Fienberg writes that his publication "owes much to the work of Beniger and Robyn". == Practice == Data Science for Business Author: Foster Provost and Tom Fawcett Publication data: Online version: N/A Description: Broadly outlines principles of data science and data-analytic thinking for businesses.
Importance: Cited over 3,000 times, it is "highly recommended for students" and is also recommended due to its "relevance to senior management leaders who want to build and lead a team of data scientists and implement data science in solving complex business problems". == Tooling == Hidden Technical Debt in Machine Learning Systems Author: D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison Publication data: Online version: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf Description: This paper argues that it is "dangerous to think of [complex machine learning] quick wins as coming for free" and overviews risk factors to account for when implementing a machine learning system. Importance: All authors worked for Google, the article has been cited over 2,000 times, and it has helped practitioners who might otherwise implement a machine learning tool quickly without understanding its long-term maintenance. A few useful things to know about machine learning Author: Pedro Domingos Publication data: Online version: https://dl.acm.org/doi/10.1145/2347736.2347755 https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf Description: The purpose of this paper is to distill inaccessible "folk knowledge" to effectively implement machine learning projects because "machin
Read more →
Over-the-top media services in India

According to the Government of India, there are currently about 57 providers of over-the-top media services (OTT) in India, which distribute streaming media or video on demand over the Internet. == History and growth == The first dependent Indian OTT platform was BIGFlix, launched by Reliance Entertainment in 2008. In 2010 Digivive launched India's first OTT mobile app, nexGTv, which provides access to both live TV and on-demand content. nexGTv was the first app to live-stream Indian Premier League (IPL) matches on smartphones and did so during 2013 and 2014. The livestream of the IPL since 2015, when rights were won, played an important role in the growth of another OTT platform, Hotstar (now JioHotstar), in India. OTT platforms gained significant momentum in India when both DittoTV (Zee) and Sony Liv were launched in the Indian market around 2013. Following the initial push of regional OTT platforms like Aha, Hoichoi, Sun NXT, Planet Marathi, Chaupal and MX Player, the Indian OTT industry saw rapid transformation with the entry of global OTT companies such as Netflix and Amazon Prime Video into the Indian market in 2016. This competition with global enterprises pushed local rivals to innovate in both regional and hyper-regional content. === Hotstar === Hotstar (now JioHotstar) is the most subscribed-to OTT platform in India, owned by JioStar as of February 2025, with around 500 million active users and over 650 million downloads. According to Hotstar's India Watch Report 2018, 96% of watch time on Hotstar comes from videos longer than 20 minutes, while one-third of Hotstar subscribers watch television shows. In 2019, Hotstar began investing ₹120 crore in generating original content such as "Hotstar Specials." 80% of the viewership on Hotstar comes from drama, movies and sports programs. Hotstar has the exclusive streaming rights to the IPL in India. === Netflix === American streaming service Netflix entered India in January 2016.
In April 2017, it was registered as a limited liability partnership (LLP) and started commissioning content. It earned a net profit of ₹2,020,000 (₹2.02 million) for fiscal year 2017. In fiscal year 2018, Netflix earned revenues of ₹580 million. According to Morgan Stanley Research, Netflix had the highest average watch time, at more than 120 minutes, but a viewer count of only around 20 million in July 2018. As of 2018, Netflix has six million subscribers, of which 5–6% are paid members. India was not affected by Netflix's July 2018 increase in subscription rates for the US and Latin America. Netflix has stated its intent to invest ₹600 crore in the production of Indian original programming. In late 2018, Netflix bought 150,000 square feet (14,000 m²) of office space in Bandra–Kurla Complex (BKC) in Mumbai as its head office. As of December 2018, Netflix has more than 40 employees in India. === Other OTT providers === Sun NXT is an Indian video on demand service run by Sun TV Network. It was launched in June 2017, streaming in the Tamil language and six other languages. The platform has more than 4,000 Tamil movies and 200 Tamil shows, as well as regional movies and shows. Sun NXT also streams a large library of its own Sun TV shows and movies. Amazon Prime Video was launched in 2016. The platform has 2,300 titles available including 2,000 movies and about 400 shows. It has announced that it will invest ₹20 billion in creating original content in India. Besides English, Prime Video is available in six Indian languages as of December 2018. Amazon India launched Amazon Prime Music in February 2018. Eros Now, an OTT platform launched by Eros International, has the most content among the OTT providers in India, including over 12,000 films, 100,000 music tracks and albums, and 100 TV shows. Eros Now was named the Best OTT Platform of the Year 2019 at the British Asian Media Awards. It has 211.5 million registered users and 36.2 million paying subscribers as of September 2020.
In February 2020, the Aha OTT platform was launched, broadcasting exclusively Telugu content. In 2021, Planet Marathi became the first OTT platform dedicated to Marathi content in India, including web series, films, music, theater, fiction and non-fiction reality shows. It is available for both Android and iOS mobile devices along with Android TV and Amazon Fire TV devices. Bollywood actress Madhuri Dixit helped launch the platform. With rising interest in Korean dramas, Rakuten Viki saw its biggest jump in web traffic from India in 2020 due to the COVID-19 lockdown, which led to ad localization on the platform. The OTT market in fiscal year 2020 was estimated to be worth $1.7 billion. === SonyLIV and ZEE5 === In December 2021, Sony and Zee announced their merger along with plans to merge their OTT platforms. The merger was later called off. === OTT services launched as Amazon Prime video channels === The list is by alphabetical order, not by rank or popularity. == Content regulation == Due to the absence of any rules and regulations regarding OTT content, many OTT providers were accused of showing nudity, vulgarity and obscenity and of hurting Hindu religious sentiments in their shows. Series which were the focus of controversy include Four More Shots Please!, Tandav, Paatal Lok, Sacred Games, Mirzapur, the Lust Stories franchise, Rana Naidu, Thank You for Coming, and Annapoorani (2023). According to media reports, between 2018 and 2024 some OTT platforms emerged which started showing pornography in the form of web series. Both the Supreme Court and the Delhi High Court have said that OTT regulation is necessary. === OTT regulation === On 25 February 2021, the Indian government introduced self-regulation rules for OTT platforms to stop obscene content and abusive language. On 19 March 2023, I&B minister Anurag Thakur said that self-regulation does not mean that OTT platforms may show obscenity and nudity.
On 15 April 2023, I&B Secretary Apurva Chandra said that the government's soft-touch regulation of the OTT industry had led to the creation of content that is undesirable and vulgar. On 26 April 2023, MIB India said that if nudity and obscenity are seen on any OTT platform, strict action will be taken against it. On 16 May 2023, a parliamentary panel told Netflix and Amazon Prime Video not to show obscene content. On 20 June 2023, the government told Netflix, Disney+ Hotstar and all other streaming services that their content should be independently reviewed for obscenity and violence before being shown online. On 27 June 2023, DPCGC took punitive action against Ullu for streaming obscene content and asked it to remove all its explicit shows or remove all adult scenes within 15 days. On 18 July 2023, Anurag Thakur said in a meeting with all OTT stakeholders that demeaning Indian culture will not be tolerated and that OTT platforms cannot show vulgarity and nudity in the garb of 'creative expression' [failed verification: the cited sources do not mention vulgarity; they say this was about demeaning Indian culture and society]. On 22 August 2023, the Indian government gave an assurance that it would bring in rules to regulate vulgar and obscene content on social media and OTT platforms. On 10 November 2023, MIB India introduced the 'Broadcasting Service Regulation Bill', which included a programme code and a Content Evaluation Committee (CEC) for every OTT platform. Public consultation was scheduled to run until 15 January 2024. The draft bill mandates that OTT streaming platforms can broadcast only web series or other content that has been duly certified by a Content Evaluation Committee (CEC). On 14 March 2024, the Ministry of Information and Broadcasting banned 18 OTT apps from the Google Play Store, suspended 57 of their social media accounts, and closed 19 streaming websites.
The banned platforms were MoodX, Prime Play, Hunters, Besharams, Rabbit movies, Voovi, Fugi, Mojflix, Chikooflix, Nuefliks, Xtramood, NeonX VIP, X Prime, Tri Flicks, Uncut Adda, Dreams Films, Hot Shots VIP, and Yessma. On 25 July 2025, the Ministry of Information and Broadcasting banned 25 OTT apps from the Google Play Store, suspended 40 of their social media accounts, and closed 26 streaming websites. The banned platforms include ALTT, Ullu, Big Shots App, Desiflix, Boomex, NeonX VIP, Navarasa Lite, Gulab App, Kangan App, Bull App, ShowHit, Jalva App, Wow Entertainment, Look Entertainment, Hitprime, Fugi, Feneo, ShowX, Sol Talkies, Adda TV, HotX VIP, Hulchul App, MoodX, Triflicks, and Mojflix. On 24 February 2026, the Ministry of Information and Broadcasting banned 5 OTT apps from the Google Play Store, suspended 5 of their social media accounts, and closed 5 streaming websites. The banned platforms include Feel App, Digi Movieplex, Jugnu App, MoodX VIP, and Koyal Playpro. === Legal action === Currently OTT is regulated under the IT Rules 2021, which state that no content prohibited by any law for the time being in force may be published or transmitted. MIB has continuously been taking action
Read more →
Algorithmic curation

Algorithmic curation is the selection of online media by technologies such as recommender systems and personalized search. Curation entails the selective sharing of online content and recommendations based on inferred interests. Curation algorithms implement different filter approaches, such as collaborative filtering and content-based filtering. Examples include search engine and social media products such as the Twitter feed, Facebook's News Feed, and Google Personalized Search. == History == === Early algorithmic curation === Online platforms use newsfeed algorithms to determine what content to present to each user. The volume of content published on social media platforms created a need for automated filtering, as manual review of all available content by users is not feasible. These systems function as a form of gatekeeper, shaping which new material users are exposed to and influencing knowledge, attention, and political exposure. ==== Information overload ==== Early ranking algorithms addressed information overload by surfacing the most recent or most popular posts. Later systems shifted toward ranking content based on predicted engagement, aiming to increase the time users spend on a platform. Research has found that these engagement-oriented systems can increase the spread of misinformation and contribute to political polarization as a side effect of optimising for user interaction. ==== How algorithms change users' feeds over time ==== Algorithmic curation has been found to increase source diversity in some respects while simultaneously reducing the number of external links presented to users, which limits exposure to off-platform content. Research using agent-based modelling has examined how user behaviour, information quality, and algorithmic design interact with one another over time.
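The shift described above, from surfacing the most recent or most popular posts to ranking by predicted engagement, can be illustrated with a minimal sketch. Everything here is hypothetical: the `Post` fields, the sample values, and in particular `predicted_engagement`, which stands in for the output of some model that the text does not describe.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    age_hours: float             # time since publication
    likes: int                   # simple popularity signal
    predicted_engagement: float  # hypothetical model score in [0, 1]


def rank_by_recency(posts):
    """Early approach: newest posts first."""
    return sorted(posts, key=lambda p: p.age_hours)


def rank_by_popularity(posts):
    """Early approach: most-liked posts first."""
    return sorted(posts, key=lambda p: p.likes, reverse=True)


def rank_by_engagement(posts):
    """Later approach: highest predicted engagement first."""
    return sorted(posts, key=lambda p: p.predicted_engagement, reverse=True)


# Illustrative data only.
POSTS = [
    Post("a", age_hours=1.0, likes=3, predicted_engagement=0.20),
    Post("b", age_hours=5.0, likes=40, predicted_engagement=0.35),
    Post("c", age_hours=12.0, likes=12, predicted_engagement=0.90),
]

for ranker in (rank_by_recency, rank_by_popularity, rank_by_engagement):
    print(ranker.__name__, [p.post_id for p in ranker(POSTS)])
```

The same three posts come out in a different order under each rule, which is the sense in which the choice of ranking signal acts as a gatekeeper.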
=== Emergence of AI === Platforms increasingly shifted from rule-based ranking systems toward machine-learning and AI-driven approaches, which allow feeds to be personalised at a larger scale and with greater responsiveness to user behaviour. For example, X (formerly Twitter) moved away from a chronological feed toward an AI-powered ranking system that personalises content for each user. These systems are capable of making ranking decisions across volumes of content and user interactions that would not be practical to handle manually. == Approach == === Filter types === ==== Collaborative filtering ==== Collaborative filtering (CF) methods create recommendations based on a person's usage patterns. CF predicts a person's preference for an item by matching their interests with those of users who have similar interests. This process allows for the sharing of ratings between users with similar profiles. CF is based on patterns of human behaviour rather than machine analysis of content itself. Users of CF systems rate items they have interacted with, and these ratings form a profile of interests. The CF system then matches that user with others who have similar profiles, and uses their ratings to generate recommendations. Collaborative filtering can be applied across various content types including text, images, music, and financial products, and can account for complex attributes such as taste and quality that are difficult to represent explicitly. ==== Content-based filtering ==== Content-based filtering (CBF) builds a user profile to represent the types of items a user has engaged with, based on keywords and attributes used to describe those items. Recommendations are generated by presenting items similar to those the user has previously engaged with or is currently viewing. 
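The collaborative filtering procedure described above (users rate items, the system finds users with similar profiles, and their ratings drive the recommendation) can be sketched as follows. This is a minimal user-based variant under stated assumptions: the names and ratings are invented, ratings are assumed to be positive numbers on a 1 to 5 scale, and cosine similarity over co-rated items is just one of several possible similarity measures.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ana": {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben": {"film_a": 5, "film_b": 5, "film_d": 4},
    "cy": {"film_a": 1, "film_c": 5, "film_d": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        weight_total += sim
    # None signals that no similar user has rated the item.
    return weighted_sum / weight_total if weight_total else None


print(predict("ana", "film_d"))
```

In this toy data "ana" rates more like "ben" than like "cy", so the prediction for `film_d` is pulled toward ben's rating of 4 rather than cy's rating of 2.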
The CBF method creates a profile for each item based on discrete attributes and features, and then constructs a content-based user profile using a weighted vector of those features derived from items the user has rated, purchased, or interacted with. The weights represent the relative importance of each feature, and can be computed using techniques such as Bayesian classifiers, cluster analysis, decision trees, and artificial neural networks, with the goal of estimating the probability that a user will engage with a suggested item. One application of content-based filtering is Pandora Radio, where users provide an artist, genre, or composer to generate a station, and the system surfaces music with similar attributes. == Technology == === Recommender system === Recommender systems rank and suggest content to users based on a combination of implicit and explicit user input. Implicit signals include time spent viewing or engaging with a specific item. Explicit signals include actions such as liking posts, saving store pages, reading news articles, or sharing content. === Personalized search === Personalized search aims to retrieve results most relevant to the user by incorporating contextual factors beyond the explicit query, such as past queries, browsing history, and inferred interests. Social media platforms such as X (formerly Twitter) and Bluesky generate recommendations based on similar users and the content those users interact with. Personalized search may also allow users to explicitly filter results by blocking content containing certain phrases or hashtags. For first-time users without prior history, personalized search may draw on content-based filtering to establish an initial context. Similar processes are used by search engines and retail platforms to tailor results and product recommendations to individual users. 
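The content-based filtering steps described above (item profiles built from discrete features, a weighted user profile derived from rated items, and recommendation of similar items) can be sketched as follows. This is an illustration under stated assumptions, not any platform's method: the catalogue and features are invented, and the feature weights are a plain rating-weighted average rather than the Bayesian classifiers, decision trees, or neural networks mentioned in the text.

```python
import math

# Hypothetical item profiles: item -> {feature: strength}.
ITEM_FEATURES = {
    "song_a": {"jazz": 1.0, "piano": 1.0},
    "song_b": {"jazz": 1.0, "vocal": 1.0},
    "song_c": {"rock": 1.0, "guitar": 1.0},
    "song_d": {"jazz": 1.0, "piano": 1.0, "vocal": 1.0},
}


def build_profile(ratings):
    """Weighted feature vector: each rated item contributes its features,
    scaled by the user's rating (simple weighting, assumed here)."""
    total = sum(ratings.values())
    profile = {}
    if total == 0:
        return profile
    for item, rating in ratings.items():
        for feature, value in ITEM_FEATURES[item].items():
            profile[feature] = profile.get(feature, 0.0) + rating * value / total
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings):
    """Rank items the user has not rated by similarity to their profile."""
    profile = build_profile(ratings)
    unseen = [item for item in ITEM_FEATURES if item not in ratings]
    return sorted(unseen, key=lambda i: cosine(profile, ITEM_FEATURES[i]), reverse=True)


print(recommend({"song_a": 5, "song_b": 3}))
```

Unlike collaborative filtering, nothing here depends on other users: a listener who rated two jazz tracks is matched to `song_d` purely because its attributes overlap with their profile.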
== AI contribution == Artificial intelligence contributes to algorithmic curation through machine-learning models capable of processing large volumes of data. Techniques such as deep learning and reinforcement learning allow curation algorithms to model user preferences with greater granularity alongside established filtering approaches. This enables platforms to adjust content rankings rapidly in response to user behaviour. In social media and streaming contexts, AI-driven systems arrange feeds according to predicted relevance, with the outputs shaped by patterns present in the training data. == Social media and potential impact == === Echo chambers === Social media algorithms, such as those used by X (formerly Twitter), recommend content that the system predicts a user will engage with positively. Content from accounts with differing perspectives is less likely to be surfaced, which may reduce source and topic diversity and contribute to the formation of echo chambers. For example, Facebook's news feed is designed to surface content aligned with users' prior engagement, which may reinforce existing views. This dynamic may contribute to filter bubbles, in which users are seldom exposed to content outside their existing interests. Users may further narrow their feeds by actively blocking certain content or accounts. === Over-representation === A pattern observed across social media platforms is the concentration of algorithmic visibility among a small subset of users. Content from the most active users, those with the largest followings, or those generating the most engagement tends to be surfaced more frequently, meaning a small number of accounts can account for a disproportionate share of what appears in other users' feeds.
Read more →
Digital cassettes

Digital audio cassette formats introduced to the professional audio and consumer markets: Digital Audio Tape (or DAT) is the most well-known, and had some success as an audio storage format among professionals and "prosumers" before the prices of hard drive and solid-state flash memory-based digital recording devices dropped in the late 1990s. Hard-drive recording has mostly made DAT obsolete, as hard disk recorders offer more editing versatility than tape, and easier importation into digital audio workstations (DAWs) and non-linear video editing (NLE) systems. Digital Compact Cassette was intended as a digital replacement for the mass-market analog cassette tape, but received very little attention or adoption. Its failure is generally attributed to higher production costs than audio CDs, durability and indifferent reception by consumers. Digital video cassettes include: Betacam IMX (Sony), D-VHS (JVC), D1 (Sony), D2 (Sony), D3, D5 HD, Digital-S D9 (JVC), Digital Betacam (Sony), Digital8 (Sony), DV, HDV, ProHD (JVC), MiniDV, and MicroMV. == Analog cassettes used as digital data storage == Historically, the compact audio cassette, which was originally designed for analog storage of music, was used as an alternative to disk drives in the late 1970s and early 1980s to provide data storage for home computers. There are a number of unique and incompatible cassette tape data storage formats that all use the same analog compact audio cassette tape media. The ADAT system uses Super VHS tapes to record 8 synchronized digital audio tracks at once. There have also been several audio recording systems that used VHS video recorders as storage devices and video tape transports, generally by encoding the digital data to be recorded into an analog composite video signal (which resembles static) and then recording this to magnetic tape.
These systems were often used as "mixdown" recorders, to record the finished mix from a multi-track recorder in preparation for the manufacture of a vinyl record, cassette tape, or CD. An example was the Dbx Model 700. Another example is the Sony PCM adaptor series. Several companies sold VHS backup solutions in the 1980s and 1990s where data was converted to a video image which was then saved onto a VHS tape. These included the Corvus "Mirror" (U.S. patent 4380047A), the Metrum Model 64 on S-VHS tape, the Danmere Backer tape backup system, the Alpha Microsystems Videotrax, the Legacy Storage Systems International VAST (Variable Array Storage), the ArVid, and the Video Backup System Amiga. The S2 VLBI system at three NASA Deep Space Network complexes and over 20 other radio telescopes stores digital data on SVHS tapes.
Read more →
Hidden layer

In artificial neural networks, a hidden layer is a layer of artificial neurons that is neither an input layer nor an output layer. The simplest examples appear in multilayer perceptrons (MLP), as illustrated in the diagram. An MLP without any hidden layer is essentially just a linear model. With hidden layers and activation functions, however, nonlinearity is introduced into the model. In typical machine learning practice, the weights and biases are initialized, then iteratively updated during training via backpropagation.
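The claim that hidden layers only add expressive power when combined with an activation function can be checked with a small sketch. The weights below are hand-picked for illustration rather than learned by backpropagation, and the helper names are hypothetical. With a ReLU activation the two-unit hidden layer computes XOR; with the activation removed, the same two layers collapse into a single affine map, which cannot represent XOR.

```python
def affine(weights, biases, inputs):
    """One dense layer without activation: weights @ inputs + biases."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


# Hand-picked (not trained) parameters: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[1.0, 1.0], [1.0, 1.0]]
B1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
B2 = [0.0]


def mlp_with_hidden_activation(x):
    """Hidden layer followed by ReLU, then a linear output layer."""
    hidden = relu(affine(W1, B1, x))
    return affine(W2, B2, hidden)[0]


def stacked_layers_no_activation(x):
    """Same two layers with the activation removed: still an affine map."""
    return affine(W2, B2, affine(W1, B1, x))[0]


INPUTS = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print([mlp_with_hidden_activation(x) for x in INPUTS])
print([stacked_layers_no_activation(x) for x in INPUTS])
```

Any affine function f of two inputs satisfies f(0,0) + f(1,1) = f(0,1) + f(1,0), whereas XOR gives 0 + 0 on one side and 1 + 1 on the other, so no choice of weights rescues the version without the activation.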
Read more →
CloudLibrary

CloudLibrary (stylized as "cloudLibrary") is a cloud-based software system through which libraries lend electronic books; it is also the name of the app that users download to access the e-books. CloudLibrary was created in 2011 by 3M as part of its library systems unit as a competitor to OverDrive, Inc.; in 2015 3M sold the North American part of that unit to Bibliotheca Group GmbH, a company founded in 2011 that was funded by One Equity Partners Capital Advisors, a division of JP Morgan Chase. By 2019, Bibliotheca had tried, unsuccessfully, to negotiate with Amazon to add Kindle e-book compatibility to cloudLibrary, something that, as of then, Amazon had only made available to OverDrive. In that year, cloudLibrary, hoopla (offered by Midwest Tape), ODILO, and Baker & Taylor’s Axis 360 were the main competitors to the OverDrive and Libby apps offered by OverDrive, Inc. in the library e-book market. In April 2024, Bibliotheca sold cloudLibrary to the nonprofit cooperative OCLC. By that time, cloudLibrary was used by around 500 libraries in around 20 countries in around 50 languages, and was used to lend audiobooks, digital magazines, newspapers, comics, and streaming media, along with e-books.
Read more →
Global digital divide

The global digital divide describes global disparities, primarily between developed and developing countries, in regards to access to computing and information resources such as the Internet and the opportunities derived from such access. The Internet is expanding very quickly, and not all countries—especially developing countries—can keep up with the constant changes. The term "digital divide" does not necessarily mean that someone does not have technology; it could mean that there is simply a difference in technology. These differences can refer to, for example, high-quality computers, fast Internet, technical assistance, or telephone services. == Statistics == There is a large inequality worldwide in terms of the distribution of installed telecommunication bandwidth. In 2014 only three countries (China, US, Japan) host 50% of the globally installed bandwidth potential (see pie-chart Figure on the right). This concentration is not new, as historically only ten countries have hosted 70–75% of the global telecommunication capacity (see Figure). The U.S. lost its global leadership in terms of installed bandwidth in 2011, being replaced by China, which hosts more than twice as much national bandwidth potential in 2014 (29% versus 13% of the global total). == Versus the digital divide == The global digital divide is a special case of the digital divide; the focus is set on the fact that "Internet has developed unevenly throughout the world" causing some countries to fall behind in technology, education, labor, democracy, and tourism. The concept of the digital divide was originally popularized regarding the disparity in Internet access between rural and urban areas of the United States of America; the global digital divide mirrors this disparity on an international scale. The global digital divide also contributes to the inequality of access to goods and services available through technology. 
Computers and the Internet provide users with improved education, which can lead to higher wages; the people living in nations with limited access are therefore disadvantaged. This global divide is often characterized as falling along what is sometimes called the North–South divide of "northern" wealthier nations and "southern" poorer ones. == Obstacles to a solution == Some people argue that necessities need to be considered before achieving digital inclusion, such as an ample food supply and quality health care. Minimizing the global digital divide requires considering and addressing the following types of access: === Physical access === Involves "the distribution of ICT devices per capita…and land lines per thousands". Individuals need to obtain access to computers, landlines, and networks in order to access the Internet. This access barrier is also addressed in Article 21 of the Convention on the Rights of Persons with Disabilities by the United Nations. === Financial access === The cost of ICT devices, traffic, applications, technician and educator training, software, maintenance, and infrastructure requires ongoing financial means. Financial access and "the levels of household income play a significant role in widening the gap". === Socio-demographic access === Empirical tests have identified that several socio-demographic characteristics foster or limit ICT access and usage. Among different countries, educational levels and income are the most powerful explanatory variables, with age being a third one. While a global gender gap in access to and usage of ICTs exists, empirical evidence shows that this is due to unfavorable conditions concerning employment, education and income and not to technophobia or lower ability. In the contexts under study, women with the prerequisites for access and usage turned out to be more active users of digital tools than men. In the US, for example, the figures for 2018 show 89% of men and 88% of women use the Internet.
=== Cognitive access === In order to use computer technology, a certain level of information literacy is needed. Further challenges include information overload and the ability to find and use reliable information. === Design access === Computers need to be accessible to individuals with different learning and physical abilities including complying with Section 508 of the Rehabilitation Act as amended by the Workforce Investment Act of 1998 in the United States. === Institutional access === In illustrating institutional access, Wilson states "the numbers of users are greatly affected by whether access is offered only through individual homes or whether it is offered through schools, community centers, religious institutions, cybercafés, or post offices, especially in poor countries where computer access at work or home is highly limited". === Political access === Guillen & Suarez argue that "democratic political regimes enable faster growth of the Internet than authoritarian or totalitarian regimes." The Internet is considered a form of e-democracy, and attempting to control what citizens can or cannot view is in contradiction to this. Recently situations in Iran and China have denied people the ability to access certain websites and disseminate information. Iran has prohibited the use of high-speed Internet in the country and has removed many satellite dishes in order to prevent the influence of Western culture, such as music and television. === Cultural access === Many experts claim that bridging the digital divide is not sufficient and that the images and language needed to be conveyed in a language and images that can be read across different cultural lines. A 2013 study conducted by Pew Research Center noted how participants taking the survey in Spanish were nearly twice as likely not to use the internet. 
== Examples == In the early 21st century, residents of developed countries enjoy many Internet services which are not yet widely available in developing countries, including: Mobile phones and small electronic communication devices; E-communities and social-networking; Fast broadband Internet connections, enabling advanced Internet applications; Affordable and widespread Internet access, either through personal computers at home or work, through public terminals in public libraries and Internet cafes, and through wireless access points; E-commerce enabled by efficient electronic payment networks like credit cards and reliable shipping services; Virtual globes featuring street maps searchable down to individual street addresses and detailed satellite and aerial photography; Online research systems which enable users to peruse newspaper and magazine articles that may be centuries old, without having to leave home; Electronic readers such as Kindle, Sony Reader, Samsung Papyrus and Iliad by iRex Technologies; Price engines which help consumers find the best possible online prices and similar services which find the best possible prices at local retailers; Electronic services delivery of government services, such as the ability to pay taxes, fees, and fines online. Further civic engagement through e-government and other sources such as finding information about candidates regarding political situations. == Proposed remedies == There are four specific arguments why it is important to "bridge the gap": Economic equality – For example, the telephone is often seen as one of the most important components, because having access to a working telephone can lead to higher safety. If there were to be an emergency, one could easily call for help if one could use a nearby phone. In another example, many work-related tasks are online, and people without access to the Internet may not be able to complete work up to company standards. 
The Internet is regarded by some as a basic component of civic life that developed countries ought to guarantee for their citizens. Additionally, welfare services, for example, are sometimes offered via the Internet. Social mobility – Computer and Internet use is regarded as being very important to development and success. However, some children are not getting as much technical education as others, because lower socioeconomic areas cannot afford to provide schools with computer facilities. For this reason, some kids are being separated and not receiving the same chance as others to be successful. Democracy – Some people believe that eliminating the digital divide would help countries become healthier democracies. They argue that communities would become much more involved in events such as elections or decision making. Economic growth – It is believed that less-developed nations could gain quick access to economic growth if the information infrastructure were to be developed and well used. By improving the latest technologies, certain countries and industries can gain a competitive advantage. While these four arguments are meant to lead to a solution to the digital divide, there are a couple of other components that need to be considered. The first one is rural living versus s
Digital divide

Digital divide is inequitable access to and use of digital technology, encompassing four interrelated dimensions: motivational, material, skills, and usage access. The digital divide worsens inequality in access to information and resources. According to 2026 data from the U.S. Census Bureau, a significant 'digital divide' persists, with over 15.7 million Americans lacking access to high-speed broadband. Students from low-income households often face limited access to reliable internet and digital devices, which negatively affects their educational opportunities. In the Information Age, people without access to the Internet and other technology are at a disadvantage, for they are less able to connect with others, find and apply for jobs, shop, and learn. People living in poverty, in insecure housing or who are homeless, elderly people, and those living in rural communities may have limited access to the Internet; in contrast, urban middle class people have easy access to the Internet. Another divide is between producers and consumers of Internet content, which could be a result of educational disparities. While social media use varies across age groups, a US 2010 study reported no racial divide. == History == The historical roots of the digital divide in the United States refer to the increasing gap that occurred during the early modern period between those who could and could not access the real time forms of calculation, decision-making, and visualization offered via written and printed media. "Over time, focus has shifted from binary access to differentiated use, where quality and purpose of engagement vary across socio-economic groups." Within this context, ethical discussions regarding the relationship between education and the free distribution of information were raised by thinkers such as Immanuel Kant, Jean Jacques Rousseau, and Mary Wollstonecraft (1712–1778). 
Rousseau advocated that governments should intervene to ensure that any society's economic benefits should be fairly and meaningfully distributed. Amid the Industrial Revolution in Great Britain, Rousseau's idea helped to justify poor laws that created a safety net for those who were harmed by new forms of production. Later, when telegraph and postal systems evolved, many used Rousseau's ideas to argue for full access to those services, even if it meant subsidizing hard-to-serve citizens. Thus, "universal services" referred to innovations in regulation and taxation that would allow phone services such as AT&T in the United States to serve hard-to-serve rural users. In 1996, as telecommunications companies merged with Internet companies, the Federal Communications Commission adopted the Telecommunications Act of 1996 to consider regulatory strategies and taxation policies to close the digital divide. Though the term "digital divide" was coined among consumer groups that sought to tax and regulate information and communications technology (ICT) companies to close the digital divide, the topic soon moved onto a global stage. The focus was the World Trade Organization, which passed the Telecommunications Services Act, which resisted regulation that would have required ICT companies to serve hard-to-serve individuals and communities. In 1999, to assuage anti-globalization forces, the WTO hosted the "Financial Solutions to Digital Divide" in Seattle, US, co-organized by Craig Warren Smith of Digital Divide Institute and Bill Gates Sr., the chairman of the Bill and Melinda Gates Foundation. It catalyzed a full-scale global movement to close the digital divide, which quickly spread to all sectors of the global economy. In 2000, US president Bill Clinton mentioned the term in the State of the Union Address. 
Since the early 2000s, the international community has transitioned from a focus on domestic infrastructure to a global, multi-dimensional framework for digital equity. This shift was formalized through the World Summit on the Information Society (WSIS) in Geneva (2003) and Tunis (2005), where the International Telecommunication Union (ITU) established a roadmap for bridging the Global North-South disparity as part of the Sustainable Development Goals. Academic and policy discourse has since evolved to distinguish between the first-level divide (physical access), the second-level divide (digital literacy), and the third-level divide (the ability to translate technology use into socio-economic capital). By the 2020s, critical reflections on national development emphasized that the divide is fundamentally a socio-institutional gap. Research by Tiwari, Kostenko, and Yekhanurov (2025) identifies four pillars for achieving national digital maturity which are digital governance capacity, institutional design to prevent adverse digital incorporation, infrastructure resilience, and citizen capability. This modern era is characterized by the pursuit of meaningful connectivity, a standard that requires internet access to be not only available but affordable, high-speed, and supportive of active content creation. === During the COVID-19 pandemic === At the outset of the COVID-19 pandemic, governments worldwide issued stay-at-home orders that imposed lockdowns, quarantines, restrictions, and closures. The resulting interruptions to schooling, public services, and business operations drove nearly half of the world's population into seeking alternative methods to live while in isolation. These methods included telemedicine, virtual classrooms, online shopping, technology-based social interactions and working remotely, all of which require access to high-speed or broadband internet access and digital technologies. 
A Pew Research Centre study reports that 90% of Americans describe the use of the Internet as "essential" during the pandemic. The accelerated use of digital technologies created a landscape where the ability, or lack thereof, to access digital spaces became a crucial factor in everyday life. According to the Pew Research Center, 59% of children from lower-income families were likely to face digital obstacles in completing school assignments. These obstacles included the use of a cellphone to complete homework, having to use public Wi-Fi because of unreliable internet service in the home and lack of access to a computer in the home. This difficulty, titled the homework gap, affects more than 30% of K-12 students living below the poverty threshold, and disproportionally affects American Indian/Alaska Native, Black, and Hispanic students. These types of interruptions or privilege gaps in education exemplify problems in the systemic marginalization of historically oppressed individuals in primary education. The pandemic exposed inequity causing discrepancies in learning. "Large-scale events such as COVID-19 intensify both access and skills gaps, underlining the need for resilient digital inclusion policies. Studies during COVID-19 reveal first-level (access) and second-level (skills) divides, with underserved students struggling with reliable internet, devices, and platform navigation ” A lack of "tech readiness", that is, confident and independent use of devices, was reported among the US elderly population; with more than 50% reporting an inadequate knowledge of devices and more than one-third reporting a lack of confidence. "Older adults often face skills and confidence barriers, illustrating later-stage divides in van Dijk’s model." Moreover, according to a UN research paper, similar results can be found across various Asian countries, with those aged over 74, reporting less confident or inconsistent use of digital devices. 
This aspect of the digital divide and the elderly occurred during the pandemic as healthcare providers increasingly relied upon telemedicine to manage chronic and acute health conditions. == Aspects == There are various definitions of the digital divide, all with slightly different emphasis, which is evidenced by related concepts like digital inclusion, digital participation, digital skills, media literacy, and digital accessibility.“Van Dijk’s model identifies sequential barriers—motivational, material, skills, and usage—that must be addressed to bridge the divide.” === Infrastructure === The infrastructure by which individuals, households, businesses, and communities connect to the Internet addresses the physical mediums that people use to connect to the Internet such as desktop computers, laptops, basic mobile phones or smartphones, MP3 players, gaming consoles, electronic book readers, and tablets. Traditionally, the nature of the divide has been measured in terms of the existing numbers of subscriptions and digital devices. Given the increasing number of such devices, some have concluded that the digital divide among individuals has increasingly been closing as the result of a natural and almost automatic process. Others point to persistent lower levels of connectivity among women, racial and ethnic minorities, people with lower incomes, rura
Weak supervision

Weak supervision (also known as semi-supervised learning) is a paradigm in machine learning, the relevance and notability of which increased with the advent of large language models due to the large amount of data required to train them. It is characterized by using a combination of a small amount of human-labeled data (used exclusively in the more expensive and time-consuming supervised learning paradigm), followed by a large amount of unlabeled data (used exclusively in the unsupervised learning paradigm). In other words, the desired output values are provided only for a subset of the training data. The remaining data is unlabeled or imprecisely labeled. Intuitively, the learning problem can be seen as an exam, with the labeled data as sample problems that the teacher solves for the class as an aid in solving another set of problems. In the transductive setting, these unsolved problems act as exam questions. In the inductive setting, they become practice problems of the sort that will make up the exam. == Problem == The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labeling process thus may render large, fully labeled training sets infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning. 
== Technique == More formally, semi-supervised learning assumes that a set of $l$ independently and identically distributed examples $x_{1},\dots ,x_{l}\in X$ with corresponding labels $y_{1},\dots ,y_{l}\in Y$ and $u$ unlabeled examples $x_{l+1},\dots ,x_{l+u}\in X$ are processed. Semi-supervised learning combines this information to surpass the classification performance that can be obtained either by discarding the unlabeled data and doing supervised learning or by discarding the labels and doing unsupervised learning. Semi-supervised learning may refer to either transductive learning or inductive learning. The goal of transductive learning is to infer the correct labels for the given unlabeled data $x_{l+1},\dots ,x_{l+u}$ only. The goal of inductive learning is to infer the correct mapping from $X$ to $Y$. It is unnecessary (and, according to Vapnik's principle, imprudent) to perform transductive learning by way of inferring a classification rule over the entire input space; however, in practice, algorithms formally designed for transduction or induction are often used interchangeably. == Assumptions == In order to make any use of unlabeled data, some relationship to the underlying distribution of data must exist. Semi-supervised learning algorithms make use of at least one of the following assumptions: === Continuity / smoothness assumption === Points that are close to each other are more likely to share a label. This is also generally assumed in supervised learning and yields a preference for geometrically simple decision boundaries. In the case of semi-supervised learning, the smoothness assumption additionally yields a preference for decision boundaries in low-density regions, so few points are close to each other but in different classes. 
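The smoothness assumption and the transductive setting can be illustrated with a small sketch. This is only a toy example, not a method described in the text: the function name `propagate_labels`, the one-dimensional data, and the greedy nearest-neighbour rule are all assumptions made for illustration. Each unlabeled point takes the label of the closest point that already has one, so labels spread along chains of nearby points and the implied boundary ends up in a gap between groups of points.

```python
def propagate_labels(labeled, unlabeled):
    """Transductively label 1-D points (hypothetical helper).

    labeled:   list of (point, label) pairs
    unlabeled: list of points
    Repeatedly picks the unlabeled point nearest to any already-labeled
    point and copies that neighbour's label (smoothness assumption).
    """
    known = dict(labeled)          # point -> label, grows as labels spread
    remaining = list(unlabeled)
    while remaining:
        # Unlabeled point with the smallest distance to a labeled point.
        best = min(remaining, key=lambda x: min(abs(x - k) for k in known))
        # Its nearest labeled (or previously pseudo-labeled) neighbour.
        nearest = min(known, key=lambda k: abs(best - k))
        known[best] = known[nearest]
        remaining.remove(best)
    return {x: known[x] for x in unlabeled}


# Two labeled points and a chain of unlabeled points next to the first one.
labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = propagate_labels(labeled, unlabeled)
```

In this toy data the point 6.0 is closer to the labeled point 10.0 than to 0.0, yet it is reached through the chain 1.0, 2.0, ..., 5.0 and therefore receives the label of 0.0. The unlabeled points, not just the two labeled ones, determine where the boundary falls.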
=== Cluster assumption === The data tend to form discrete clusters, and points in the same cluster are more likely to share a label (although data that shares a label may spread across multiple clusters). This is a special case of the smoothness assumption and gives rise to feature learning with clustering algorithms. === Manifold assumption === The data lie approximately on a manifold of much lower dimension than the input space. In this case learning the manifold using both the labeled and unlabeled data can avoid the curse of dimensionality. Then learning can proceed using distances and densities defined on the manifold. The manifold assumption is practical when high-dimensional data are generated by some process that may be hard to model directly, but which has only a few degrees of freedom. For instance, human voice is controlled by a few vocal folds, and images of various facial expressions are controlled by a few muscles. In these cases, it is better to consider distances and smoothness in the natural space of the generating problem, rather than in the space of all possible acoustic waves or images, respectively. == History == The heuristic approach of self-training (also known as self-learning or self-labeling) is historically the oldest approach to semi-supervised learning, with examples of applications starting in the 1960s. The transductive learning framework was formally introduced by Vladimir Vapnik in the 1970s. Interest in inductive learning using generative models also began in the 1970s. A probably approximately correct learning bound for semi-supervised learning of a Gaussian mixture was demonstrated by Ratsaby and Venkatesh in 1995. == Methods == === Generative models === Generative approaches to statistical learning first seek to estimate p ( x | y ) {\displaystyle p(x|y)} , the distribution of data points belonging to each class. 
The probability $p(y|x)$ that a given point $x$ has label $y$ is then proportional to $p(x|y)p(y)$ by Bayes' rule. Semi-supervised learning with generative models can be viewed either as an extension of supervised learning (classification plus information about $p(x)$) or as an extension of unsupervised learning (clustering plus some labels). Generative models assume that the distributions take some particular form $p(x|y,\theta )$ parameterized by the vector $\theta$. If these assumptions are incorrect, the unlabeled data may actually decrease the accuracy of the solution relative to what would have been obtained from labeled data alone. However, if the assumptions are correct, then the unlabeled data necessarily improves performance. The unlabeled data are distributed according to a mixture of individual-class distributions. In order to learn the mixture distribution from the unlabeled data, it must be identifiable, that is, different parameters must yield different summed distributions. Gaussian mixture distributions are identifiable and commonly used for generative models. The parameterized joint distribution can be written as $p(x,y|\theta )=p(y|\theta )p(x|y,\theta )$ by using the chain rule. Each parameter vector $\theta$ is associated with a decision function $f_{\theta }(x)={\underset {y}{\operatorname {argmax} }}\ p(y|x,\theta )$. 
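The decision function above can be sketched for a one-dimensional Gaussian model. This is a minimal illustration under stated assumptions: the names `gaussian_pdf`, `posterior` and `decide`, and the parameter values in `theta`, are hypothetical and chosen only to make the example concrete. The parameters are given by hand here rather than estimated from data.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used as the class-conditional p(x | y, theta)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, theta):
    """Return p(y | x, theta) for every label y via Bayes' rule.

    theta maps each label to (prior, mean, std), i.e. p(y | theta) and the
    parameters of p(x | y, theta). Assumes x is not so extreme that every
    density underflows to zero.
    """
    joint = {y: prior * gaussian_pdf(x, mean, std)
             for y, (prior, mean, std) in theta.items()}
    evidence = sum(joint.values())      # p(x | theta), the mixture density
    return {y: j / evidence for y, j in joint.items()}


def decide(x, theta):
    """Decision function f_theta(x) = argmax_y p(y | x, theta)."""
    post = posterior(x, theta)
    return max(post, key=post.get)


# Illustrative parameters: two equally likely classes with unit variance.
theta = {"a": (0.5, 0.0, 1.0), "b": (0.5, 4.0, 1.0)}
label_near_a = decide(1.0, theta)
label_near_b = decide(3.0, theta)
```

With these assumed parameters the two classes are symmetric around x = 2, so points below 2 are assigned to "a" and points above 2 to "b". The `evidence` term is the mixture density that the unlabeled data are assumed to follow.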
The parameter is then chosen based on fit to both the labeled and unlabeled data, weighted by $\lambda$: $${\underset {\Theta }{\operatorname {argmax} }}\left(\log p(\{x_{i},y_{i}\}_{i=1}^{l}|\theta )+\lambda \log p(\{x_{i}\}_{i=l+1}^{l+u}|\theta )\right)$$ === Low-density separation === Another major class of methods attempts to place boundaries in regions with few data points (labeled or unlabeled). One of the most commonly used algorithms is the transductive support vector machine, or TSVM (which, despite its name, may be used for inductive learning as well). Whereas support vector machines for supervised learning seek a decision boundary with maximal margin over the labeled data, the goal of TSVM is a labeling of the unlabeled data such that the decision boundary has maximal margin over all of the data. In addition to the standard hinge loss $(1-yf(x))_{+}$ for labeled data, a loss function $(1-|f(x)|)_{+}$ is introduced over the unlabeled data by letting $y=\operatorname {sign} {f(x)}$. TSVM then selects $f^{*}(x)=h^{*}(x)+b$ from a reproducing kernel Hilbert space ${\mathcal {H}}$ by minimizing the regularized empirical risk: $$f^{*}={\underset {f}{\operatorname {argmin} }}\left(\sum _{i=1}^{l}(1-y_{i}f(x_{i}))_{+}+\lambda _{1}\|h\|_{\mathcal {H}}^{2}+\lambda _{2}\sum _{i=l+1}^{l+u}(1-|f(x_{i})|)_{+}\right)$$ An exact solution is intractable due to the non-convex term $(1-|f(x)|)_{+}$
Bulletin (service)

Bulletin was an online newsletter platform launched by Facebook on July 6, 2021, that allowed notable writers to make announcements directly to their subscribers. Its competitors included Substack, of which Bulletin was called a "near-clone." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. Facebook CEO Mark Zuckerberg stated that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." In October 2022 Meta announced the shutdown of Bulletin. The platform went into read-only mode in January 2023 and became unavailable in April 2023. == History == Facebook announced Bulletin as its online newsletter platform on June 29, 2021, and launched it on July 6, 2021. Facebook CEO Mark Zuckerberg touted the service by saying that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. == Reception == Unlike competitors such as Substack, Facebook indicated upon the service's launch that it would not take a cut of the subscription fees of writers using the platform. According to Washington Post technology writer Will Oremus, the move was criticized by those who viewed it as a form of predatory pricing intended by Facebook to force those competitors out of business. Sandeep Vaheesan, legal director of the think tank Open Markets, called for the government to reexamine predatory pricing as a violation of antitrust law, saying, "We want companies to compete by making better products, investing in new equipment and tech — not purely relying on their financial advantages to capture market share."
List of search appliance vendors

A search appliance is a type of computer which is attached to a corporate network for the purpose of indexing the content shared across that network in a way that is similar to a web search engine. It may be made accessible through a public web interface or restricted to users of that network. A search appliance is usually made up of: a gathering component, a standardizing component, a data storage area, a search component, a user interface component, and a management interface component. == Vendors of search appliances == Fabasoft; Google; InfoLibrarian Search Appliance™; Maxxcat; Searchdaimon; Thunderstone == Former/defunct vendors of search appliances == Black Tulip Systems; Google Search Appliance; Index Engines; Munax; Perfect Search Appliance
M-DISC

M-DISC (Millennial Disc) is a write-once optical disc technology introduced in 2009 by Millenniata, Inc. and available as DVD and Blu-ray discs. == Overview == M-DISC's design is intended to provide archival media longevity. M-Disc claims that properly stored M-DISC DVD recordings will last up to 1000 years. The M-DISC DVD looks like a standard disc, except it is almost transparent with later DVD and BD-R M-Disks having standard and inkjet printable labels. The patents protecting the M-DISC technology assert that the data layer is a glassy carbon material that is substantially inert to oxidation and has a melting point of 200–1000 °C (392–1832 °F). M-Discs are readable by most regular DVD players made after 2005 and Blu-Ray and BDXL disc drives and writable by most made after 2011. Available recording capacities conform to standard DVD/Blu-ray sizes: 4.7 GB DVD+R to 25 GB BD-R, 50 GB BD-R and 100 GB BDXL. == History == M-DISC developer Millenniata, Inc. was co-founded by Brigham Young University professors Barry Lunt, Matthew Linford, CEO Henry O'Connell and CTO Doug Hansen. The company was incorporated on May 13, 2010, in American Fork, Utah. Millenniata, Inc. officially went bankrupt in December 2016. Under the direction of CEO Paul Brockbank, Millenniata had issued convertible debt. When the obligation for conversion was not satisfied, the company defaulted on the debt payment and the debt holders took possession of all of the company's assets. The debt holders subsequently started a new company, Yours.co, to sell M-DISCs and related services. As of the 2020s, there are only 2 licensed manufacturers of M-Discs: Ritek, sold under the Ritek and RiDATA brands, and Verbatim with co-branded discs, marketed as the "Verbatim M-DISC". 128 GB BDXL never made it to market due to the 2016 bankruptcy. Early in 2022, Verbatim changed the formulation of their M-DISC branded Blu-rays. 
These new discs could be written at a faster rate than the previous ones – 6× speed instead of 4×. The new discs also had different colouration and markings compared with older version. Later in the year customers accused Verbatim of selling an inferior product and deceptive marketing. Verbatim responded that the new discs were a further development of the older discs and should have the same longevity, and that the technical changes therein were responsible for the altered appearance and higher write speeds. The updated M-DISC currently sold on the market uses the same metal ablative layer (MABL) metal oxide inorganic recording layer used in many of Verbatim's regular Blu-ray products. == Durability claims == The original M-DISC DVD+R was tested according to ISO/IEC 10995:2011 and ECMA-379 with a projected rated lifespan of several hundred years in archival use. The glassy carbon layers, in theory if preserved correctly in an environment like a salt mine, could store the data for over 10,000 years before going outside of readable specifications. However, the polycarbonate plastics, which are commonly used by almost all optical media and heavily in CBRN and ballistic protective equipment due to their optical, physical impact and chemical resistant properties, have a lifespan rating of only around 1000 years before degradation. Verbatim Japan claims that M-DISCs now use a titanium layer to prevent moisture ingression and to provide environmental stability. M-DISCs sold in Japan are advertised to have a projected lifespan of 100 years or more based on internal ISO/IEC 16963 testing, while other regional Verbatim websites claim that M-DISCs have a projected lifespan of "several hundred years" based on ISO/IEC 16963 testing. 
== Durability testing == In 2009, the US Department of Defense (DoD) produced the China Lake Report, which compared Millenniata's M-DISC DVD against then-current market offerings from Delkin, MAM-A, Mitsubishi, Taiyo Yuden and Verbatim; all brands using organic dyes failed to pass the series of accelerated aging tests. From 2010 to 2012, the French National Laboratory of Metrology and Testing (LNE) used high-temperature accelerated aging testing, at 90 °C (194 °F) and 85% relative humidity inside a CLIMATS Excal 5423-U, for 250 to 1000 hours with a mix of inorganic DVD+R discs from MPO, Verbatim, Maxell, Syylex and DataTresor. The summary of the tests states that the Syylex Glass Master Disc was rated for 1000+ hours, the DataTresor Disc for 250+ hours and the M-DISC for under 250 hours. The Syylex disc, while still purchasable from Syylex before the company's bankruptcy, was a custom-ordered product that could not be burned in a consumer player, so it was not truly in the same category as the others. In 2016, a consumer, Mol Smith, did real-world stress testing on the 25 GB BD-R M-DISC alongside TDK's standard 25 GB BD-R disc using a copied movie, which demonstrated the reliability of the M-DISC's molding compared to standard discs; after 60 days of direct outdoor exposure the M-DISC played without error, while the TDK disc was physically destroyed. In 2022, the NIST Interagency Report NIST IR 8387 listed the M-DISC as an acceptable archival format rated for 100+ years, citing the aforementioned 2009 and 2012 tests by the US Department of Defense and French National Laboratory of Metrology and Testing as sources. == Commercial support == While recorded discs are readable in conventional DVD and BD drives, M-DISC DVDs can only be burned by drives with firmware that supports the slightly higher power mode that M-DISC requires for burning its inorganic layers; as a result, writing speed is typically 2×. 
Blu-ray M-discs can be both written and read in most standard Blu-ray drives and are certified by the Blu-ray Disc Association to meet all current standard specifications as of 2019. Typically, the M-Discs cost 1.5–3× the price of standard Blu-Ray discs with DVD M-Discs now having sparse availability. With the first-generation DVD M-DISCs, it was difficult to determine which was the writable side of the disc due to being near fully translucent, until coloring and later labels similar to that on standard DVD discs was added to discs to help distinguish the sides preventing user error. Asus, LG Electronics, Lite-On, Pioneer, Buffalo Technology, and Hitachi-LG produce drives that can record M-DISC media while Verbatim and Ritek produce M-DISC discs. == Adoption == The regional government of the U.S. state of Utah has used M-Disc since 2011. Some consumers and avid datahoarders have adopted the format for cold digital data storage. == Alternative technologies == === Optical === Syylex Glass Master Disc: these discs use etched glass and are only typically degradable by physical or chemical damage, but not by normal ageing inside an archival environment. Current BD 25 GB, BD-R DL 50 GB & BDXL 100 GB (three layer) and Sony's BDXL 128 GB (four layer) discs are rated for up to 50 years (Standard inorganic HTL discs). Sony's Optical Disc Archive, is an optical competitor to the LTO tape-based data storage system, currently with up to 5.5 TB cartridges of dual-sided 120mm discs, with desktop readers and automated rackmount standard archival systems allowing for large scale archival and data retrieval rated for an estimated 100+ years. Pioneer DM for Archive is a disc media and drive combination developed by Pioneer to meet the requirements laid out by the Japanese government for preservation of financial data for a minimum of 100 years. The discs use a MABL type recording layer and are manufactured with tight tolerances. 
Although the discs can be burned in any BD writer, when they are burned in Pioneer's DM for Archive writers using the DM Archiver software, the media and burn quality meet ISO/IEC 18630, which defines the testing methods needed for ensuring media and burn quality.

=== Magnetic ===
Linear Tape-Open (LTO) is rated for up to 30 years in a climate-controlled environment and is currently in use by most industries, including broadcast and corporate digital data systems. The latest generation, LTO-10, was released in 2026 and defines two cartridge types, which can hold 30 TB or 40 TB each.

Hard disk drives (HDDs) are currently available with capacities of up to 30 TB in the 3.5-inch format and 5 TB in the 2.5-inch laptop format. However, unlike optical media, they are limited to an operational lifespan of 5–25 years due to inevitable mechanical failure or magnetic instability.
Physics-informed neural networks

In machine learning, physics-informed neural networks (PINNs), also referred to as theory-trained neural networks (TTNs), are a type of universal function approximator that can embed knowledge of the physical laws governing a given data set, expressed as partial differential equations (PDEs), into the learning process. Low data availability for some biological and engineering problems limits the robustness of conventional machine learning models used for these applications. Prior knowledge of general physical laws acts as a regularization agent in the training of neural networks (NNs): it limits the space of admissible solutions and thereby increases the generalizability of the function approximation. Embedding this prior information into a neural network thus enhances the information content of the available data, helping the learning algorithm to capture the right solution and to generalize well even with few training examples. Because they take continuous spatial and time coordinates as input and output continuous PDE solutions, PINNs can be categorized as neural fields.

== Function approximation ==
Most of the physical laws that govern the dynamics of a system can be described by partial differential equations. For example, the Navier–Stokes equations are a set of partial differential equations derived from the conservation laws (i.e., conservation of mass, momentum, and energy) that govern fluid mechanics. Solving the Navier–Stokes equations with appropriate initial and boundary conditions allows the flow dynamics in a precisely defined geometry to be quantified. However, these equations cannot in general be solved exactly, so numerical methods (such as finite differences, finite elements and finite volumes) must be used. In this setting, the governing equations must be solved while accounting for prior assumptions, linearization, and adequate time and space discretization.
Recently, solving the governing partial differential equations of physical phenomena using deep learning has emerged as a new field of scientific machine learning (SciML), leveraging the universal approximation theorem and the high expressivity of neural networks. In general, deep neural networks can approximate any high-dimensional function, provided that sufficient training data are supplied. However, such networks do not consider the physical characteristics underlying the problem, and their approximation accuracy still depends heavily on careful specification of the problem geometry as well as the initial and boundary conditions. Without this preliminary information, the solution is not unique and may lose physical correctness. To remedy this, PINNs use the governing physical equations in neural network training: they are trained to satisfy both the given training data and the imposed governing equations. In this fashion, a neural network can be guided by training datasets that need not be large or complete. An accurate solution of partial differential equations can potentially be found without knowing the boundary conditions. Therefore, with some knowledge about the physical characteristics of the problem and some form of training data (even sparse and incomplete), PINNs may be used to find an optimal solution with high fidelity. PINNs can be applied to a wide range of problems in computational science and have led to the development of new classes of numerical solvers for PDEs. They can be thought of as a mesh-free alternative to traditional approaches (e.g., CFD for fluid dynamics), and as new data-driven approaches for model inversion and system identification. Notably, a trained PINN can be used to predict values on simulation grids of different resolutions without needing to be retrained.
Additionally, the derivatives used in the partial differential equations can be computed using automatic differentiation (AD), which is considered superior to numerical or symbolic differentiation.

== Modeling and computation ==
A general nonlinear partial differential equation can be written as:

$$u_t + \mathcal{N}[u;\lambda] = 0, \quad x \in \Omega, \quad t \in [0,T]$$

where $u(t,x)$ denotes the solution, $\mathcal{N}[\cdot;\lambda]$ is a nonlinear operator parameterized by $\lambda$, and $\Omega$ is a subset of $\mathbb{R}^D$. This general form of governing equations summarizes a wide range of problems in mathematical physics, such as conservation laws, diffusion processes, advection-diffusion systems, and kinetic equations. Given noisy measurements of a generic dynamic system described by the equation above, PINNs can be designed to solve two classes of problems: data-driven solution of partial differential equations, and data-driven discovery of partial differential equations.

=== Data-driven solution of partial differential equations ===
The data-driven solution of a PDE computes the hidden state $u(t,x)$ of the system given boundary data and/or measurements $z$, and fixed model parameters $\lambda$. We solve:

$$u_t + \mathcal{N}[u] = 0, \quad x \in \Omega, \quad t \in [0,T]$$

by defining the residual $f(t,x)$ as

$$f := u_t + \mathcal{N}[u],$$

and approximating $u(t,x)$ by a deep neural network. This network can be differentiated using automatic differentiation.
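The residual $f := u_t + \mathcal{N}[u]$ can be evaluated with automatic differentiation even without a deep learning framework. The following is a minimal sketch under stated assumptions: the operator is taken to be linear advection, $\mathcal{N}[u] = c\,u_x$ (chosen only for illustration), the network is a tiny one-hidden-layer tanh model, and forward-mode dual numbers stand in for the reverse-mode AD that most frameworks provide. The names `Dual`, `u_net`, `residual` and the parameter layout are hypothetical and do not come from any library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Number val + der*eps with eps**2 == 0; `der` carries a first derivative."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a*b)' = a*b' + a'*b
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def tanh(x: Dual) -> Dual:
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.der)  # chain rule for tanh


def u_net(t, x, params):
    """Tiny tanh network standing in for u(t, x) (hypothetical layout)."""
    out = Dual(params["b_out"])
    for w_t, w_x, b, w_out in params["hidden"]:
        out = out + w_out * tanh(w_t * t + w_x * x + b)
    return out


def residual(t, x, params, c=1.0):
    """f = u_t + c * u_x, with both derivatives from forward-mode AD."""
    u_t = u_net(Dual(t, 1.0), Dual(x, 0.0), params).der  # seed d/dt
    u_x = u_net(Dual(t, 0.0), Dual(x, 1.0), params).der  # seed d/dx
    return u_t + c * u_x
```

Each forward pass seeds one input with derivative 1, so two passes yield $u_t$ and $u_x$ without finite-difference truncation error. A network whose hidden units depend only on $x - ct$ satisfies the advection equation, so its residual vanishes up to rounding error.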
The parameters of $u(t,x)$ and $f(t,x)$ can then be learned by minimizing the following loss function $L_{\text{tot}}$:

$$L_{\text{tot}} = L_u + L_f$$

where $L_u = \Vert u - z \Vert_{\Gamma}$ is the error between the PINN $u(t,x)$ and the set of boundary conditions and measured data on the set of points $\Gamma$ where the boundary conditions and data are defined, and $L_f = \Vert f \Vert_{\Gamma}$ is the mean-squared error of the residual function. This second term encourages the PINN to learn the structural information expressed by the PDE during the training process. This approach has been used to yield computationally efficient physics-informed surrogate models with applications in the forecasting of physical processes, model predictive control, multi-physics and multi-scale modeling, and simulation. It has been shown to converge to the solution of the PDE.

=== Data-driven discovery of partial differential equations ===
Given noisy and incomplete measurements $z$ of the state of the system, the data-driven discovery of PDEs consists of computing the unknown state $u(t,x)$ and learning the model parameters $\lambda$ that best describe the observed data:

$$u_t + \mathcal{N}[u;\lambda] = 0, \quad x \in \Omega, \quad t \in [0,T]$$

By defining $f(t,x)$ as

$$f := u_t + \mathcal{N}[u;\lambda],$$

and approximating $u(t,x)$ by a deep neural network, $f(t,x)$ results in a PINN. This network can be differentiated using automatic differentiation.
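The composite loss $L_{\text{tot}} = L_u + L_f$, with a data term and a residual term, can be sketched in a few lines. This is an illustrative sketch only: the helper names (`mse`, `fd_residual`, `total_loss`), the advection equation $u_t + c\,u_x = 0$, the sample points, and the use of central finite differences in place of automatic differentiation are all assumptions made to keep the example free of dependencies.

```python
import math


def mse(values):
    """Mean of squared entries, used here as the discrete norm."""
    return sum(v * v for v in values) / len(values)


def fd_residual(u, c, h=1e-5):
    """Residual f = u_t + c * u_x via central differences (stand-in for AD)."""
    def f(t, x):
        u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
        u_x = (u(t, x + h) - u(t, x - h)) / (2 * h)
        return u_t + c * u_x
    return f


def total_loss(u, f, data, collocation):
    """Return (L_tot, L_u, L_f) for a candidate u and its residual f.

    data:        (t, x, z) measurements or boundary/initial values
    collocation: (t, x) points where the PDE residual is penalized
    """
    loss_u = mse([u(t, x) - z for t, x, z in data])
    loss_f = mse([f(t, x) for t, x in collocation])
    return loss_u + loss_f, loss_u, loss_f


# Hypothetical setup: initial condition u(0, x) = sin(x), wave speed c = 1.
xs = [2 * math.pi * i / 20 for i in range(20)]
data = [(0.0, x, math.sin(x)) for x in xs]
collocation = [(0.1 * k, x) for k in range(1, 6) for x in xs]


def exact(t, x):
    """Travelling wave: matches the data and satisfies the PDE."""
    return math.sin(x - t)


def frozen(t, x):
    """Time-independent profile: matches the data but violates the PDE."""
    return math.sin(x)


loss_exact = total_loss(exact, fd_residual(exact, 1.0), data, collocation)
loss_frozen = total_loss(frozen, fd_residual(frozen, 1.0), data, collocation)
```

Both candidates reproduce the initial data exactly, so their data terms vanish; only the residual term separates them. This is the sense in which the second term pushes the model toward the structure expressed by the PDE. In an actual PINN, `u` would be a neural network and the loss would be minimized over its weights by gradient descent.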
The parameters of $u(t,x)$ and $f(t,x)$, together with the parameter $\lambda$ of the differential operator, can then be learned by minimizing the following loss function $L_{\text{tot}}$:

$$L_{\text{tot}} = L_u + L_f$$

where $L_u = \Vert u - z \Vert_{\Gamma}$, with $u$ and $z$ the state solutions and measurements at the sparse locations $\Gamma$, respectively, and $L_f = \Vert f \Vert_{\Gamma}$ is the norm of the residual function. This second term requires the structured information represented by the partial differential equations to be satisfied in the training process. This strategy allows for discovering dynamic models described by nonlinear PDEs and for assembling computationally efficient and fully differentiable surrogate models that may find application in predictive forecasting, control, and data assimilation.

== Extensions and applications ==
=== For piece-wise function approximation ===
PINNs are unable to approximate PDEs that have strong non-linearity or sharp gradients (such as those that commonly occur in practical fluid flow problems). Piecewise approximation has been an old practic
Acquisition of DirecTV by AT&T

AT&T Inc. announced an agreement with the DirecTV Group on May 18, 2014, to acquire the company for $48.5 billion in a cash-and-stock transaction, plus $18.6 billion in assumed debt, for a total offer of $67.1 billion. Due to stalling growth in the wireless sector, AT&T began diversifying into mass media to expand its consumer offerings. After regulatory agencies approved the purchase on July 24, 2015, AT&T briefly became the largest pay-TV provider. DirecTV was brought under AT&T's communications segment, and DirecTV Now was launched on November 30, 2016, as an alternative to cord-cutting. In the years following the purchase, DirecTV lost millions of subscribers across its satellite and streaming services, and by 2019 calls grew for AT&T to divest itself of the business. Initially, AT&T rejected these calls and defended the acquisition, but by February 2021 it reached a deal with TPG Inc. to transfer ownership of DirecTV. Under the terms of the agreement, AT&T would retain a 70% majority stake in DirecTV but would no longer oversee its daily operations. The deal was finalized by August 2, 2021, with AT&T receiving $7.1 billion. By July 3, 2025, AT&T had sold its majority stake to TPG, ending its involvement.

== Background and Development ==
=== AT&T's history ===
The company to bear the name "AT&T" was founded on March 3, 1885, as the American Telephone and Telegraph Company (or AT&T Corporation) by Theodore Newton Vail as a long-distance subsidiary of the Bell Telephone Company. By December 1899, Bell Telephone's assets had been transferred to AT&T, which thereby gained control of the Bell System, a regional network of local telecom companies. Theodore Vail became AT&T's president in 1907, and under his leadership AT&T gained a monopoly over the telephone sector in the United States. This near-century of dominance earned AT&T the nickname "Ma Bell." In 1974, the U.S. Department of Justice (DOJ) sued AT&T for antitrust violations.
AT&T challenged the lawsuit, but in 1982 it reached a settlement with the DOJ to break apart its Bell System monopoly into seven regional companies. On January 1, 1984, the Bell System came to an end, reshaping the telecom industry. One of these regional companies, Southwestern Bell (SBC), emerged as the smallest, but after the passage of the Telecommunications Act of 1996, deregulation allowed SBC to become a major telecom company. AT&T briefly became the largest cable and broadband company by the end of the 20th century, but later deconsolidated to exit those industries. In 2005, SBC acquired its former parent, AT&T, and took on its branding as AT&T Inc., while retaining its previous business history. The newly reincorporated AT&T acquired BellSouth in 2006 and reconstituted much of its former Bell System.

=== DirecTV's history ===
== Acquisition Timeline ==
== Managing DirecTV ==
== Divestment and Spinoff ==
Watcher Entertainment

Watcher Entertainment is an American digital media and entertainment company, founded by Steven Lim, Shane Madej, and Ryan Bergara. Its YouTube channel features a variety of comedy, paranormal, gaming, cooking, and educational shows, typically hosted by Madej and Bergara. The Watcher main channel has over 400 million views and 2.9 million subscribers. The company launched its own streaming service, WatcherTV, in 2024.

== History ==
=== BuzzFeed and the creation of Watcher Entertainment (2019) ===
Madej, Bergara, and Lim met while working at the digital media company BuzzFeed. Madej and Bergara were co-hosts of the popular true crime and paranormal series BuzzFeed Unsolved, and Lim was the creator and co-host of the popular internet food series Worth It. Together the two shows generated 2 billion views with 15 billion minutes watched, making them two of the most successful shows on BuzzFeed. In 2019, Madej, Bergara, and Lim quit BuzzFeed as full-time employees. They each stayed on as contracted employees to complete their respective shows. The trio attributed their departure to their desire to found a company with more "creative opportunities" and the ability to have "actual ownership of the content" made. The company is majority-owned by the trio. They received funding from Neuro, a caffeinated energy gum company; Boba Guys, a bubble-milk tea chain; and Steve Chen, a YouTube co-founder. Watcher Entertainment took its name from the infamous true crime case of The Westfield Watcher, which Madej and Bergara had covered in a BuzzFeed Unsolved episode. The trio began the company as co-CEOs; however, Bergara and Madej stepped down from the role in 2023 to focus on content creation.

=== Watcher Entertainment (2020–present) ===
Watcher Entertainment was launched in January 2020.
The company debuted with seven series and a weekly interactive talk show: Homemade, Grocery Run, Weird Wonderful World, Puppet History, Tourist Trapped, Top 5 Beatdown, Spooky Small Talk, and Watcher Weekly. The channel reached over 300,000 subscribers within the first month of launching. The company was signed by the talent agency CAA in the same year. Puppet History, a comedy educational game show, quickly became a success and gained a significant audience. The show, which stars Madej as a fluffy blue puppet, has spanned seven seasons and led to the creation of a variety of merchandise. Every episode has featured guest stars, including other former BuzzFeed employees. The company premiered its first horror series, Are You Scared?, in July 2020. Following the end of BuzzFeed Unsolved: Supernatural in 2021, the studio premiered its highly anticipated successor, Ghost Files, just months later. The show followed a similar format, with Bergara and Madej investigating reportedly haunted locations and attempting to find evidence of the paranormal. The show had significant success, with critics noting the improved production value and design compared with its predecessor. In 2023, Bergara and Madej went on a tour across the United States to premiere episodes of the second season. The series was renewed for a third season, which they premiered with a United Kingdom tour in 2024. That year, Watcher premiered Mystery Files, a light-hearted successor to the graphic BuzzFeed Unsolved: True Crime. In this rendition, Bergara or Madej present unusual crime or supernatural mysteries with a collection of theoretical solutions. The show was met with great success by audiences and was quickly renewed for a second season. Watcher launched a second channel, 'WatcherPodcasts,' in October 2023. The channel features podcasts hosted by Lim, Bergara, and Madej. On April 19, 2024, the company launched its Watcher streaming service.
Going forward, all of the company's content would be released exclusively on the service, and the company planned to transition away from YouTube. This announcement was met with overwhelmingly negative reactions from fans, with many calling for the company to reverse the decision. Additionally, the YouTube channel lost over 50,000 subscribers in the day following the announcement. On April 22, 2024, the company issued an apology and reversed its decision, stating that episodes would instead be released on the streaming service a month before their premiere on YouTube. In May 2025, the channel 'Andrew, Steven, and Adam' was launched as a subsidiary of Watcher with the release of the second season of Travel Season. Travel Season is a spiritual successor to Worth It with the same cast of Lim, Andrew Ilnyckyj, and Adam Bianchi. The channel focuses on food reviews and the behind-the-scenes work of making the show. The main channel is now set to focus primarily on horror, creepy, and paranormal content.

== Channels and shows ==
=== Watcher ===
==== Current shows ====
Puppet History (2020–present): A whimsical puppet host walks through history's wildest tales as two guests compete for the title of history wizard.
Making Watcher (2020–present): What happens when 3 creators with no business experience decide to make their own company? A multi-series documentary on the journey of creating Watcher Entertainment.
Weird Wonderful World (2020–present): Curious pals Madej and Bergara explore lesser-known destinations and the fascinating subcultures within them.
Too Many Spirits (2020–present): Bergara and Madej read and rate audience-submitted ghost stories, while getting progressively more tipsy drinking cocktails prepared by Steven and Ricky Wang.
Top 5 Beatdown (2020–present): Bergara and Madej compare asinine top 5 lists with a topical expert, inspiring surprisingly heated debate.
Are You Scared?
(2020–2022, 2024–present): Bergara reads the internet's scariest stories (some true, some false) to his pal Madej as they try to figure out if the story is experienced or imagined.
Ghost Files (2021–present): Bergara and Madej investigate haunted locations to discover whether something paranormal really lies within.
Mystery Files (2023–present): Bergara and Madej present unusual crime or supernatural mysteries with a collection of theoretical solutions.
Survival Mode (2023–present): Bergara and Madej play a variety of horror games and give a spooky review.

==== Former shows ====
Grocery Run (2020): Madej interviews a celeb on their typical grocery run, before returning to their home to help prepare their signature dish.
Homemade (2020): Lim examines popular food by comparing an elevated restaurant experience vs. a home-cooked experience.
Spooky Small Talk (2020): Bergara interviews celebs in a haunted house, exposing their fears and, if they can manage it, a little about themselves too.
Social Distancing D&D (2020): Socially distance along with the motley gang of Watchers as they embark on a great quest of Dungeons and Dragons!
Tourist Trapped (2020): Bergara and Madej battle for tour guide supremacy, highlighting the two sides of a city: tourist attractions and hidden gems.
Watcher Weekly (2020–2021): Lim, Bergara, and Madej chat about the week's content and answer questions, with the occasional musical guest!
Dish Granted (2021–2022): A show where host and amateur home cook Lim attempts to create the most extravagant dishes for his friends.
Pretty Historic (2022): Selorm and guests explore beauty and fashion trends from history, try them, and decide whether the trends should remain in the past or come to the present.
Worth a Shot (2022–2023): Take a seat at a Master Mixologist's bar as pro Ricky Wang crafts the unbelievable into a digestible drink for his guests.
=== Watcher Podcast ===
==== Current shows ====
Get Scared with Shane, Ryan, and Steven (2023–2025): Previously named 'Pod Watcher'. Madej, Bergara, and Lim host a weekly podcast, exploring a variety of topics and answering viewer questions. Guests occasionally appear to replace one host. Matt Real serves as the producer and a fourth voice for the podcast.
For Your Amusement (2023–present): Bergara explores a variety of topics surrounding theme parks.

=== Andrew, Steven, and Adam ===
Travel Season (2024–present): Lim reunites with Worth It costars Andrew Ilnyckyj and Adam Bianchi in a new food review show.

== Awards and nominations ==