AI For Students Writing

AI For Students Writing — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Large language model

    Large language model

    A large language model (LLM) is a neural network trained on a vast amount of text for natural language processing tasks, especially language generation. LLMs can typically generate, summarize, translate and analyze text in many contexts, and are a foundational technology behind modern chatbots. Biased or inaccurate training data can make an LLM's output less reliable. As of 2026, the most capable LLMs are based on transformer architectures, which, according to the 2017 paper "Attention Is All You Need", can be more efficient and parallelizable than earlier statistical and recurrent neural network models. Benchmark evaluations for LLMs attempt to measure model reasoning, factual accuracy, alignment, and safety. == History == Before the emergence of transformer-based models in 2017, some language models were considered large relative to the computational and data constraints of their time. In the early 1990s, IBM's statistical models pioneered word alignment techniques for machine translation, laying the groundwork for corpus-based language modeling. In 2001, a smoothed n-gram model, such as those employing Kneser–Ney smoothing, trained on 300 million words, achieved state-of-the-art perplexity on benchmark tests. During the 2000s, with the rise of widespread internet access, researchers began compiling massive text datasets from the web ("web as corpus") to train statistical language models. Moving beyond n-gram models, researchers started in 2000 to use neural networks as language models. Following the breakthrough of deep neural networks in image classification around 2012, similar architectures were adapted for language tasks. This shift was marked by the development of word embeddings (e.g., Word2Vec by Mikolov in 2013) and sequence-to-sequence (seq2seq) models using LSTM. In 2016, Google transitioned its translation service to neural machine translation (NMT), replacing statistical phrase-based models with deep recurrent neural networks. These early NMT systems used LSTM-based encoder-decoder architectures, as they preceded the invention of transformers. At the 2017 NeurIPS conference, Google researchers introduced the transformer architecture in their landmark paper "Attention Is All You Need". This paper's goal was to improve upon 2014 seq2seq technology, and was based mainly on the attention mechanism developed by Bahdanau et al. in 2014. The following year in 2018, BERT was introduced and quickly became "ubiquitous". Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model. Academic and research usage of BERT began to decline in 2023, following rapid improvements in the abilities of decoder-only models (such as GPT) to solve tasks via prompting. Although decoder-only GPT-1 was introduced in 2018, it was GPT-2 in 2019 that caught widespread attention because OpenAI claimed to have initially deemed it too powerful to release publicly, out of fear of malicious use. GPT-3 in 2020 went a step further and as of 2025 is available only via API with no offering of downloading the model to execute locally. But it was the consumer-facing chatbot ChatGPT in late 2022 that received extensive media coverage and public attention by 2023. The 2023 GPT-4 was praised for its increased accuracy and as a "holy grail" for its multimodal capabilities. OpenAI did not reveal the high-level architecture and the number of parameters of GPT-4. The release of ChatGPT led to an uptick in LLM usage across several research subfields of computer science, including robotics, software engineering, and societal impact work. In 2024, OpenAI released the reasoning model OpenAI o1, which generates long chains of thought before returning a final answer. Many LLMs with parameter counts comparable to those of OpenAI's GPT series have been developed. Since 2022, weights-available models have been gaining popularity, especially at first with BLOOM and LLaMA, though both have restrictions on usage and deployment. Mistral AI's open-weight models Mistral 7B and Mixtral 8x7B have a more permissive Apache License. In January 2025, DeepSeek released DeepSeek R1, a 671-billion-parameter open-weight model that performs comparably to OpenAI o1 but at a much lower price per token for users. Since 2023, many LLMs have been trained to be multimodal, having the ability to also process or generate other types of data, such as images, audio, or 3D meshes. Open-weight LLMs have become more influential since 2023. Per Vake et al. (2025), community-driven contributions to open-weight models improve their efficiency and performance via collaborative platforms such as Hugging Face. == Dataset preprocessing == === Tokenization === As machine learning algorithms process numbers rather than text, the text must be converted to numbers. In the first step, a vocabulary is decided upon, then integer indices are arbitrarily but uniquely assigned to each vocabulary entry, and finally, an embedding is associated with the integer index. Algorithms include byte-pair encoding (BPE) and WordPiece. There are also special tokens serving as control characters, such as [MASK] for masked-out token (as used in BERT), and [UNK] ("unknown") for characters not appearing in the vocabulary. Also, some special symbols are used to denote special text formatting. For example, "Ġ" denotes a preceding whitespace in RoBERTa and GPT and "##" denotes continuation of a preceding word in BERT. For example, the BPE tokenizer used by the legacy version of GPT-3 would split tokenizer: texts -> series of numerical "tokens" as Tokenization also compresses the datasets. Because LLMs generally require input to be an array that is not jagged, the shorter texts must be "padded" until they match the length of the longest one. ==== Byte-pair encoding ==== As an example, consider a tokenizer based on byte-pair encoding. In the first step, all unique characters (including blanks and punctuation marks) are treated as an initial set of n-grams (i.e. initial set of uni-grams). Successively the most frequent pair of adjacent characters is merged into a bi-gram and all instances of the pair are replaced by it. All occurrences of adjacent pairs of (previously merged) n-grams that most frequently occur together are then again merged into even lengthier n-gram, until a vocabulary of prescribed size is obtained. After a tokenizer is trained, any text can be tokenized by it, as long as it does not contain characters not appearing in the initial-set of uni-grams. === Dataset cleaning === In the context of training LLMs, datasets are typically cleaned by removing low-quality, duplicated, or toxic data. Cleaned datasets can increase training efficiency and lead to improved downstream performance. A trained LLM can be used to clean datasets for training a further LLM. With the increasing proportion of LLM-generated content on the web, data cleaning in the future may include filtering out such content. LLM-generated content can pose a problem if the content is similar to human text (making filtering difficult) but of lower quality (degrading performance of models trained on it). === Synthetic data === Training of largest language models might need more linguistic data than naturally available, or that the naturally occurring data is of insufficient quality. In these cases, synthetic data might be used. == Training == An LLM is a type of foundation model (large X model) trained on language. LLMs can be trained in different ways. In particular, GPT models are first pretrained to predict the next word on a large amount of data, before being fine-tuned. === Cost === Substantial infrastructure is necessary for training the largest models. The tendency towards larger models is visible in the list of large language models. For example, the training of GPT-2 (i.e. a 1.5-billion-parameter model) in 2019 cost $50,000, while training of the PaLM (i.e. a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million. The qualifier "large" in "large language model" is inherently vague, as there is no definitive threshold for the number of parameters required to qualify as "large". === Fine-tuning === Before being fine-tuned, most LLMs are next-token predictors. The fine-tuning shapes the LLM's behavior via techniques like reinforcement learning from human feedback (RLHF) or constitutional AI. Instruction fine-tuning is a form of supervised learning used to teach LLMs to follow user instructions. In 2022, OpenAI demonstrated InstructGPT, a version of GPT-3 similarly fine-tuned to follow instructions. Reinforcement learning from human feedback (RLHF) involves training a reward model to predict which text humans prefer. Then, the LLM can be fine-tuned through reinforcement learning to better satisfy this reward model. Since humans typically prefer truthful, helpful and harmless answers, RLHF favors such answers. == Architecture == LLMs are generally based on the tra

    Read more →
  • Webedia

    Webedia

    Webedia S.A. is a company specializing in online media, a subsidiary of the Fimalac group based in Levallois-Perret, France. Webedia is active in more than twenty countries including France (AlloCiné, Jeuxvideo.com, MGG, Puremédias, Ode, Pureshopping, Volum, Terrafemina, 750g, easyVoyage, l’Automobile Magazine, Le 10 Sport), Brazil (AdoroCinema, Tudo Gostoso, Minhavida), Germany (Filmstarts, Moviepilot, GameStar), Spain and Latin America (Xataka, SensaCine, Raiser Games), Poland (Gry-Online and GetHero) and the United States (Boxoffice Pro). == History == === Early years (2007-2013) === Webedia was created in France in 2007, following the successive launches of the websites Purepeople, Puretrend and Purefans. Webedia bought the comparison shopping website Shopoon in 2008 and renamed it Pureshopping, and the website Ozap (media news) from M6 group in 2011 and renamed it Puremédias. Webedia was acquired by Fimalac in May 2013 and became its Internet media subsidiary. === Growth (2013-2016) === In 2013, Fimalac acquired AlloCiné, the websites Newsring and Youmag, the cooking website 750g and the cultural platform Exponaute. In 2014, Webedia acquired OverBlog, Jeuxvideo.com (through L'Odyssée Interactive and moved to Paris in 2015), Moviepilot (Germany), and Gameo Consulting (owner of Millenium, electronic sports), In December 2014, Webedia announced a license agreement with Ziff Davis to launch sites under the IGN franchise in Brazil and France at the beginning of 2015. The French version of IGN was launched on 2, it targets the general public and casual gamers. In 2015, Webedia acquired Côté Ciné Group (technological solutions for movie theaters and specialized press magazines: BoxOffice Pro in the United States and Côté Ciné in France), 57% of Easyvoyage group (online travel comparators Easyvol and Alibabuy, Mixicom (website JeuxActu and multi-channel network), 50% of the Brazilian network Paramaker, and West World Media (digital marketing company for the film industry). In 2016, Webedia bought Scimob (mobile video game studio), Surprizemi (home-delivered surprise boxes), Eklablog (blogging platform) Oxent (eSports World Convention), and Bang Bang Management (sports PR agency). In addition, an agreement is made with Paris Saint-Germain for Webedia to recruit and manage e-sports players on behalf of Paris Saint-Germain eSports. On November 15, 2016, the LFP announced that it had reached an agreement with beIN Sports and Webedia for the broadcasting of the first edition of the e-League 1. The competition is renewed for two additional seasons on July 26, 2017, the broadcasting agreements are renewed. On December 8, 2016, Webedia joined forces with Chronopost to launch Pourdebon, a home delivery service that connects Internet users and labeled producers (AOC, organic AB, etc.). Webedia has a slight majority (53%) in this new platform. === 2017 === On January 19, 2017, Webedia announced the acquisition of the English company Peach Digital, specializing in web development and digital marketing for movie theaters. In February 2017, Le Figaro announced that Webedia had invested 10 million euros in Illico Fresco, a home delivery service for baskets of recipes. The same month, FDJ and Webedia announced a partnership for the creation of eSports competitions: a professional one (FDJ Masters League) and another one for amateur gamers (FDJ Open Series) starting in March 2017. They are broadcast on Webedia's Web TV. At the end of February 2017, the media group finalized the acquisition of MyPoseo, a SaaS publisher specialized on SEO analytics. On March 8, 2017, Webedia launched LeStream, a Twitch Web TV dedicated to video games, the result of two years of development, in the company of several YouTubers including Cyprien and Squeezie,. On March 29, 2017, Webedia bought the Brazilian web publisher Minha Vida, a website devoted to health, nutrition, beauty and fitness, which attracts 14.3 million unique monthly visitors. Webedia reaches 44 million unique visitors in Brazil, and thus becomes the leading publisher on entertainment themes. In June 2017, the company made its largest international acquisition, with the American agency 3BlackDot, a media and marketing agency focused on videogamers. The agency, based in Los Angeles, manages 36 YouTubers followed by millions of subscribers on their channels which total 700 million videos viewed per month. In July 2017, Webedia bought IDZ, an audiovisual production company, and thus strengthened its production activities and its leadership on the YouTube channel networks in France. That year, Webedia was the first French media group to use the measurement of their global audiences by Comscore. It represents deduplicated coverage on desktops, laptops, smartphones and tablets, and includes audiences for websites, mobile applications and videos. This new measure allows Webedia to establish a deduplicated global audience of 177 million unique visitors in April 2017. In October 2017, Webedia announced its intention to launch a TV channel dedicated to electronic sports, called ES1. The channel was officially launched on January 10, 2018, on Orange TV and on February 6, 2018, on Free and Bouygues Telecom. In November 2017, Webedia, with the support of CDC International Capital, entered into exclusive negotiations with the Saudi company Uturn Entertainment, specializing in online entertainment, particularly on YouTube, and the production of digital content for the region's youth, with a view to merging it with Diwanee, a Webedia subsidiary in the Middle East, for an amount close to $100 million. In December 2017, Webedia acquired a majority stake in the United States–based company called Creators Media, which brings together social and video production platforms specializing in popular culture and entertainment. That same month, Webedia joined forces with Elephant, Emmanuel Chain's audiovisual production company, to create a new content production label aimed at Millennials. === 2018-2019 === In January 2018, Webedia launched a sports marketing agency: Only Sports & Passions. That same month, Illico Fresco, specialist in the delivery of kit meals belonging to Webedia, joined forces with Weight Watchers, the world leader in slimming products. In April 2018, Webedia published new audience figures in partnership with Comscore, 188 million unique monthly visitors in December 2017, an increase of 6.2% compared to the previous measure dating from April 2017. The same month, Webedia unveils its ambitions concerning content production, as a partnership with the video game studio Focus Home Interactive is signed with a title "Fear the Wolves" already planned for 2018, co-production projects of films, cartoons or series are announced. In July 2018, Webedia bought the American authors company Full Fathom Five, a company that helps authors produce books, TV series, films and video games. In October 2018, Webedia announced that it was focusing on both esports clubs PSG Esports and LeStream Esport. The first one being geared towards international competitions and the second devoted mainly to the French esports scene. The "Millenium" brand is thus refocusing around its media activities and esports merchandising products, and the "Millenium esport club" being gradually closed. The same month, the company announced the acquisition of Weblogs, a Spanish-speaking website publisher, thereby strengthening its activity in Spain and Latin America. On October 22, 2018, Webedia announced the merger of BoxOffice magazine with Film Journal International. On November 13, 2018, Groupe SEB announced the acquisition from Webedia of 750g International, the international branch of the French recipe site 750g (the original French website 750g.com being retained by Webedia). The group is thus separating from Gourmandize (United States and United Kingdom), HeimGourmet (Germany), Rebañando (Spain), Receitas Sem Fronteiras (Brazil / Portugal) and Tribù Golosa (Italy). The same month, Webedia joined forces with Riot Games to launch the French League of League of Legends (LFL), the first French professional league on the League of Legends game, which will bring together the 8 best teams on the French scene. In March 2019, Webedia bought 51% of the audiovisual production company Elephant. The new set will weigh 500 million euros, a quarter of which will be made outside France. The same month, Webedia purchased a majority stake in the company Partoo, which publishes a SaaS platform specializing in local marketing for brands and merchants. On March 14, 2019, a new measurement of the international audience of Webedia sites was produced by Comscore, posting 250 million unique visitors in December 2018, up 9.2% compared to December 2017. In June 2019, the group joined forces with Michel Cymes, a famous doctor and French TV host by taking a majority stake in his company Club Santé Débat, in order to develop a health platform around the Dr. Good! Brand. In Sep

    Read more →
  • Bluelight (web forum)

    Bluelight (web forum)

    Bluelight is a web-forum, research portal, online community, and non-profit organisation dedicated to harm reduction in drug use. Its userbase includes current and former substance users, academic researchers, drug policy activists, and mental health advocates. It is believed to be the largest online international drug discussion website in the world. As of November 2025, the website claims over 475,900 registered members, the Discord community claims over 11,900 members, and additional members utilise other platforms such as Telegram. Bluelight has been utilised by academic researchers as a primary source of data in numerous publications. Researchers also utilise the site to advertise research studies, recruit study participants, and better understand the world of substance use. Research groups and organisations that have partnered with Bluelight to recruit study participants include Imperial College London, Johns Hopkins University, Health Canada, Karlstad University, Curtin University, Macquarie University, Columbia University, University of Pennsylvania, University of Michigan, Toronto Metropolitan University (then known as Ryerson University), and MAPS. Researchers have found that the most common reasons for substance users to visit Bluelight.org and similar online communities are to learn "how to use drugs safely" and "how to help others use drugs safely." Bluelight neither condemns or condones drug use, instead advocating for the principle of responsible drug use; educating and allowing individuals to make informed decisions regarding their drug use, providing information on local drug misuse services, and providing them with other drug harm reduction resources and public safety notices. == History == Bluelight.org was originally formed in 1997 as a message board on bluelight.net called the MDMA Clearinghouse. The board was created as a side project by the owner of West Palm Beach design company Bluelight Designs. 200–300 users joined the site between 1998 and 1999, but the site's servers were heavily limited and could only store a few threads at a time; this led to the creation of 'The New Bluelight' forum in May 1999 and the registration of the bluelight.nu domain in June 1999. The site began to explode in popularity in the early 2000s with the rise of MDMA in the club scene, amassing nearly 7,000 members by the year 2000 and 59,000 by the start of 2006. The site switched to the bluelight.ru domain in October 2005, and switched again to bluelight.org in January 2014. In early 2024, Bluelight was re-structured and the forum became a subsidiary of the newly formed Australian non-profit organisation & registered charity Bluelight Communities Ltd. == Partnerships == In the early 2000s, Bluelight worked with reagent test supplier EZ-Test to promote the sale of drug checking kits. In 2007, Bluelight partnered with the Multidisciplinary Association for Psychedelic Studies (MAPS), a non-profit organisation working to raise awareness and understanding of psychedelic drugs through education, clinical research, and advocacy. MAPS utilised Bluelight to recruit participants for its first MDMA-assisted psychotherapy trial for PTSD. In 2013, the official MAPS forums were migrated to Bluelight. Bluelight's other partners include Erowid, a non-profit organisation dedicated to education surrounding psychoactive drugs; TripSit, a harm reduction education website; Pill Reports, a web-based database for drug checking results that was initially formed as an offshoot of the site; and the Global Drug Survey, an independent research organisation focused on collecting data about substance use. == Notable users == Alan Woods – funded the site's maintenance costs from 1999 until his death in 2008 Hamilton Morris John McAfee – created an infamous series of troll posts about the stimulant MDPV

    Read more →
  • Digital zombie

    Digital zombie

    A digital zombie is a person so engaged with digital technology or social media they are unable to separate themselves from a persistent online presence. Writing in 2017, University of Sydney researcher Andrew Campbell expressed concerns over whether or not the individual can truly live a full and healthy life while they are preoccupied with the digital world. Other individuals have also begun referencing certain types of behaviour with being a digital zombie. Stefanie Valentic, managing editor of EHS Today, refers to it as people hunting digital creatures through their smartphones in public spaces, always fixed on their phones. The University of Warwick has used the term to argue that further research needs to be done with people who exist in digital form after death to help people grieve their loss. == Modern applications == === Distracted walking === The term digital zombie can refer to a person performing distracted walking, which has been labelled dangerous by the American Academy of Orthopaedic Surgeons. They created the "Digital Deadwalkers" campaign after physicians became aware of the risks associated with walking across intersections and sidewalks while paying attention only to smartphones and not one's surroundings. Also stating that the name is derived from the fact that "they're oblivious to everyone else, so it's like they're dead-walking, sleepwalking." === Living through media === The Department of Sociology, University of Warwick has also identified the term, digital zombie, to refer to an individual who has died but is digitally resurrected, reanimated and socially active. These digital zombies do things in death they did not do when they were alive as they "live" again through a digital self on a digital medium. Dead celebrities sometimes become digital zombies when they are reanimated to appear in commercial advertisements (such as Audrey Hepburn and Bob Monkhouse). Other accidental digital zombies include Tupac Shakur and Michael Jackson who were both digitally resurrected and recreated to perform "live" on stage years after their death. Researchers at the University of Warwick have carried out research into the area of human-computer interaction. in an effort to understand the affect these digital zombies have on grief and bereavement. === Mobile gaming === Writer for EHS Today, Stefanie Valentic, has made observations with the mobile phone video game Pokémon Go, which offers players the experience to hunt and collect digital creatures called Pokémon through their smartphone in real world. Players can be observed simultaneously gazing at their phone while also obliviously walking around their environments looking for Pokémon. Stefanie references these individuals as "digital zombies" since they walk around with no cognition of their surroundings while engaged with their phone. == Health risks == === Heavy use of technology === Research by the University of Sydney has begun looking at how new technology such as digital media and smartphones impact our lives and questioning whether they can create new compulsions and obsessions. The research demonstrates that increased heavy technological use can have negative health consequences similar to drugs, smoking, and alcohol. Marcel O'Gorman, an associate professor of English at the University of Waterloo, has commented on the body of research examining how technology impacts cognition, stating currently that there is no empirical evidence to support any theories that suggest that technology can damage memory and attention span. === Heightened risk to children === Manfred Spitzer, a German psychiatrist, has raised concerns with providing digital devices to children. During the early childhood stage while their brains are rapidly growing, increased exposure to digital devices may deprive them of necessary development required to facilitate brain growth. These concerns are also shared by Korean doctors who believe giving digital devices, like smartphones to children, limits their cognitive development.

    Read more →
  • Purged cross-validation

    Purged cross-validation

    Purged cross-validation is a variant of k-fold cross-validation designed to prevent look-ahead bias in time series and other structured data, developed in 2017 by Marcos López de Prado at Guggenheim Partners and Cornell University. It is primarily used in financial machine learning to ensure the independence of training and testing samples when labels depend on future events. It provides an alternative to conventional cross-validation and walk-forward backtesting methods, which often yield overly optimistic performance estimates due to information leakage and overfitting. == Motivation == Standard cross-validation assumes that observations are independently and identically distributed (IID), which often does not hold in time series or financial datasets. If the label of a test sample overlaps in time with the features or labels in the training set, the result may be data leakage and overfitting. Purged cross-validation addresses this issue by removing overlapping observations and, optionally, adding a temporal buffer ("embargo") around the test set to further reduce the risk of leakage. The figure below illustrates standard 5 Fold Cross-Validation == Purging == Purging removes from the training set any observation whose timestamp falls within the time range of formation of a label in the test set. This can be the case for train set observations before and after the test set. Their removal ensures that the algorithm cannot learn during train time information that will be used to assess the performance of the algorithm. See the figure below for an illustration of purging. == Embargoing == Embargoing addresses a more subtle form of leakage: even if an observation does not directly overlap the test set, it may still be affected by test events due to market reaction lag or downstream dependencies. To guard against this, a percentage-based embargo is imposed after each test fold. For example, with a 5% embargo and 1000 observations, the 50 observations following each test fold are excluded from training. Unlike purging, embargoing can only occur after the test set. The figure below illustrates the application of embargo: == Applications == Purged and embargoed cross-validation has been useful in: Backtesting of trading strategies Validation of classifiers on labeled event-driven returns Any machine learning task with overlapping label horizons == Example == To illustrate the effect of purging and embargoing, consider the figures below. Both diagrams show the structure of 5-fold cross-validation over a 20-day period. In each row, blue squares indicate training samples and red squares denote test samples. Each label is defined based on the value of the next two observations, hence creating an overlap. If this overlap is left untreated, test set information leaks into the train set. The second figure applies the Purged CV procedure. Notice how purging removes overlapping observations from the training set and the embargo widens the gap between test and training data. This approach ensures that the evaluation more closely resembles a true out-of-sample test and reduces the risk of backtest overfitting. == Combinatorial Purged Cross-Validation == Walk-forward backtesting analysis, another common cross-validation technique in finance, preserves temporal order but evaluates the model on a single sequence of test sets. This leads to high variance in performance estimation, as results are contingent on a specific historical path. Combinatorial Purged Cross-Validation (CPCV) addresses this limitation by systematically constructing multiple train-test splits, purging overlapping samples, and enforcing an embargo period to prevent information leakage. The result is a distribution of out-of-sample performance estimates, enabling robust statistical inference and more realistic assessment of a model's predictive power. === Methodology === CPCV divides a time-series dataset into N sequential, non-overlapping groups. These groups preserve the temporal order of observations. Then, all combinations of k groups (where k < N) are selected as test sets, with the remaining N − k groups used for training. For each combination, the model is trained and evaluated under strict controls to prevent leakage. To eliminate potential contamination between training and test sets, CPCV introduces two additional mechanisms: Purging: Any training observations whose label horizon overlaps with the test period are excluded. This ensures that future information does not influence model training. Embargoing: After the end of each test period, a fixed number of observations (typically a small percentage) are removed from the training set. This prevents leakage due to delayed market reactions or auto-correlated features. Each data point appears in multiple test sets across different combinations. Because test groups are drawn combinatorially, this process produces multiple backtest "paths," each of which simulates a plausible market scenario. From these paths, practitioners can compute a distribution of performance statistics such as the Sharpe ratio, drawdown, or classification accuracy. === Formal definition === Let N be the number of sequential groups into which the dataset is divided, and let k be the number of groups selected as the test set for each split. Then: The number of unique train-test combinations is given by the binomial coefficient: ( N k ) {\displaystyle {\binom {N}{k}}} Each observation is used in k {\displaystyle k} test sets and contributes to φ [ N , k ] {\displaystyle \varphi [N,k]} unique backtest paths: φ [ N , k ] = k N ( N k ) {\displaystyle \varphi [N,k]={\frac {k}{N}}{\binom {N}{k}}} This yields a distribution of performance metrics rather than a single point estimate, making it possible to apply Monte Carlo-based or probabilistic techniques to assess model robustness. === Illustrative example === Consider the case where N = 6 and k = 2. The number of possible test set combinations is ( 6 2 ) = 15 {\displaystyle {\binom {6}{2}}=15} . Each of the six groups appears in five test splits. Consequently, five distinct backtest paths can be constructed, each incorporating one appearance from every group. ==== Test group assignment matrix ==== This table shows the 15 test combinations. An "x" indicates that the corresponding group is included in the test set for that split. ==== Backtest path assignment ==== Each group contributes to five different backtest paths. The number in each cell indicates the path to which the group's result is assigned for that split. === Advantages === Combinatorial Purged Cross-Validation offers several key benefits over conventional methods: It produces a distribution of performance metrics, enabling more rigorous statistical inference. The method systematically eliminates lookahead bias through purging and embargoing. By simulating multiple historical scenarios, it reduces the dependence on any single market regime or realization. It supports high-confidence comparisons between competing models or strategies. CPCV is commonly used in quantitative strategy research, especially for evaluating predictive models such as classifiers, regressors, and portfolio optimizers. It has been applied to estimate realistic Sharpe ratios, assess the risk of overfitting, and support the use of statistical tools such as the Deflated Sharpe Ratio (DSR). === Limitations === The main limitation of CPCV stems from its high computational cost. However, this cost can be managed by sampling a finite number of splits from the space of all possible combinations.

    Read more →
  • Svelte

    Svelte

    Svelte is a free and open-source component-based front-end software framework and language created by Rich Harris and maintained by the Svelte core team. Svelte is not a monolithic JavaScript library imported by applications: instead, Svelte compiles HTML templates to specialized code that manipulates the DOM directly, which may reduce the size of transferred files and give better client performance. Application code is also processed by the compiler, inserting calls to automatically recompute data and re-render UI elements when the data they depend on is modified. This also avoids the overhead associated with runtime intermediate representations, such as virtual DOM, unlike traditional frameworks (such as React and Vue) which carry out the bulk of their work at runtime, i.e. in the browser. The compiler itself is written in JavaScript. Its source code is licensed under MIT License and hosted on GitHub. Among comparable frontend libraries, Svelte has one of the smallest bundle footprints at merely 2KB. == History == The predecessor of Svelte is Ractive.js, which Rich Harris created in 2013. Version 1 of Svelte was written in JavaScript and was released on 29 November 2016. The name Svelte was chosen by Rich Harris and his coworkers at The Guardian. Version 2 of Svelte was released on 19 April 2018. It set out to correct what the maintainers viewed as mistakes in the earlier version such as replacing double curly braces with single curly braces. Version 3 of Svelte was written in TypeScript and was released on 21 April 2019. It rethought reactivity by using the compiler to instrument assignments behind the scenes. The SvelteKit web framework was announced in October 2020 and entered beta in March 2021. SvelteKit 1.0 was released in December 2022 after two years in development. Version 4 of Svelte was released on 22 June 2023. It was a maintenance release, smaller and faster than version 3. A part of this release was an internal rewrite from TypeScript back to JavaScript, with JSDoc type annotations. This was met with confusion from the developer community, which was addressed by the creator of Svelte, Rich Harris. Version 5 of Svelte was released on October 19, 2024 at Svelte Summit Fall 2024 with Rich Harris cutting the release live while joined by other Svelte maintainers. Svelte 5 was a ground-up rewrite of Svelte, changing core concepts such as reactivity and reusability. Its primary feature, runes, reworked how reactive state is declared and used. Runes are function-like macros that are used to declare a reactive state, or code that uses reactive states. These runes are used by the compiler to indicate values that may change and are depended on by other states or the DOM. Svelte 5 also introduces Snippets, which are reusable "snippets" of code that are defined once and can be reused anywhere else in the component. Svelte 5 was initially met with controversy due to its many changes, and thus deprecations caused primarily by runes. However, most of this has subsided since the initial announcement of runes, and the further refining of Svelte 5. Also at Svelte Summit Fall 2024, Ben McCann announced the Svelte CLI under the sv package name on npm. In early 2025, the Svelte team announced Asynchronous Svelte, an experimental feature set centered around asynchronous reactivity in Svelte using await expressions. As of August 2025, the feature is available via an experimental compiler option. This coincided with the experimental release of remote functions, an RPC feature in SvelteKit, Svelte's metaframework. Key early contributors to Svelte became involved with Conduitry joining with the release of Svelte 1, Tan Li Hau joining in 2019, and Ben McCann joining in 2020. Rich Harris and Simon Holthausen joined Vercel to work on Svelte fulltime in 2022. Dominic Gannaway joined Vercel from the React core team to work on Svelte fulltime in 2023. == Syntax == Svelte applications and components are defined in .svelte files, which are HTML files extended with templating syntax that is based on JavaScript and is similar to JSX. Svelte's core features are accessed through runes, which syntactically look like functions, but are used as macros by the compiler. These runes include: The $state rune, used for declaring a reactive state value The $derived rune, used for declaring reactive state derived from one or more states The $effect rune, used for declaring code that reruns whenever its dependencies change Starting with Svelte 5, the framework introduced a significant reactivity overhaul that replaces the previous `$:` reactive declarations with new runes such as $state, $derived, and $effect. The $effect rune is now used for post-render operations without modifying state, while $derived is used for computations that depend on other reactive values. This change aims to simplify the mental model of reactivity and make component logic more explicit. Additionally, the { JavaScript code } syntax can be used for templating in HTML elements and components, similar to template literals in JavaScript. This syntax can also be used in element attributes for uses such as two-way data binding, event listeners, and CSS styling. A Todo List example made in Svelte is below: == Associated projects == The Svelte maintainers created SvelteKit as the official way to build projects with Svelte. It is a Next.js/Nuxt-style full-stack framework that dramatically reduces the amount of code that gets sent to the browser. The maintainers had previously created Sapper, which was the predecessor of SvelteKit. The Svelte maintainers also maintain a number of integrations for popular software projects under the Svelte organization including integrations for Vite, Rollup, Webpack, TypeScript, VS Code, Chrome Developer Tools, ESLint, and Prettier. A number of external projects such as Storybook have also created integrations with Svelte and SvelteKit. == Influence == Vue.js modeled its API and single-file components after Ractive.js, the predecessor of Svelte. == Adoption == Svelte is widely praised by developers. Taking the top ranking in multiple large scale developer surveys, it was chosen as the Stack Overflow 2021 most loved web framework and 2020 State of JS frontend framework with the most satisfied developers. Recent surveys continue to show Svelte's strong developer satisfaction, with the 2024 State of JS survey maintaining its position among the most praised frontend frameworks. The 2024 Stack Overflow Developer Survey reported that 73% of developers who used Svelte want to continue working with it, and noted that Stack Overflow's own team used Svelte for building their 2024 Developer Survey results site. Svelte has been adopted by a number of high-profile web companies including The New York Times, Google, Apple, Spotify, Radio France, Square, Yahoo, ByteDance, Rakuten, Bloomberg, Reuters, Ikea, Facebook, Logitech, and Brave. A community group of primarily non-maintainers, known as the Svelte Society, run the Svelte Summit conference, write a Svelte newsletter, host a Svelte podcast, and host a directory of Svelte tooling, components, and templates.

    Read more →
  • Digital zombie

    Digital zombie

    A digital zombie is a person so engaged with digital technology or social media they are unable to separate themselves from a persistent online presence. Writing in 2017, University of Sydney researcher Andrew Campbell expressed concerns over whether or not the individual can truly live a full and healthy life while they are preoccupied with the digital world. Other individuals have also begun referencing certain types of behaviour with being a digital zombie. Stefanie Valentic, managing editor of EHS Today, refers to it as people hunting digital creatures through their smartphones in public spaces, always fixed on their phones. The University of Warwick has used the term to argue that further research needs to be done with people who exist in digital form after death to help people grieve their loss. == Modern applications == === Distracted walking === The term digital zombie can refer to a person performing distracted walking, which has been labelled dangerous by the American Academy of Orthopaedic Surgeons. They created the "Digital Deadwalkers" campaign after physicians became aware of the risks associated with walking across intersections and sidewalks while paying attention only to smartphones and not one's surroundings. Also stating that the name is derived from the fact that "they're oblivious to everyone else, so it's like they're dead-walking, sleepwalking." === Living through media === The Department of Sociology, University of Warwick has also identified the term, digital zombie, to refer to an individual who has died but is digitally resurrected, reanimated and socially active. These digital zombies do things in death they did not do when they were alive as they "live" again through a digital self on a digital medium. Dead celebrities sometimes become digital zombies when they are reanimated to appear in commercial advertisements (such as Audrey Hepburn and Bob Monkhouse). Other accidental digital zombies include Tupac Shakur and Michael Jackson who were both digitally resurrected and recreated to perform "live" on stage years after their death. Researchers at the University of Warwick have carried out research into the area of human-computer interaction. in an effort to understand the affect these digital zombies have on grief and bereavement. === Mobile gaming === Writer for EHS Today, Stefanie Valentic, has made observations with the mobile phone video game Pokémon Go, which offers players the experience to hunt and collect digital creatures called Pokémon through their smartphone in real world. Players can be observed simultaneously gazing at their phone while also obliviously walking around their environments looking for Pokémon. Stefanie references these individuals as "digital zombies" since they walk around with no cognition of their surroundings while engaged with their phone. == Health risks == === Heavy use of technology === Research by the University of Sydney has begun looking at how new technology such as digital media and smartphones impact our lives and questioning whether they can create new compulsions and obsessions. The research demonstrates that increased heavy technological use can have negative health consequences similar to drugs, smoking, and alcohol. Marcel O'Gorman, an associate professor of English at the University of Waterloo, has commented on the body of research examining how technology impacts cognition, stating currently that there is no empirical evidence to support any theories that suggest that technology can damage memory and attention span. === Heightened risk to children === Manfred Spitzer, a German psychiatrist, has raised concerns with providing digital devices to children. During the early childhood stage while their brains are rapidly growing, increased exposure to digital devices may deprive them of necessary development required to facilitate brain growth. These concerns are also shared by Korean doctors who believe giving digital devices, like smartphones to children, limits their cognitive development.

    Read more →
  • Packingham v. North Carolina

    Packingham v. North Carolina

    Packingham v. North Carolina, 582 U.S. 98 (2017), is a case in which the Supreme Court of the United States held that a North Carolina statute that prohibited registered sex offenders from using social media websites was unconstitutional because it violated the First Amendment to the U.S. Constitution, which protects freedom of speech. In 2010, Lester Gerard Packingham, a registered sex offender, posted on Facebook under a pseudonym to comment favorably on a recent traffic court experience. Police then identified Packingham and charged him with violating North Carolina's law. Packingham moved to dismiss the charges, arguing that the state's law violated the First Amendment. The trial court dismissed this motion and ultimately convicted Packingham. A state appellate court initially reversed the trial court, holding that the law did violate the First Amendment, but the North Carolina Supreme Court, the state's highest court, disagreed and reinstated the conviction. In June 2017, the U.S. Supreme Court unanimously reversed the North Carolina Supreme Court's judgment. In the majority opinion authored by Justice Anthony Kennedy, the Court held that social media—defined broadly to include Facebook, Amazon.com, The Washington Post, and WebMD, among many others—is a "protected space" under the First Amendment for lawful speech. The Court offered that North Carolina could protect children through less restrictive means, such as prohibiting "conduct that often presages a sexual crime, like contacting a minor or using a website to gather information about a minor". == Background == === North Carolina statute === In 2008, the state of North Carolina passed a law that made it a felony for a registered sex offender "to access a commercial social networking Web site where the sex offender knows that the site permits minor children to become members or to create or maintain personal Web pages". The law defined a "commercial social networking Web site" using four criteria. Specifically, the website must: be "operated by a person who derives revenue from membership fees, advertising, or other sources related to the operation of the Web site". facilitate "the social introduction between two or more persons for the purposes of friendship, meeting other persons, or information exchanges". allow "users to create Web pages or personal profiles that contain information such as the name or nickname of the user, photographs placed on the personal Web page by the user, other personal information about the user, and links to other personal Web pages on the commercial social networking Web site of friends or associates of the user that may be accessed by other users or visitors to the Web site". provide "users or visitors... mechanisms to communicate with other users, such as a message board, chat room, electronic mail, or instant messenger". The law exempted websites that "Provid[e] only one of the following discrete services: photo-sharing, electronic mail, instant messenger, or chat room or message board platform", as well as websites that have as their primary purpose "the facilitation of commercial transactions involving goods or services between [their] members or visitors". === Facts of the case === In 2002, Lester Gerard Packingham was convicted of taking "indecent liberties with a child", a felony that required him to register as a sex offender. A North Carolina court sentenced him to 10–12 months in prison with 24 months of supervised release. He was given no other special instructions on his behavior outside of prison other than to "remain away from" the minor. In 2010, after a state court dismissed a traffic ticket against Packingham, he submitted a post on Facebook under the name "J. R. Gerrard", stating: "Man God is Good! How about I got so much favor they dismissed the ticket before court even started? No fine, no court cost, no nothing spent. . . . . .Praise be to GOD, WOW! Thanks JESUS!" The Durham Police Department identified Packingham as the author of the post after cross-checking the time of the post with recently dismissed traffic tickets, and a grand jury indicted him for violating the North Carolina statute. === Lower court proceedings === Initially, Packingham moved to dismiss his indictment, arguing that it violated the First Amendment. A North Carolina Superior Court judge denied this motion, and he was convicted of violating the North Carolina social media law. Packingham appealed his conviction to the North Carolina Court of Appeals, which reversed the trial court's decision in 2013. Applying intermediate scrutiny, the court of appeals determined that North Carolina's law violated the First Amendment because it was too broad, applying to all registered sex offenders regardless of whether the offender had committed a crime involving a minor or whether the offender was a continuing threat to minors. The appeals court also stated that the law had been defined broadly enough to prohibit a registered sex offender from conducting a wide array of Internet activity, such as "conducting a 'Google' search, purchasing items on Amazon.com, or accessing a plethora of Web sites unrelated to online communication with minors". In 2015, the North Carolina Supreme Court, the state's highest court, reversed the court of appeals, holding that the law was "constitutional in all respects". The North Carolina Supreme Court found that the statute was a "limitation on conduct" and did not impede any free speech. The state had a vested interest in “forestalling the illicit lurking and contact of minors” by registered sex offenders and potential future victims, and upheld Packingham's conviction. == Supreme Court ruling == Packingham filed a petition for a writ of certiorari with the Supreme Court of the United States. The federal government also filed a brief recommending that the Supreme Court grant certiorari, arguing that the North Carolina Supreme Court incorrectly decided the case in favor of the state. The U.S. Supreme Court granted certiorari in October 2016. Amicus briefs in support of Packingham were filed by the libertarian Cato Institute and the American Civil Liberties Union. The North Carolina Supreme Court filed a brief supporting its prior decision, urging the importance of protecting minors from being stalked online. === Oral argument === The oral argument took place in February 2017. Packingham’s lawyer, David T. Goldberg, argued that the law banned “vast swaths of First Amendment activity”, went too far in restricting which Internet sites could be accessed, and forbade use of the Internet in general. The law targeted speech on some of the platforms that Americans use most often, Goldberg noted, and that under the law Packingham could not even use Twitter to read the myriad messages discussing his own case. He further noted that the law imposes punishment without regard to whether the offender actually did anything wrong. North Carolina’s senior deputy Attorney General, Robert C. Montgomery, argued for the state, and claimed that communication through social media sites is a “crucial channel”. Justice Sonia Sotomayor asked Montgomery to provide evidence as to the claim that by giving Packingham Internet privileges, he would commit another crime. Justice Stephen Breyer added that “It seems to be well-settled law that the state can’t (bar usage) unless there is a 'clear and present danger'." === Opinion of the Court === In June 2017 the Supreme Court delivered a judgment in favor of Packingham, unanimously voting to reverse the state court's ruling. Justice Anthony Kennedy authored the decision, joined by Justice Ginsburg, Justice Breyer, Justice Sotomayor, and Justice Kagan. Kennedy explained the decision: "A fundamental principle of the First Amendment is that all persons have access to places where they can speak and listen, and then, after reflection, speak and listen once more." He continued that "By prohibiting sex offenders from using those websites, North Carolina with one broad stroke bars access to what for many are the principal sources for knowing current events, checking ads for employment, speaking and listening in the modern public square, and otherwise exploring the vast realms of human thought and knowledge." Citing Ashcroft v. Free Speech Coalition as a precedent, Kennedy also wrote: "It is well established that, as a general rule, the Government 'may not suppress lawful speech as the means to suppress unlawful speech'." === Concurring opinion === Justice Samuel Alito wrote an opinion concurring in the judgment, joined by John Roberts and Clarence Thomas. While Alito agreed that the state statute at issue violated the First Amendment, he noted that there are reasonable scenarios for which legal bans for sex offenders can be placed, such as for sites targeted at teenagers. Justice Gorsuch took no part in the decision of the case. == Impact == Packingham v. North Carolina was one of the first U.S. Supreme Court cases to ana

    Read more →
  • The Drivers Cooperative

    The Drivers Cooperative

    The Drivers Cooperative or Co-Op Ride is an American ridesharing company and mobile app that is a workers cooperative, owned collectively by the drivers. The cooperative launched in May 2021 in New York City, with the first 2,500 drivers issued their ownership certificates in a media event. The cooperative was co-founded by Grenadan immigrant and for hire vehicle driver Ken Lewis, labor organizer Erik Forman, and former Uber executive Alissa Orlando. Mohammad Hossen is the first member of the drivers' advisory board, which they plan to expand democratically as more drivers are onboarded. Other staff include software and industry veterans and in addition to co-founder Lewis, there are other drivers in management roles such as ex-driver and organizer David Alexis. The Co-Op Ride app is on the iOS and Android platforms and is built on Google Maps, Stripe, and Waze. By July, the app had been downloaded by 30,000 users and the number of drivers increased to 3,400, and by August there were 40,000 users. The cooperative is owned by the drivers themselves, and takes 15% from each ride for business overhead costs, as opposed to the 25% to 40% ride hail apps like Uber or Lyft take per ride. While being ultimately owned by the driver members, not by investors, the cooperative began with seed money from the Minnesota-based Community Development Financial Institution Shared Capital Cooperative, the local Lower East Side People's Federal Credit Union, and welcomed individual donations via crowdfunding in the form of revenue sharing debt on Wefunder. Each driver is a member of the cooperative and owns one share of the company and one vote in business and leadership decisions. In addition to a larger percentage of the fees per ride driven, each driver as a part-owner will also receive a share of the company's profits after loans and other expenses are paid, in the form of weighted dividends. The drivers use their own cars. The cooperative vets its owner-members further than what is already performed by the New York City Taxi and Limousine Commission (TLC), and gives a fixed price when a car is ordered and does not engage in surge pricing. The TLC imposed a minimum payrate for mobile app ridesharing companies operating in New York city in 2018. In 2021 that is $1.26 per mile which Uber and Lyft do not pay above; the cooperative pays a minimum mileage of $1.64. The cooperative intends to be able to set aside 10% of profits to community foundations and other non-profits and community organizations. The cooperative has engaged in advocacy around a policy agenda voted on by its members. Legislation to achieve this policy goal was introduced by State Senator Julia Salazar and Assemblymember Jessica González-Rojas, with the support of a coalition led by The Drivers Cooperative, United Auto Workers Region 9 and 9A, Sunrise Movement, New York Lawyers for the Public Interest, and New York Communities for Change.

    Read more →
  • Comparison of user features of operating systems

    Comparison of user features of operating systems

    Comparison of user features of operating systems refers to a comparison of the general user features of major operating systems in a narrative format. It does not encompass a full exhaustive comparison or description of all technical details of all operating systems. It is a comparison of basic roles and the most prominent features. It also includes the most important features of the operating system's origins, historical development, and role. == Overview == An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also include accounting software for cost allocation of processor time, mass storage, printing, and other resources. For hardware functions such as input and output and memory allocation, the operating system acts as an intermediary between programs and the computer hardware, although the application code is usually executed directly by the hardware and frequently makes system calls to an OS function or is interrupted by it. Operating systems are found on many devices that contain a computer – from cellular phones and video game consoles to web servers and supercomputers. As of June 2024, the dominant general-purpose desktop operating system is Microsoft Windows with a market share of around 72.91%. macOS by Apple Inc. is in second place (14.93%), and the varieties of Linux are collectively in third place (4.04%). In the mobile sector, including both smartphones and tablets, Android is dominant with a market share of 71%, followed by Apple's iOS with 28%; for smartphones alone, Android has 72% and iOS has 28%. Linux distributions are dominant in the server and supercomputing sectors. Other specialized classes of operating systems (special-purpose operating systems)), such as embedded and real-time systems, exist for many applications. Security-focused operating systems also exist. Some operating systems have low system requirements (i.e. light-weight Linux distribution). Others may have higher system requirements. Some operating systems require installation or may come pre-installed with purchased computers (OEM-installation), whereas others may run directly from media (i.e. live cd) or flash memory (i.e. USB stick). == MS-DOS == === Overview === MS-DOS (acronym for Microsoft Disk Operating System) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and some operating systems attempting to be compatible with MS-DOS, are sometimes referred to as "DOS" (which is also the generic acronym for disk operating system). MS-DOS was the main operating system for IBM PC compatible personal computers during the 1980s, from which point it was gradually superseded by operating systems offering a graphical user interface (GUI), in various generations of the graphical Microsoft Windows operating system. IBM licensed and re-released it in 1981 as PC DOS 1.0 for use in its PCs. Although MS-DOS and PC DOS were initially developed in parallel by Microsoft and IBM, the two products diverged after twelve years, in 1993, with recognizable differences in compatibility, syntax, and capabilities. During its lifetime, several competing products were released for the x86 platform, and MS-DOS went through eight versions, until development ceased in 2000. Initially, MS-DOS was targeted at Intel 8086 processors running on computer hardware using floppy disks to store and access not only the operating system, but application software and user data as well. Progressive version releases delivered support for other mass storage media in ever greater sizes and formats, along with added feature support for newer processors and rapidly evolving computer architectures. Ultimately, it was the key product in Microsoft's development from a programming language company to a diverse software development firm, providing the company with essential revenue and marketing resources. It was also the underlying basic operating system on which early versions of Windows ran as a GUI. == Microsoft Windows == === Overview === Microsoft Windows, commonly referred to as Windows, is a group of several proprietary graphical operating system families, all of which are developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. Active Microsoft Windows families include Windows NT and Windows IoT; these may encompass subfamilies, (e.g. Windows Server or Windows Embedded Compact) (Windows CE). Defunct Microsoft Windows families include Windows 9x, Windows Mobile, and Windows Phone. Microsoft announced an operating environment named Windows on 10 November 1983, as a graphical operating system shell for MS-DOS in response to the growing interest in graphical user interfaces (GUIs); Windows 1.0 first shipped on 20 November 1985. Microsoft Windows came to dominate the world's personal computer (PC) market with over 90% market share, overtaking Mac OS, which had been introduced in 1984, while Microsoft has in 2020 lost its dominance of the consumer operating system market, with Windows down to 30%, lower than Apple's 31% mobile-only share (65% for desktop operating systems only, i.e. "PCs" vs. Apple's 28% desktop share) in its home market, the US, and 32% globally (77% for desktops), where Google's Android leads. Apple came to see Windows as an unfair encroachment on their innovation in GUI development as implemented on products such as the Lisa and Macintosh (eventually settled in court in Microsoft's favor in 1993). As of January 2023, on PCs, Windows is still the most popular operating system in all countries. However, in 2014, Microsoft admitted losing the majority of the overall operating system market to Android, because of the massive growth in sales of Android smartphones. In 2014, the number of Windows devices sold was less than 25% that of Android devices sold. This comparison, however, may not be fully relevant, as the two operating systems traditionally target different platforms. Still, numbers for server use of Windows (that are comparable to competitors) show one third market share, similar to that for end user use. As of October 2020, the most recent version of Windows for PCs, tablets and embedded devices is Windows 10, version 20H2. The most recent version for server computers is Windows Server, version 20H2. A specialized version of Windows also runs on the Xbox One video game console. === Windows 95 === Windows 95 introduced a redesigned shell based around a desktop metaphor; File shortcuts (also known as shell links) were introduced and the desktop was re-purposed to hold shortcuts to applications, files and folders, reminiscent of Mac OS. In Windows 3.1 the desktop was used to display icons of running applications. In Windows 95, the currently running applications were displayed as buttons on a taskbar across the bottom of the screen. The taskbar also contained a notification area used to display icons for background applications, a volume control and the current time. The Start menu, invoked by clicking the "Start" button on the taskbar or by pressing the Windows key, was introduced as an additional means of launching applications or opening documents. While maintaining the program groups used by its predecessor Program Manager, it also displayed applications within cascading sub-menus. The previous File Manager program was replaced by Windows Explorer and the Explorer-based Control Panel and several other special folders were added such as My Computer, Dial Up Networking, Recycle Bin, Network Neighborhood, My Documents, Recent documents, Fonts, Printers, and My Briefcase among others. AutoRun was introduced for CD drives. The user interface looked dramatically different from prior versions of Windows, but its design language did not have a special name like Metro, Aqua or Material Design. Internally it was called "the new shell" and later simply "the shell". The subproject within Microsoft to develop the new shell was internally known as "Stimpy". In 1994, Microsoft designers Mark Malamud and Erik Gavriluk approached Brian Eno to compose music for the Windows 95 project. The result was the six-second start-up music-sound of the Windows 95 operating system, The Microsoft Sound and it was first released as a startup sound in May 1995 on Windows 95 May Test Release build 468. When released for Windows 95 and Windows NT 4.0, Internet Explorer 4 came with an optional Windows Desktop Update, which modified the shell to provide several additional updates to Windows Explorer, including a Quick Launch toolbar, and new features integrated with Internet Explorer, such as Active Desktop (which allowed Internet content to be displayed directly on the desktop). Some of the user interface elements introduced in Windows 95, such as the desktop, taskbar, Start menu and Windows

    Read more →
  • Anonymous social media

    Anonymous social media

    Anonymous social media is a subcategory of social media wherein the main social function is to share and interact around content and information anonymously on mobile and web-based platforms. Another key aspect of anonymous social media is that content or information posted is not connected with particular online identities or profiles. == Background == Appearing very early on the web as mostly anonymous-confession websites, this genre of social media has evolved into various types and formats of anonymous self-expression. One of the earliest anonymous social media forums was 2channel, which was first introduced online on May 30, 1999, as a Japanese text board forum. With the way digital content is consumed and created continuously changing, the trending shift from web to mobile applications is also affecting anonymous social media. This can be seen as anonymous blogging, or various other format based content platforms such as nameless question and answer online platforms like Ask.fm introduced mobile versions of their services. The number of new networks joining the anonymous social sharing scene continues to grow rapidly. == Degrees of anonymity == Across different forms of anonymous social media there are varying degrees of anonymity. Some applications, such as Librex, require users to sign up for an account, even though their profile is not linked to their posts. While these applications remain anonymous, some of these sites can sync up with the user's contact list or location to develop a context within the social community and help personalize the user's experience, such as Yik Yak or Secret. Other sites, such as 4chan and 2channel, allow for a purer form of anonymity as users are not required to create an account, and posts default to the username of "Anonymous". While users can still be traced through their IP address, there are anonymizing services like I2P or various proxy server services that encrypt a user's identity online by running it through different routers. Secret users must provide a phone number or email when signing up for the service, and their information is encrypted into their posts. Stylometry poses a risk to the anonymity or pseudonymity of social media users, who may be identifiable by writing style; in turn, they may use adversarial stylometry to resist such identification. == Controversy == Apps such as Formspring, Ask, Sarahah, Whisper, and Secret have elicited discussion around the rising popularity of anonymity apps, including debate and anticipation about this social sharing class. As more and more platforms join the league of anonymous social media, there is growing concern about the ethics and morals of anonymous social networking as cases of cyber-bullying, and personal defamation occurs. Formspring, also known as spring.me, and Ask.fm have both been associated with teen suicides as a result of cyberbullying on the sites. Formspring has been associated with at least three teen suicides and Ask.fm with at least five. For instance, the app Secret got shut down due to its escalated use of cyberbullying. The app Yik Yak has also helped to contribute to more cyberbullying situations and, in turn, was blocked on some school networks. Their privacy policy meant that users could not be identified without a subpoena, search warrant, or court order. Another app called After School also sparked controversy for its app design that lets students post any anonymous content. Due to these multiple controversies, the app has been removed from both Apple and Google app stores. As the number of people using these platforms multiplies, unintended uses of the apps have increased, urging popular networks to enact in-app warnings and prohibit the use for middle and high school students. 70% of teens admit to making an effort to conceal their online behavior from their parents. Even Snapchat has some relation to the health of children after using social media. This is an app that is meant to be quick and simple but in many ways it can be overwhelming. A person can post something, and it will be gone in seconds. Oftentimes, the post that was made was inappropriate and harmful to another person. It's a never-ending cycle. Some of these apps have also been criticized for causing chaos in American schools, such as lockdowns and evacuations. In order to limit the havoc caused, anonymous apps are currently removing all abusive and harmful posts. Apps such as Yik Yak, Secret, and Whisper are removing these posts by outsourcing the job of content supervision to oversea surveillance companies. These companies hire a team of individuals to inspect and remove any harmful or abusive posts. Furthermore, algorithms are also used to detect and remove any abusive posts the individuals may have missed. Another method used by the anonymous app named Cloaq to reduce the number of harmful and abusive posts is to limit the number of users that can register during a certain period. Under this system, all contents are still available to the public, but only registered users can post. Other websites such as YouTube have gone on to create new policies regarding anonymity. YouTube now does not allow anonymous comments on videos. Users must have a Google account to like, dislike, comment or reply to comments on videos. Once a sign-in user "likes" a video, it will be added to that user's 'Liked video playlist'. YouTube changed their "Liked video playlist" policy in December 2019, allowing a signed-in user to keep their "Liked video playlist" private. Historically, these controversies and the rise of cyberbullying have been blamed on the anonymous aspect of many social media platforms, but about half of US adult online harassment cases do not involve anonymity, and researchers have found that if targeted harassment exists offline it will also be found online, because online harassment is a reflection of existing prejudices. == As platforms for anonymous discussion == Anonymous social media can be used for political discussion in countries where political opinions opposed to the government are normally suppressed, and allow persons of different genders to communicate freely in cultures where such communication is not generally accepted. In the United States, the 2016 presidential election led to an increase in the use of anonymous social media websites to express political stances. Moreover, anonymous social media can also provide authentic connection to complete anonymous communication. There have been cases where these anonymous platforms have saved individuals from life-threatening situation or spread news about a social cause. Additionally, anonymous social websites also allow internet users to communicate while also safeguarding personal information from criminal actors and corporations that sell users' data. A study in 2017 on the content posted to 4chan's /pol/ board found that the majority of the content was unique, including 70% of the 1 million images included in the studied data set. == Revenue generated by anonymous social media == === Anonymous apps === Generating revenue from anonymous apps has been a discussion for investors. Since little information is collected about the users, it is difficult for anonymous apps to advertise to users. However some apps, such as Whisper, have found a method to overcome this obstacle. They have developed a "keyword-based" approach, where advertisements are shown to users depending on certain words they type. The app Yik Yak has been able to capitalize on the features they provide. Anonymous apps such a Chrends take the approach of using anonymity to provide freedom of speech. Telephony app Burner has regularly been a top grossing utilities app in the iOS and Android app stores using its phone number generation technology. Despite the success of some anonymous apps, there are also apps, such as Secret, which have yet to find a way to generate revenue. The idea of an anonymous app has also caused mixed opinions within investors. Some investors have invested a large sum of money because they see the potential revenue generated within these apps. Other investors have stayed away from investing these apps because they feel these apps bring more harm than good. === Anonymous sites === There are several sources to generate revenue for anonymous social media sites. One source of revenue is by implementing programs such as a premium membership or a gift-exchanging program. Another source of revenue is by merchandising goods and specific usernames to users. In addition, sites such as FMyLife, have implemented a policy where the anonymous site will receive 50% of profit from apps that makes money off it. In terms of advertisements, some anonymous sites have had troubles implementing or attracting them. There are several reasons for this problem. Anonymous sites, such as 4chan, have received few advertisement offers due to some of the contents it generates. Other anonymous sites, such as Reddit, have been ca

    Read more →
  • Alt TikTok

    Alt TikTok

    Alt TikTok (or 2020 Alt) was an online youth subculture and internet community that emerged on TikTok in 2020. Alt TikTok users (also known as alt girls, alt boys, or alt kids) emerged as primarily LGBTQ+ individuals who were in contrast to "Straight TikTok" which was seen as the mainstream and heteronormative side of the platform. The subculture became closely associated with music surrounding the hyperpop scene, particularly 100 gecs and also led to a short-lived fashion style and Internet aesthetic adopted by Generation Z during the COVID-19 lockdowns. Notable artists associated with the movement included Girl in Red, Freddie Dredd, David Shawty, WHOKILLEDXIX, and 645AR. While "alt kid" might imply a general association with traditional alternative fashion, the subculture was more an offshoot of e-girls and e-boys. In 2023, the hashtag #altfashion on TikTok amassed over 1.8 billion views. == History == Around mid-2020, users on TikTok began to group different content on the site into labels like "elite TikTok", "deep TikTok", and "floptok". These categories acted as different "sides of TikTok", deviating from mainstream lip syncing, online trends, and dance videos. Alt TikTok became one of the many subcultural communities to emerge during this period, initially referred to interchangeably with "elite TikTok". The movement quickly identified itself with alternative and queer users, in contrast to "Straight TikTok", also known as the "straight side of TikTok", which was seen as the mainstream and heteronormative side of the platform. Alt TikTok was accompanied by memes with surrealist or supernatural themes (sometimes being described as cursed), such as videos with heavy saturation and humanoid animals. One of the popular videos from Alt TikTok, gaining 18 million likes, shows a llama dancing to a cover of a song from a Russian commercial by the cereal brand Miel Pops, later becoming a viral audio. Some Alt TikTok users personified brands and products in what was referred to as Retail TikTok. In 2020, Rolling Stone described Alt TikTok as "one of the primary countercultures on the app." In 2020, American journalist Taylor Lorenz stated in an article of The New York Times, "Every pop sensation needs its ironic counterpoints. Alt Tiktok gets it done. [...] alt TikTok stars like Mooptopia are mainstays on the more indie side of the app. They aren't the popular crowd, but their cool, quirky content still attracts millions." === Trump rally trolling === In June 2020, alt TikTok and K-pop twitter users coordinated a strategy to ruin a Trump rally in Tulsa, Oklahoma. American politician and activist Alexandria Ocasio-Cortez later saluted the individuals for their "Trump troll". == Alt subculture == In 2020, Alt TikTok was one of many subcultural communities to emerge on TikTok, alongside Deep TikTok (aka DeepTok) and Flop TikTok (aka Floptok). The alt kid subculture emerged from Alt TikTok primarily among young Gen Z women, influenced by online fashion and aesthetics shaped by e-girls and e-boys. The movement was accelerated by the COVID-19 lockdowns, while the subculture itself stood in opposition to mainstream "Straight TikTok" and the VSCO girl movement, primarily adopting aspects of queer and alternative culture. While the phrase might imply a general association with alternative fashion or alternative culture, it is more accurately understood as a specific internet-driven outgrowth of online aesthetic youth subcultures like e-girls and e-boys. The alt subculture's visual style blended influences from goth, punk, emo, and grunge, often expressed through fashion, music taste, and online presence. === Style and music === The style of alt-girls is reminiscent of a myriad of previous alternative fashion trends, often blending these influences with online aesthetics. In 2020, TikTok alt-girls were teens ranging from ages 13 to 16, who tended to wear friendship bracelets, goth boots, Dr. Martens, bunny and frog hats, piercings, and split-dyed hair, as well as iconography lifted from Monster Energy and Hello Kitty. Some alt-girls displayed a love of cosplay, while drawing from Japanese anime and manga, particularly Danganronpa and Haikyu!!, which originally gained traction on the app through Anime TikTok (aka Anitok). Alt TikTok has been noted for being primarily influenced by queer and alternative culture, positioning itself in contrast to "Straight TikTok", which focused on mainstream dances and music. Alt kids frequently intersected with the e-girls and e-boys subculture, in terms of music, style, visual media, and aesthetics. Several musicians and artists were closely associated with the alt subculture, particularly those in the hyperpop scene, while alt tiktok users became important in the wider popularization of artists like 100 gecs. Notable prominent artists associated with Alt Tiktok included Girl in Red, Freddie Dredd, David Shawty, WHOKILLEDXIX, and 645AR, alongside music by YouTubers turned musicians such as Wilbur Soot's "I'm in Love With an E‐Girl" and Corpse Husband's "E-Girls Are Ruining My Life!". == Legacy == In 2020, Pitchfork claimed Alt TikTok as having an influence on wider music trends, stating: "Alt TikTok's music is now a hot zone for major record labels, pushing it even further into the mainstream". After the COVID-19 lockdowns, Alt TikTok, alongside its subculture, fell out of prominence and was taken over by other Gen Z-related internet aesthetics, developments, and online trends.

    Read more →
  • Feature hashing

    Feature hashing

    In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction. This trick is often attributed to Weinberger et al. (2009), but there exists a much earlier description of this method published by John Moody in 1989. == Motivation == === Motivating example === In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets. Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents is regarded as a term-document matrix where each row is a single document, and each column is a single feature/word; the entry i, j in such a matrix captures the frequency (or weight) of the j'th term of the vocabulary in document i. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse—according to Zipf's law. The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. E.g., the three documents John likes to watch movies. Mary likes movies too. John also likes football. can be converted, using the dictionary to the term-document matrix ( John likes to watch movies Mary too also football 1 1 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1 ) {\displaystyle {\begin{pmatrix}{\textrm {John}}&{\textrm {likes}}&{\textrm {to}}&{\textrm {watch}}&{\textrm {movies}}&{\textrm {Mary}}&{\textrm {too}}&{\textrm {also}}&{\textrm {football}}\\1&1&1&1&1&0&0&0&0\\0&1&0&0&1&1&1&0&0\\1&1&0&0&0&0&0&1&1\end{pmatrix}}} (Punctuation was removed, as is usual in document classification and clustering.) The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. On the contrary, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters. Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features. === Mathematical motivation === Mathematically, a token is an element t {\displaystyle t} in a finite (or countably infinite) set T {\displaystyle T} . Suppose we only need to process a finite corpus, then we can put all tokens appearing in the corpus into T {\displaystyle T} , meaning that T {\displaystyle T} is finite. However, suppose we want to process all possible words made of the English letters, then T {\displaystyle T} is countably infinite. Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} . When T {\displaystyle T} is finite, of size | T | = m ≤ n {\displaystyle |T|=m\leq n} , then we can use one-hot encoding to map it into R n {\displaystyle \mathbb {R} ^{n}} . First, arbitrarily enumerate T = { t 1 , t 2 , . . , t m } {\displaystyle T=\{t_{1},t_{2},..,t_{m}\}} , then define ϕ ( t i ) = e i {\displaystyle \phi (t_{i})=e_{i}} . In other words, we assign a unique index i {\displaystyle i} to each token, then map the token with index i {\displaystyle i} to the unit basis vector e i {\displaystyle e_{i}} . One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of T {\displaystyle T} . Given a token t ∈ T {\displaystyle t\in T} , to compute ϕ ( t ) {\displaystyle \phi (t)} , we must find out the index i {\displaystyle i} of the token t {\displaystyle t} . Thus, to implement ϕ {\displaystyle \phi } efficiently, we need a fast-to-compute bijection h : T → { 1 , . . . , m } {\displaystyle h:T\to \{1,...,m\}} , then we have ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In fact, we can relax the requirement slightly: It suffices to have a fast-to-compute injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} , then use ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In practice, there is no simple way to construct an efficient injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} . However, we do not need a strict injection, but only an approximate injection. That is, when t ≠ t ′ {\displaystyle t\neq t'} , we should probably have h ( t ) ≠ h ( t ′ ) {\displaystyle h(t)\neq h(t')} , so that probably ϕ ( t ) ≠ ϕ ( t ′ ) {\displaystyle \phi (t)\neq \phi (t')} . At this point, we have just specified that h {\displaystyle h} should be a hashing function. Thus we reach the idea of feature hashing. == Algorithms == === Feature hashing (Weinberger et al. 2009) === The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows. First, one specifies two hash functions: the kernel hash h : T → { 1 , 2 , . . . , n } {\displaystyle h:T\to \{1,2,...,n\}} , and the sign hash ζ : T → { − 1 , + 1 } {\displaystyle \zeta :T\to \{-1,+1\}} . Next, one defines the feature hashing function: ϕ : T → R n , ϕ ( t ) = ζ ( t ) e h ( t ) {\displaystyle \phi :T\to \mathbb {R} ^{n},\quad \phi (t)=\zeta (t)e_{h(t)}} Finally, extend this feature hashing function to strings of tokens by ϕ : T ∗ → R n , ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ϕ ( t j ) {\displaystyle \phi :T^{}\to \mathbb {R} ^{n},\quad \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\phi (t_{j})} where T ∗ {\displaystyle T^{}} is the set of all finite strings consisting of tokens in T {\displaystyle T} . Equivalently, ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ζ ( t j ) e h ( t j ) = ∑ i = 1 n ( ∑ j : h ( t j ) = i ζ ( t j ) ) e i {\displaystyle \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\zeta (t_{j})e_{h(t_{j})}=\sum _{i=1}^{n}\left(\sum _{j:h(t_{j})=i}\zeta (t_{j})\right)e_{i}} ==== Geometric properties ==== We want to say something about the geometric property of ϕ {\displaystyle \phi } , but T {\displaystyle T} , by itself, is just a set of tokens, we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to T → R T {\displaystyle T\to \mathbb {R} ^{T}} , and lift ϕ {\displaystyle \phi } from ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} to ϕ : R T → R n {\displaystyle \phi :\mathbb {R} ^{T}\to \mathbb {R} ^{n}} by linear extension: ϕ ( ( x t ) t ∈ T ) = ∑ t ∈ T x t ζ ( t ) e h ( t ) = ∑ i = 1 n ( ∑ t : h ( t ) = i x t ζ ( t ) ) e i {\displaystyle \phi ((x_{t})_{t\in T})=\sum _{t\in T}x_{t}\zeta (t)e_{h(t)}=\sum _{i=1}^{n}\left(\sum _{t:h(t)=i}x_{t}\zeta (t)\right)e_{i}} There is an infinite sum there, which must be handled at once. There are essentially only two ways to handle infinities. One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we go for the potential-infinity way, by restricting R T {\displaystyle \mathbb {R} ^{T}} to contain only vectors with finite support: ∀ ( x t ) t ∈ T ∈ R T {\displaystyle \forall (x_{t})_{t\in T}\in \mathbb {R} ^{T}} , only finitely many entries of ( x t ) t ∈ T {\displaystyle (x_{t})_{t\in T}} are nonzero. Define an inner product on R T {\displaystyle \mathbb {R} ^{T}} in the obvious way: ⟨ e t , e t ′ ⟩ = { 1 , if t = t ′ , 0 , else. ⟨ x , x ′ ⟩ = ∑ t , t ′ ∈ T x t x t ′ ⟨ e t , e t ′ ⟩ {\displaystyle \langle e_{t},e_{t'}\rangle ={\begin{cases}1,{\text{ if }}t=t',\\0,{\text{ else.}}\end{cases}}\quad \langle x,x'\rangle =\sum _{t,t'\in T}x_{t}x_{t'}\langle e_{t},e_{t'}\rangle } As a side note, if T {\displaystyle T} is infinite, then the inner product space R T {\displaystyle \mathbb {R} ^{T}} is not complete. Taking its completion would get us to a Hilbert space, which allows well-behaved infinite sums. Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function ϕ : R T → R n {\displaystyle \phi :\ma

    Read more →
  • Web API

    Web API

    A web API is an application programming interface (API) for either a web server or a web browser. As a web development concept, it can be related to a web application's client side (including any web frameworks being used). A server-side web API consists of one or more publicly exposed endpoints to a defined request–response message system, typically expressed in JSON or XML by means of an HTTP-based web server. A server API (SAPI) is not considered a server-side web API, unless it is publicly accessible by a remote web application. == Client side == A client-side web API is a programmatic interface to extend functionality within a web browser or other HTTP client. Originally these were most commonly in the form of native plug-in browser extensions however most newer ones target standardized JavaScript bindings. The Mozilla Foundation created their WebAPI specification which is designed to help replace native mobile applications with HTML5 applications. Google created their Native Client architecture which is designed to help replace insecure native plug-ins with secure native sandboxed extensions and applications. They have also made this portable by employing a modified LLVM AOT compiler. == Server side == A server-side web API consists of one or more publicly exposed endpoints to a defined request–response message system, typically expressed in JSON or XML. The web API is exposed most commonly by means of an HTTP-based web server. Mashups are web applications which combine the use of multiple server-side web APIs. Webhooks are server-side web APIs that take input as a Uniform Resource Identifier (URI) that is designed to be used like a remote named pipe or a type of callback such that the server acts as a client to dereference the provided URI and trigger an event on another server which handles this event thus providing a type of peer-to-peer IPC. === Endpoints === Endpoints are important aspects of interacting with server-side web APIs, as they specify where resources can be accessed by third-party software. Usually the access is via a URI to which HTTP requests are posted, and from which the response is thus expected. Web APIs may be public or private, the latter of which requires an access token. Endpoints need to be static, otherwise the correct functioning of software that interacts with them cannot be guaranteed. If the location of a resource changes (and with it the endpoint) then previously written software will break, as the required resource can no longer be found at the same place. As API providers still want to update their web APIs, many have introduced a versioning system in the URI that points to an endpoint. === Resources versus services === Web 2.0 Web APIs often use machine-based interactions such as REST and SOAP. RESTful web APIs use HTTP methods to access resources via URL-encoded parameters, and use JSON or XML to transmit data. By contrast, SOAP protocols are standardized by the W3C and mandate the use of XML as the payload format, typically over HTTP. Furthermore, SOAP-based Web APIs use XML validation to ensure structural message integrity, by leveraging the XML schemas provisioned with WSDL documents. A WSDL document accurately defines the XML messages and transport bindings of a Web service. === Documentation === Server-side web APIs are interfaces for the outside world to interact with the business logic. For many companies this internal business logic and the intellectual property associated with it are what distinguishes them from other companies, and potentially what gives them a competitive edge. They do not want this information to be exposed. However, in order to provide a web API of high quality, there needs to be a sufficient level of documentation. One API provider that not only provides documentation, but also links to it in its error messages is Twilio. However, there are now directories of popular documented server-side web APIs. === Growth and impact === The number of available web APIs has grown consistently over the past years, as businesses realize the growth opportunities associated with running an open platform, that any developer can interact with. ProgrammableWeb tracks over 24000 Web APIs that were available in 2022, up from 105 in 2005. Web APIs have become ubiquitous. There are few major software applications/services that do not offer some form of web API. One of the most common forms of interacting with these web APIs is via embedding external resources, such as tweets, Facebook comments, YouTube videos, etc. In fact there are very successful companies, such as Disqus, whose main service is to provide embeddable tools, such as a feature-rich comment system. Any website of the TOP 100 Alexa Internet ranked websites uses APIs and/or provides its own APIs, which is a very distinct indicator for the prodigious scale and impact of web APIs as a whole. As the number of available web APIs has grown, open source tools have been developed to provide more sophisticated search and discovery. APIs.json provides a machine-readable description of an API and its operations, and the related project APIs.io offers a searchable public listing of APIs based on the APIs.json metadata format. === Business === ==== Commercial ==== Many companies and organizations rely heavily on their Web API infrastructure to serve their core business clients. In 2014 Netflix received around 5 billion API requests, most of them within their private API. ==== Governmental ==== Many governments collect a lot of data, and some governments are now opening up access to this data. The interfaces through which this data is typically made accessible are web APIs. Web APIs allow for data, such as "budget, public works, crime, legal, and other agency data" to be accessed by any developer in a convenient manner. == Example == An example of a popular web API is the Astronomy Picture of the Day API operated by the American space agency NASA. It is a server-side API used to retrieve photographs of space or other images of interest to astronomers, and metadata about the images. According to the API documentation, the API has one endpoint: https://api.nasa.gov/planetary/apod The documentation states that this endpoint accepts GET requests. It requires one piece of information from the user, an API key, and accepts several other optional pieces of information. Such pieces of information are known as parameters. The parameters for this API are written in a format known as a query string, which is separated by a question mark character (?) from the endpoint. An ampersand (&) separates the parameters in the query string from each other. Together, the endpoint and the query string form a URL that determines how the API will respond. This URL is also known as a query or an API call. In the below example, two parameters are transmitted (or passed) to the API via the query string. The first is the required API key and the second is an optional parameter — the date of the photograph requested. https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY&date=1996-12-03 Visiting the above URL in a web browser will initiate a GET request, calling the API and showing the user a result, known as a return value or as a return. This API returns JSON, a type of data format intended to be understood by computers, but which is somewhat easy for a human to read as well. In this case, the JSON contains information about a photograph of a white dwarf star: The above API return has been reformatted so that names of JSON data items, known as keys, appear at the start of each line. The last of these keys, named url, indicates a URL which points to a photograph: https://apod.nasa.gov/apod/image/9612/ngc2440_hst2.jpg Following the above URL, a web browser user would see this photo: Although this API can be called by an end user with a web browser (as in this example) it is intended to be called automatically by software or by computer programmers while writing software. JSON is intended to be parsed by a computer program, which would extract the URL of the photograph and the other metadata. The resulting photo could be embedded in a website, automatically sent via text message, or used for any other purpose envisioned by a software developer.

    Read more →
  • Blocking of Twitter in Nigeria

    Blocking of Twitter in Nigeria

    Twitter was blocked in Nigeria from 5 June 2021 to 13 January 2022. The government imposed a ban on the social network after it deleted tweets made by, and temporarily suspended, the Nigerian president Muhammadu Buhari, warning the southeastern people of Nigeria, predominantly Igbo people, of a potential repeat of the 1967 Nigerian Civil War due to the ongoing insurgency in Southeastern Nigeria. The Nigerian government claimed that the deletion of the president's tweets factored into their decision, but it was ultimately based on "a litany of problems with the social media platform in Nigeria, where misinformation and fake news spread through it have had real world violent consequences", citing the persistent use of the platform for activities that are capable of undermining Nigeria's corporate existence. In January 2022, Nigeria lifted its blocking of Twitter after the platform agreed to establish a legal entity within the country sometime in the first quarter of 2022. == Background == On 1 June 2021, Nigerian President Muhammadu Buhari posted a tweet threatening a crackdown on regional separatists "in the language they understand". The next day, Twitter deleted the tweet, claiming it was in violation of Twitter rules, but gave no further details. Nigeria's Information Minister Lai Mohammed said that Twitter's actions were part of an unfair double standard, as Twitter had not banned incitement tweets from other groups. During the Nigerian Civil War a majority of deaths resulted from the blockade of Biafra which caused the deaths of millions of civilians from starvation, a fact that was not alluded to in the tweet. The Nigerian government has long held concerns over the use of Twitter in the country. The ongoing local End SARS protest began on Twitter and got amplified in 2020 when it had 48 million tweets in ten days. Buhari's government floated the idea of social media regulation on different occasions prior to banning Twitter. Attempts to pass an anti-social media bill in the past have failed majorly due to massive outcry on Twitter. Days before the ban, the country's minister of information called Twitter's activities in Nigeria suspicious, citing its influence on the End SARS protests. == Aftermath == Three days after Twitter was suspended, it was reported that the move had cost the country over 6 billion naira and would also contribute to the worsening unemployment in the country. ExpressVPN reported an over 200 percent increase in web traffic and searches for VPN spiked across the country. In response, Nigeria's Minister of Justice and Attorney General of the Federation Abubakar Malami at first openly threatened to prosecute citizens who bypass the ban using a VPN but then denied saying so after a screenshot of a Twitter deactivation notification he shared on Facebook showed a VPN logo. Nigeria's cultural minister Lai Mohammed stated the ban would be lifted once Twitter submitted to locally licensing, registration and conditions. "It will be licensed by the broadcasting commission, and must agree not to allow its platform to be used by those who are promoting activities that are inimical to the corporate existence of Nigeria." In late June 2021, Twitter announced it would enter talks with the Nigerian government over the platform's suspension. The talks began in July 2021. On 15 September 2021, Mohammed said the Nigerian government will lift the ban on Twitter in a "few days." The Minister said Twitter gave a progress report of their talks with them, adding that it has been productive and quite respectful. On 1 October 2021, President Muhammadu Buhari in his Independence Day broadcast said Twitter must meet the Nigerian government's five conditions before the suspension of the social media platform will be lifted. The conditions are: Respect for national security and cohesion; registration, physical presence and representation in Nigeria; fair taxation; dispute resolution; local content. == Reactions == The ban was condemned by Amnesty International, the British, Canadian and Swedish diplomatic missions to Nigeria, as well as the United States and the European Union in a joint statement. Two domestic organizations, the Socio-Economic Rights and Accountability Project (SERAP) and the Nigerian Bar Association, indicated intent to challenge the ban in court. Twitter itself called the ban "deeply concerning". Former U.S. President Donald Trump, who was permanently suspended from Twitter following the United States Capitol attack in January, praised the ban, stating "Congratulations to the country of Nigeria, who just banned Twitter because they banned their President", and also called on other countries to ban Twitter and Facebook due to "not allowing free and open speech." == Lifting of the ban == On 12 January 2022, the Nigerian Government lifted the ban after Twitter agreed to pay an "applicable tax" and establish "a legal entity in Nigeria during the first quarter of 2022".

    Read more →