Public computer

A public computer (or public access computer) is any of various computers available in public areas. Public computers may be found in libraries, schools, or dedicated facilities run by government. Public computers share similar hardware and software components with personal computers; however, the role and function of a public access computer is entirely different. A public access computer is used by many different untrusted individuals throughout the course of the day, so it must be locked down and secured against both intentional and unintentional abuse. Users typically do not have authority to install software or change settings. A personal computer, in contrast, is typically used by a single responsible user, who can customize the machine's behavior to their preferences. Public access computers are often provided with tools such as a PC reservation system to regulate access.

The world's first public access computer center was the Marin Computer Center in California, co-founded by David and Annie Fox in 1977.

== Kiosks ==

A kiosk is a special type of public computer that uses software and hardware modifications to provide only services related to the place where the kiosk is located. For example, a movie ticket kiosk can be found at a movie theater. These kiosks usually run a secure browser with no access to the desktop. Many of these kiosks run Linux; however, ATMs, which are kiosks designed for depositing money, often run Windows XP.

== Public computers in the United States ==

=== Library computers ===

In the United States and Canada, almost all public libraries have computers available for patrons to use, though some libraries impose a time limit to ensure that others get a turn and to keep the library less busy. Users are often allowed to print documents that they have created on these computers, sometimes for a small fee.
==== Privacy ====

Privacy is an important part of the public library institution, since libraries entitle the public to intellectual freedom. Use of any computer or network may create records of users' activities that can jeopardize their privacy. A patron can jeopardize their own privacy by failing to clear the cache and cookies or to delete documents from the public computer. The American Library Association (ALA) publishes guidelines to help members of the public keep their activity private on a public computer. These give patrons an idea of how to use public library computers safely. In their provision of services to library users, librarians have an ethical responsibility, expressed in the ALA Code of Ethics, to preserve users' right to privacy. A librarian is also responsible for giving users an understanding of private patron use and access.

Libraries must ensure that users have the following protections when browsing on public computers: the computer automatically clears a user's history; privacy screens are provided so that users cannot see another patron's screen; software is kept up to date for effective safety measures; restoration software clears documents that users may have left on the computer and combats possible malware; sound security practices are followed; and users are made aware of any possible monitoring of their browsing activities. Users can also view the Library Privacy Checklist for Public Access Computers and Networks to better understand what libraries strive for when protecting privacy.

=== School computers ===

The U.S. government has given money to many school boards to purchase computers for educational applications. Schools may have multiple computer labs, which contain these computers for students to use. These machines usually have Internet access, but some schools use a blocking service to limit the websites that students can access to educational resources, such as Google.
In addition to controlling the content students view, these blocks can help keep the computers safe by preventing students from downloading malware and other threats. However, the effectiveness of such content filtering systems is questionable, since they can easily be circumvented with proxy websites or virtual private networks; for some weak systems, merely knowing the IP address of the intended website is enough to bypass the filter. School computers often have advanced operating system security to prevent tech-savvy students from inflicting damage (e.g., the Windows Registry Editor and Task Manager are disabled on Microsoft Windows machines). Schools with very advanced tech services may also install a locked-down BIOS/firmware or make kernel-level changes to the operating system, precluding the possibility of unauthorized activity.

Reconstruction from projections

The problem of reconstructing a multidimensional signal from its projections is uniquely multidimensional, having no 1-D counterpart. It has applications that range from computer-aided tomography to geophysical signal processing. The problem can be explored from several points of view: as a deconvolution problem, a modeling problem, an estimation problem, or an interpolation problem.

== Motivation and applications ==

Many fields in science and engineering use reconstruction from projections, especially in imaging. It is widely applied in geophysical tomography, medical imaging, and industrial radiography. For example, in a CT scanner, the 3-D structure of the patient's body is measured with beams that pass through the tissue and hit a detector, giving a flat projection of the body from that angle. Multiple projections are combined to obtain a 3-D image of the position and shape of internal structures.

== Problem statement and basics ==

A projection is a linear mapping of an $M$-dimensional signal into an $N$-dimensional one, where $N \leq M$. The objective of reconstruction is to restore the $M$-dimensional signal from the $N$-dimensional signal. The following case is a 2-D signal projected into a 1-D signal.

The signal in the original coordinates is denoted $d(u,v)$. Now consider a collimated beam of radiation coming from the direction opposite to $\hat{v}$, producing a projection along $\hat{u}$. The axes $\hat{v}$ and $\hat{u}$ are perpendicular to each other, and the angle between $u$ and $\hat{u}$ is $\theta$. The signal obtained along the $\hat{u}$ axis is defined to be $p_{\theta}(\hat{u})$.
The relationship between the original coordinates and the rotated coordinates is given by

$$\begin{bmatrix}\hat{u}\\ \hat{v}\end{bmatrix}=\begin{bmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}u\\ v\end{bmatrix}$$

or inversely,

$$\begin{bmatrix}u\\ v\end{bmatrix}=\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}\begin{bmatrix}\hat{u}\\ \hat{v}\end{bmatrix}$$

Then we have

$$p_{\theta}(\hat{u})=\int_{-\infty}^{\infty}d(u,v)\,\mathrm{d}\hat{v}=\int_{-\infty}^{\infty}d(\hat{u}\cos\theta-\hat{v}\sin\theta,\ \hat{u}\sin\theta+\hat{v}\cos\theta)\,\mathrm{d}\hat{v}$$

By varying $\theta$, a large number of projections can be obtained. By the projection-slice theorem, $D(\Omega,\theta)$, the slice of the Fourier transform of $d(u,v)$ at angle $\theta$, is equal to $P_{\theta}(\Omega)$, the Fourier transform of the projection $p_{\theta}(\hat{u})$.
Therefore, the unknown $d(u,v)$ can be obtained from its Fourier transform by means of the Fourier transform inversion integral

$$\begin{aligned}
d(u,v) &= \frac{1}{4\pi^{2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}D(\Omega_{1},\Omega_{2})\,e^{j\Omega_{1}u}e^{j\Omega_{2}v}\,\mathrm{d}\Omega_{1}\,\mathrm{d}\Omega_{2}\\
&= \frac{1}{4\pi^{2}}\int_{-\pi}^{\pi}\int_{0}^{\infty}D(\Omega,\theta)\,e^{j\Omega u\cos\theta}e^{j\Omega v\sin\theta}\,|\Omega|\,\mathrm{d}\Omega\,\mathrm{d}\theta\\
&= \frac{1}{4\pi^{2}}\int_{-\pi}^{\pi}\int_{0}^{\infty}P_{\theta}(\Omega)\,e^{j\Omega(u\cos\theta+v\sin\theta)}\,|\Omega|\,\mathrm{d}\Omega\,\mathrm{d}\theta\\
&= \frac{1}{4\pi^{2}}\int_{0}^{\pi}\left(\int_{-\infty}^{\infty}P_{\theta}(\Omega)\,|\Omega|\,e^{j\Omega\hat{u}}\,\mathrm{d}\Omega\right)\mathrm{d}\theta
\end{aligned}$$

where $\hat{u}=u\cos\theta+v\sin\theta$. The inner integral is, up to a factor of $2\pi$, the inverse Fourier transform of $P_{\theta}(\Omega)\,|\Omega|$, that is, the projection convolved with a filter. Defining $g(\hat{u})={\mathcal{F}}^{-1}(|\Omega|)$ and discretizing the integral over $\theta$, we get

$$d(u,v)\approx\frac{1}{2\pi}\sum_{i}\Delta\theta_{i}\,[p_{\theta_{i}}(\hat{u})*g(\hat{u})],\qquad \hat{u}=u\cos\theta_{i}+v\sin\theta_{i}$$

== Approaches ==

In practice, a wide variety of methods are used, most of which reconstruct 3-D information (a volume) from 2-D signals (images). Commonly used modalities are CT, MRI, PET, and SPECT, and filtered back projection, based on the principles introduced above, is commonly applied.
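As a rough numerical check of the projection integral defined in the problem statement, the following Python sketch approximates the projection $p_{\theta}(\hat{u})$ with a trapezoid rule. The function names (`project`, `gaussian_blob`), the integration limit `v_max`, and the step count are illustrative assumptions rather than part of any standard library, and the Gaussian is only a hypothetical test signal. For an isotropic Gaussian centred at the origin the projection should be identical at every angle, which the usage lines print for comparison.

```python
import math


def project(d, theta, u_hat, v_max=8.0, steps=4000):
    """Approximate p_theta(u_hat): the line integral of d(u, v) along v_hat.

    d      -- callable d(u, v); assumed negligible for |v_hat| > v_max
    theta  -- angle between the u axis and the u_hat axis, in radians
    u_hat  -- position along the rotated u_hat axis
    """
    c, s = math.cos(theta), math.sin(theta)
    h = 2.0 * v_max / steps
    total = 0.0
    for k in range(steps + 1):
        v_hat = -v_max + k * h
        # Inverse rotation from (u_hat, v_hat) back to the original (u, v).
        u = u_hat * c - v_hat * s
        v = u_hat * s + v_hat * c
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end points
        total += weight * d(u, v)
    return total * h


def gaussian_blob(u, v, u0=0.0, v0=0.0):
    """Isotropic unit-variance Gaussian centred at (u0, v0); a test signal only."""
    return math.exp(-((u - u0) ** 2 + (v - v0) ** 2) / 2.0)


if __name__ == "__main__":
    for theta in (0.0, math.pi / 6, math.pi / 2):
        # Analytically, each value is sqrt(2*pi) * exp(-0.5**2 / 2).
        print(theta, project(gaussian_blob, theta, 0.5))
```

A full filtered back projection would additionally filter each projection with $|\Omega|$ and sum the results over all angles; that step is omitted here to keep the sketch short.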
=== Computed Tomography (CT) ===

In CT, a volume is formed by stacking the axial slices. Software can then cut the volume in a different plane (usually orthogonal). Commonly, slice data is generated using an X-ray source that rotates around the object, with X-ray sensors positioned on the opposite side of the circle from the source.

=== Magnetic resonance imaging (MRI) ===

In MRI, energy from an oscillating magnetic field is temporarily applied to the patient at the appropriate resonance frequency. The protons (hydrogen nuclei) emit a radio-frequency signal, which is measured by a receiving coil. The radio signal can be made to encode position information by varying the main magnetic field using gradient coils.

=== Positron emission tomography (PET) ===

The system detects pairs of gamma rays emitted indirectly by a positron-emitting radionuclide (tracer), which is introduced into the body on a biologically active molecule. Three-dimensional images of tracer concentration within the body are then constructed by computer analysis. In modern PET-CT scanners, three-dimensional imaging is often accomplished with the aid of a CT X-ray scan performed on the patient during the same session, in the same machine.

=== Single-photon emission computed tomography (SPECT) ===

SPECT imaging is performed by using a gamma camera to acquire multiple 2-D images (projections) from multiple angles. The projections are combined to yield a 3-D data set, which may then be manipulated to show thin slices along any chosen axis of the body. SPECT is similar to PET in its use of radioactive tracer material and detection of gamma rays, but the tracers used in SPECT emit gamma radiation that is measured directly.

Access-independent services

Access-independent service (AIS) is a service concept in which a service does not depend on guaranteed access network cooperation for service delivery. Telecommunications industry analyst Dean Bubley first used the term in a report on Telco-OTT in February 2012. Traditionally, most telecom company or internet service provider services are access-dependent, because they rely heavily on guaranteed access cooperation on the network the service is delivered over. For instance, traditional IP-based TV service (IPTV) delivered by a telecom company is generally a managed service. This means that IPTV service assumes the IPTV service provider has control over the access network that the IPTV service is delivered over, and network quality of service (QoS) guarantees are available for IPTV service delivery. As a result, the reach of a telecom company's IPTV service is generally restricted by the reach of the telecom company's access network. In contrast, services offered by non-traditional video content delivery service providers such as Netflix, Hulu, and Amazon Video are considered access-independent services. Netflix's video content streaming service, for example, dynamically adapts to network conditions in real-time to strive for the best overall quality of experience (QoE) and does not assume guaranteed cooperation from the underlying IP network, such as QoS. As a result, without considering content rights and different countries' government restrictions, the reach of Netflix's video content streaming service is, in theory, the reach of the Internet. Skype is another example of AIS, because Skype offers an IP-based telephony service over the Internet without depending on IP network cooperation guarantees other than basic IP network connectivity. In the context of telecom service delivery, the concept of access independent services is also commonly described by the term "over-the-top" (OTT) services. 
OTT service providers such as Facebook, WeChat, and Netflix generally do not own or directly manage any wide-area access network, so they design their services for overall quality of experience and make no assumptions about guaranteed access network cooperation.
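The adaptive behaviour described above for streaming services can be illustrated with a small Python sketch. It is not the algorithm of Netflix or any other provider; the bitrate ladder, the safety factor, and the function name `choose_bitrate` are hypothetical. The sketch shows only the general idea that an access-independent client measures recent throughput and picks a rendition that fits, rather than relying on QoS guarantees from the access network.

```python
def choose_bitrate(ladder_kbps, throughput_samples_kbps, safety=0.8):
    """Pick the highest rendition that fits within a fraction of measured throughput.

    ladder_kbps             -- available encodings in kbit/s (hypothetical values)
    throughput_samples_kbps -- recent download-rate measurements, assumed positive
    safety                  -- fraction of the estimate the client is willing to use
    """
    if not ladder_kbps:
        raise ValueError("bitrate ladder must not be empty")
    ladder = sorted(ladder_kbps)
    if not throughput_samples_kbps:
        return ladder[0]  # no measurements yet: start conservatively
    # The harmonic mean is dominated by slow samples, giving a cautious estimate.
    estimate = len(throughput_samples_kbps) / sum(
        1.0 / sample for sample in throughput_samples_kbps
    )
    budget = safety * estimate
    fitting = [rate for rate in ladder if rate <= budget]
    # Fall back to the lowest rendition when nothing fits the budget.
    return fitting[-1] if fitting else ladder[0]


if __name__ == "__main__":
    ladder = [235, 750, 1750, 3000, 5800]  # illustrative values only
    print(choose_bitrate(ladder, [4000, 4200, 3900]))  # good connection
    print(choose_bitrate(ladder, [900, 400, 600]))     # congested connection
```

Because the decision is made entirely at the client from end-to-end measurements, the same logic works over any IP network, which is the property the term "access-independent" refers to.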

Radio network

A radio network is a system that distributes radio signals to multiple receivers or enables two-way communication between stations and mobile units. Worldwide, radio networks include broadcast networks, such as BBC Radio in the United Kingdom and NPR in the United States, which transmit one-to-many signals for news, entertainment, and public information; two-way radio networks, used by police, fire services, taxicabs, and delivery fleets for operational communication; and cellular networks, such as Verizon, Vodafone, and China Mobile, which provide mobile telephony and data services using frequency or time division duplexing. While all rely on radio-frequency technology like transmitters, receivers, and antennas, their network architectures, protocols, and regulatory frameworks differ substantially across applications and regions. The two-way type of radio network shares many of the same technologies and components as the broadcast-type radio network but is generally set up with fixed broadcast points (transmitters) with co-located receivers and mobile receivers/transmitters or transceivers. In this way both the fixed and mobile radio units can communicate with each other over broad geographic regions ranging in size from small single cities to entire states/provinces or countries. There are many ways in which multiple fixed transmit/receive sites can be interconnected to achieve the range of coverage required by the jurisdiction or authority implementing the system: conventional wireless links in numerous frequency bands, fibre-optic links, or microwave links. In all of these cases the signals are typically backhauled to a central switch of some type where the radio message is processed and resent (repeated) to all transmitter sites where it is required to be heard. In contemporary two-way radio systems, a concept called trunking is commonly used to achieve better efficiency of radio spectrum use. 
Trunking also provides very wide coverage, with no switching of channels required by the mobile radio user while roaming throughout the system coverage area. Trunking of two-way radio is identical to the concept used for cellular phone systems, where each fixed and mobile radio is specifically identified to the system controller and its operation is switched by the controller.

== Broadcasting networks ==

The broadcast type of radio network is a network system that distributes radio programming to multiple stations simultaneously, or slightly delayed, for the purpose of extending total coverage beyond the limits of a single broadcast signal. The resulting expanded audience for radio programming or information essentially applies the benefits of mass production to the broadcasting enterprise. A radio network has two sales departments: one to package and sell programs to radio stations, and one to sell the audience of those programs to advertisers.

Most radio networks also produce much of their programming. Originally, radio networks owned some or all of the stations that broadcast the network's radio format programming. Presently, however, many networks do not own any stations and only produce and/or distribute programming. Similarly, station ownership does not always indicate network affiliation. A company might own stations in several different markets and purchase programming from a variety of networks.

Radio networks rose rapidly with the growth of regular broadcasting of radio to home listeners in the 1920s. This growth took various paths in different places. In Britain, the BBC was developed with public funding, in the form of a broadcast receiver license, and held a broadcasting monopoly in its early decades. In contrast, in the United States various competing commercial broadcasting networks arose, funded by advertising revenue. In that instance, the same corporation that owned or operated the network often manufactured and marketed the listener's radio.
Major technical challenges to be overcome when distributing programs over long distances are maintaining signal quality and managing the number of switching/relay points in the signal chain. Early on, programs were sent to remote stations (either owned or affiliated) by various methods, including leased telephone lines, pre-recorded gramophone records and audio tape. The world's first all-radio, non-wireline network was claimed to be the Rural Radio Network, a group of six upstate New York FM stations that began operation in June 1948. Terrestrial microwave relay, a technology later introduced to link stations, has been largely supplanted by coaxial cable, fiber, and satellite, which usually offer superior cost-benefit ratios. Many early radio networks evolved into television networks.

WebCL

WebCL (Web Computing Language) is a JavaScript binding to OpenCL for heterogeneous parallel computing within any compatible web browser without the use of plug-ins, first announced in March 2011. It is developed on similar grounds to OpenCL and is considered a browser version of the latter. Primarily, WebCL allows web applications to gain speed from multi-core CPUs and GPUs. With the growing popularity of applications that need parallel processing, such as image editing, augmented reality, and sophisticated gaming, improving computational speed has become more important. For these reasons, the non-profit Khronos Group designed and developed WebCL, a JavaScript binding to OpenCL with portable kernel programming that enables parallel computing in web browsers across a wide range of devices. In short, WebCL consists of two parts: kernel programming, which runs on the processors (devices), and JavaScript, which binds the web application to OpenCL. The completed and ratified specification for WebCL 1.0 was released on March 19, 2014.

== Implementation ==

Currently, no browsers natively support WebCL. However, non-native add-ons have been used to implement WebCL. For example, Nokia developed a WebCL extension. Mozilla does not plan to implement WebCL, having favored WebGL Compute Shaders, which were in turn scrapped in favor of WebGPU.

* Mozilla (Firefox) - hg.mozilla.org/projects/webcl/

=== WebCL working draft ===

* Samsung (WebKit) - github.com/SRA-SiliconValley/webkit-webcl (unavailable)
* Nokia (Firefox) - github.com/toaarnio/webcl-firefox (down since Nov 2014, last version for FF 34)
* Intel (Crosswalk) - www.crosswalk-project.org

=== Example C code ===

The basic unit of a parallel program is the kernel. A kernel is any parallelizable task used to perform a specific job. Often, functions can be realized as kernels. A program can be composed of one or more kernels.
In order to realize a kernel, it is essential that the task is parallelizable. Data dependencies and order of execution play a vital role in producing efficient parallelized algorithms. A simple example is the loop unrolling performed by C compilers, in which a loop statement is unrolled into its individual iterations. The unrolled statements can be parallelized and made to run simultaneously. A kernel follows a similar approach, in which only the snapshot of the ith iteration is captured inside the kernel.

Running a WebCL application involves the following steps:

1. Allow access to devices and provide context
2. Hand over the kernel to a device
3. Cause the device to execute the kernel
4. Retrieve results from the device
5. Use the data inside JavaScript

== Exceptions List ==

WebCL, being a JavaScript-based implementation, does not return an error code when errors occur. Instead, it throws an exception such as OUT_OF_RESOURCES, OUT_OF_HOST_MEMORY, or the WebCL-specific WEBCL_IMPLEMENTATION_FAILURE. The exception object carries a machine-readable name and a human-readable message describing the error. The message field can be a null value. Other exceptions include:

* INVALID_OPERATION – if the blocking form of this function is called from a WebCLCallback
* INVALID_VALUE – if eventWaitList is empty
* INVALID_CONTEXT – if events specified in eventWaitList do not belong to the same context
* INVALID_DEVICE_TYPE – if deviceType is given, but is not one of the valid enumerated values
* DEVICE_NOT_FOUND – if there is no WebCLDevice available that matches the given deviceType

More information on exceptions can be found in the specs document. There is another exception that is raised upon trying to call an object that is ‘released’.
On using the release method, the object is not deleted permanently, but the resources associated with it are freed. To avoid this exception, the releaseAll method can be used, which not only frees the resources but also deletes all the associated objects created.

== Security ==

WebCL, being open-ended software developed for web applications, leaves considerable scope for vulnerabilities in design and development. This led the developers working on WebCL to give security the utmost importance. A few of the concerns that were addressed are:

* Out-of-bounds memory access: This occurs when memory locations outside the allocated space are accessed. An attacker can rewrite or erase important data stored in those memory locations. Whenever such a case arises, an error must be generated at compile time, and zero must be returned at run time, so that the program cannot overwrite the memory. A project, WebCL Validator, was initiated by the Khronos Group (the developers) to handle this vulnerability.
* Memory initialization: This prevents applications from accessing memory contents left by previous applications. WebCL ensures this by initializing all buffers and variables to zero before it runs the current application. OpenCL 1.2 has an extension, ‘cl_khr_initialize_memory’, which enables this.
* Denial of service: The most common attack on web applications, it cannot be eliminated by WebCL or the browser. OpenCL can be provided with watchdog timers and pre-emptive multitasking, which WebCL can use to detect and terminate contexts that take too long or consume a lot of resources. As with the previous item, there is an OpenCL 1.2 extension, ‘cl_khr_terminate_context’, which makes it possible to terminate a process that might cause a denial-of-service attack.

== Related browser bugs ==

* Bug 664147 - [WebCL] add openCL in gecko, Mozilla
* Bug 115457: [Meta] WebCL support for WebKit, WebKit Bugzilla

Language model benchmark

A language model benchmark is a standardized test designed to evaluate the performance of language models on various natural language processing tasks. These tests are intended for comparing different models' capabilities in areas such as language understanding, generation, and reasoning. Benchmarks generally consist of a dataset and corresponding evaluation metrics. The dataset provides text samples and annotations, while the metrics measure a model's performance on tasks like answering questions, text classification, and machine translation. These benchmarks are developed and maintained by academic institutions, research organizations, and industry players to track progress in the field. In addition to accuracy, the metrics can include throughput, energy efficiency, bias, trust, and sustainability.

== Overview ==

=== Types ===

Benchmarks may be described by the following adjectives, which are not mutually exclusive:

* Classical: These tasks were studied in natural language processing even before the advent of deep learning. Examples include the Penn Treebank for testing syntactic and semantic parsing, as well as bilingual translation benchmarked by BLEU scores.
* Question answering: These tasks have a text question and a text answer, often multiple-choice. They can be open-book or closed-book. Open-book QA resembles reading comprehension questions, with relevant passages, in which the answer appears, included as annotation in the question. Closed-book QA includes no relevant passages and is also called open-domain question answering. Before the era of large language models, open-book QA was more common and was understood as testing information retrieval methods. Closed-book QA has become common since GPT-2 as a method to measure knowledge stored within model parameters.
* Omnibus: An omnibus benchmark combines many benchmarks, often previously published. It is intended as an all-in-one benchmarking solution.
* Reasoning: These tasks are usually in the question-answering format, but are intended to be more difficult than standard question answering.
* Multimodal: These tasks require processing not only text, but also other modalities, such as images and sound. Examples include OCR and transcription.
* Agency: These tasks are for a language-model-based software agent that operates a computer for a user, such as editing images, browsing the web, etc.
* Adversarial: A benchmark is "adversarial" if the items in the benchmark are picked specifically so that certain models do badly on them. Adversarial benchmarks are often constructed after state of the art (SOTA) models have saturated (achieved 100% performance) a benchmark, to renew the benchmark. A benchmark is "adversarial" only at a certain moment in time, since what is adversarial may cease to be adversarial as newer SOTA models appear.
* Public/Private: A benchmark might be partly or entirely private, meaning that some or all of the questions are not publicly available. The idea is that if a question is publicly available, then it might be used for training, which would be "training on the test set" and invalidate the result of the benchmark. Usually, only the guardians of the benchmark have access to the private subsets, and to score a model on such a benchmark, one must send the model weights, or provide API access, to the guardians.

The boundary between a benchmark and a dataset is not sharp. Generally, a dataset contains three "splits": training, test, and validation. Both the test and validation splits are essentially benchmarks. In general, a benchmark is distinguished from a test/validation dataset in that a benchmark is typically intended to be used to measure the performance of many different models that are not trained specifically for doing well on the benchmark, while a test/validation set is intended to be used to measure the performance of models trained specifically on the corresponding training set.
In other words, a benchmark may be thought of as a test/validation set without a corresponding training set. Conversely, certain benchmarks may be used as a training set, such as the English Gigaword or the One Billion Word Benchmark, which in modern language is just the negative log-likelihood loss on a pretraining set with 1 billion words. Indeed, the distinction between benchmark and dataset in language models became sharper after the rise of the pretraining paradigm, whereby a model is first trained on massive, unlabeled datasets to learn general language patterns, syntax, and knowledge (pretraining), and the base model is then adapted to specific, downstream tasks using smaller, labeled datasets (fine-tuning).

=== Lifecycle ===

Generally, the life cycle of a benchmark consists of the following steps:

* Inception: A benchmark is published. It can be simply given as a demonstration of the power of a new model (implicitly) that others then picked up as a benchmark, or as a benchmark that others are encouraged to use (explicitly).
* Growth: More papers and models use the benchmark, and the performance on the benchmark grows.
* Maturity, degeneration or deprecation: A benchmark may be saturated, after which researchers move on to other benchmarks. Progress on the benchmark may also be neglected as the field moves to focus on other benchmarks.
* Renewal: A saturated benchmark can be upgraded to make it no longer saturated, allowing further progress.

=== Construction ===

Like datasets, benchmarks are typically constructed by several methods, individually or in combination:

* Web scraping: Ready-made question-answer pairs may be scraped online, such as from websites that teach mathematics and programming.
* Conversion: Items may be constructed programmatically from scraped web content, such as by blanking out named entities from sentences, and asking the model to fill in the blank. This was used for making the CNN/Daily Mail Reading Comprehension Task.
* Crowd sourcing: Items may be constructed by paying people to write them, such as on Amazon Mechanical Turk. This was used for making the MCTest.

=== Evaluation ===

Generally, benchmarks are fully automated. This limits the questions that can be asked. For example, with mathematical questions, "proving a claim" would be difficult to automatically check, while "calculate an answer with a unique integer answer" would be automatically checkable. With programming tasks, the answer can generally be checked by running unit tests, with an upper limit on runtime.

The benchmark scores are of the following kinds:

* For multiple choice or cloze questions, common scores are accuracy (frequency of correct answer), precision, recall, F1 score, etc.
* pass@n: The model is given $n$ attempts to solve each problem. If any attempt is correct, the model earns a point. The pass@n score is the model's average score over all problems.
* k@n: The model makes $n$ attempts to solve each problem, but only $k$ attempts out of them are selected for submission. If any submission is correct, the model earns a point. The k@n score is the model's average score over all problems.
* cons@n: The model is given $n$ attempts to solve each problem. If the most common answer is correct, the model earns a point. The cons@n score is the model's average score over all problems. Here "cons" stands for "consensus" or "majority voting".

The pass@n score can be estimated more accurately by making $N > n$ attempts and using the unbiased estimator

$$1-\frac{\binom{N-c}{n}}{\binom{N}{n}}$$

where $c$ is the number of correct attempts.

For less well-formed tasks, where the output can be any sentence, commonly used scores include BLEU, ROUGE, METEOR, NIST, word error rate, LEPOR, CIDEr, and SPICE.

=== Issues ===

* Error: Some benchmark answers may be wrong.
* Ambiguity: Some benchmark questions may be ambiguously worded.
* Subjective: Some benchmark questions may not have an objective answer at all. This problem generally prevents creative writing benchmarks. Similarly, this prevents benchmarking writing proofs in natural language, though benchmarking proofs in a formal language is possible.
* Open-ended: Some benchmark questions may not have a single answer of a fixed size. This problem generally prevents programming benchmarks from using more natural tasks such as "write a program for X", and instead uses tasks such as "write a function that implements specification X".
* Inter-annotator agreement: Some benchmark questions may be not fully objective, such that even people would not agree with 100% on what the answer should be. This is common in natural language processing tasks, such as syntactic annotation.
* Shortcut: Some benchmark questions may be easily solved by an "unintended" shortcut. For example, in the SNLI benchmark, having a negative word like "not" in the second sentence is a strong signal for the "Contradiction" category, regardless of what the se
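
The scoring rules from the Evaluation subsection can be made concrete with a short Python sketch. It computes the unbiased pass@n estimator $1-\binom{N-c}{n}/\binom{N}{n}$ for a single problem and a cons@n (majority vote) score. The function names `pass_at_n` and `cons_at_n` are illustrative, and the tie-breaking behaviour of the majority vote (the first-seen answer wins, following `collections.Counter.most_common`) is an assumption, since the text does not specify how ties are resolved.

```python
from collections import Counter
from math import comb


def pass_at_n(num_attempts, num_correct, n):
    """Unbiased pass@n estimate for one problem from N attempts, c of them correct."""
    if not 0 <= num_correct <= num_attempts:
        raise ValueError("need 0 <= c <= N")
    if not 1 <= n <= num_attempts:
        raise ValueError("need 1 <= n <= N")
    # comb(N - c, n) is 0 when fewer than n incorrect attempts exist,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(num_attempts - num_correct, n) / comb(num_attempts, n)


def cons_at_n(answers, reference):
    """Return 1 if the most common of the sampled answers equals the reference."""
    if not answers:
        raise ValueError("need at least one answer")
    # Assumption: on a tie, the answer seen first wins.
    majority, _count = Counter(answers).most_common(1)[0]
    return 1 if majority == reference else 0


if __name__ == "__main__":
    # 10 sampled attempts, 3 correct: estimate pass@1 and pass@5.
    print(pass_at_n(10, 3, 1))
    print(pass_at_n(10, 3, 5))
    print(cons_at_n(["42", "41", "42"], "42"))
```

The benchmark-level score is the mean of these per-problem values over all problems.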

VibeOS

VibeOS is an operating system built from scratch entirely by generative artificial intelligence, using code produced through prompts to Claude (vibe coding). It is capable of running on QEMU and was successfully tested on a Raspberry Pi Zero. It has been released under the MIT license.

== Features ==

=== Core ===

* Custom kernel with cooperative multitasking (preemptive backup)
* FAT32 filesystem with long filename support
* Memory allocator, process scheduler, interrupt handling
* GIC-400 (QEMU) and BCM2836/BCM2835 (Pi) interrupt controllers
* Configurable boot (splash screen, boot target)

=== GUI ===

* Desktop environment with draggable windows
* Menu bar, dock, window minimize/maximize/close
* Mouse and keyboard input
* Modern macOS-inspired aesthetic

=== Networking ===

* Full TCP/IP stack (Ethernet, ARP, IP, ICMP, UDP, TCP)
* DNS resolver
* HTTP client
* TLS 1.2 with HTTPS support

=== Apps ===

* Web browser with HTML/CSS rendering
* Terminal emulator with readline-style shell
* Text editor (vim clone) with syntax highlighting
* File manager with drag-and-drop
* Music player (MP3/WAV)
* Calculator, system monitor
* VibeCode IDE
* Doom port

=== Development ===

* TCC (Tiny C Compiler) - compile C programs directly on VibeOS
* MicroPython interpreter with full kernel API bindings
* 60+ userspace programs (coreutils, games, GUI apps)

=== Hardware ===

* Runs on Raspberry Pi Zero 2W
* USB keyboard and mouse via DWC2 driver
* SD card via EMMC driver
* 1920×1080 framebuffer

== Further projects ==

There are other independent projects under the VibeOS name, including an independent development by Ben, also developed using vibe coding, aimed at creating a Unix-like operating system for educational purposes. Another project is Vib-OS, an operating system also built using vibe coding, capable of booting on a Raspberry Pi.
It offers a desktop environment with a customizable wallpaper, a file manager, a web browser in an early stage of development, and a functional Doom port, among other features that are still unpolished given the early state of development.