AI Art Krishna

AI Art Krishna — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Fully probabilistic design

Decision making (DM) can be seen as a purposeful choice of action sequences. It also covers control, a purposeful choice of input sequences. As a rule, it runs under randomness, uncertainty and incomplete knowledge. A range of prescriptive theories have been proposed how to make optimal decisions under these conditions. They optimise sequence of decision rules, mappings of the available knowledge on possible actions. This sequence is called strategy or policy. Among various theories, Bayesian DM is broadly accepted axiomatically based theory that solves the design of optimal decision strategy. It describes random, uncertain or incompletely known quantities as random variables, i.e. by their joint probability expressing belief in their possible values. The strategy that minimises expected loss (or equivalently maximises expected reward) expressing decision-maker's goals is then taken as the optimal strategy. While the probabilistic description of beliefs is uniquely and deductively driven by rules for joint probabilities, the composition and decomposition of the loss function have no such universally applicable formal machinery. Fully probabilistic design (of decision strategies or control, FPD) removes the mentioned drawback and expresses also the DM goals of by the "ideal" probability, which assigns high (small) values to desired (undesired) behaviours of the closed DM loop formed by the influenced world part and by the used strategy. FPD has axiomatic basis and has Bayesian DM as its restricted subpart. FPD has a range of theoretical consequences , and, importantly, has been successfully used to quite diverse application domains.
Read more →
Supercomputer operating system

A supercomputer operating system is an operating system intended for supercomputers. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in supercomputer architecture. While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems and toward some form of Linux, with it running all the supercomputers on the TOP500 list in November 2017. In 2021, top 10 computers run for instance Red Hat Enterprise Linux (RHEL), or some variant of it or other Linux distribution e.g. Ubuntu. Given that modern massively parallel supercomputers typically separate computations from other services by using multiple types of nodes, they usually run different operating systems on different nodes, e.g., using a small and efficient lightweight kernel such as Compute Node Kernel (CNK) or Compute Node Linux (CNL) on compute nodes, but a larger system such as a Linux distribution on server and input/output (I/O) nodes. While in a traditional multi-user computer system job scheduling is in effect a tasking problem for processing and peripheral resources, in a massively parallel system, the job management system needs to manage the allocation of both computational and communication resources, as well as gracefully dealing with inevitable hardware failures when tens of thousands of processors are present. Although most modern supercomputers use the Linux operating system, each manufacturer has made its own specific changes to the Linux distribution they use, and no industry standard exists, partly because the differences in hardware architectures require changes to optimize the operating system to each hardware design. == Context and overview == In the early days of supercomputing, the basic architectural concepts were evolving rapidly, and system software had to follow hardware innovations that usually took rapid turns. In the early systems, operating systems were custom tailored to each supercomputer to gain speed, yet in the rush to develop them, serious software quality challenges surfaced and in many cases the cost and complexity of system software development became as much an issue as that of hardware. In the 1980s the cost for software development at Cray came to equal what they spent on hardware and that trend was partly responsible for a move away from the in-house operating systems to the adaptation of generic software. The first wave in operating system changes came in the mid-1980s, as vendor specific operating systems were abandoned in favor of Unix. Despite early skepticism, this transition proved successful. By the early 1990s, major changes were occurring in supercomputing system software. By this time, the growing use of Unix had begun to change the way system software was viewed. The use of a high level language (C) to implement the operating system, and the reliance on standardized interfaces was in contrast to the assembly language oriented approaches of the past. As hardware vendors adapted Unix to their systems, new and useful features were added to Unix, e.g., fast file systems and tunable process schedulers. However, all the companies that adapted Unix made unique changes to it, rather than collaborating on an industry standard to create "Unix for supercomputers". This was partly because differences in their architectures required these changes to optimize Unix to each architecture. As general purpose operating systems became stable, supercomputers began to borrow and adapt critical system code from them, and relied on the rich set of secondary functions that came with them. However, at the same time the size of the code for general purpose operating systems was growing rapidly. By the time Unix-based code had reached 500,000 lines long, its maintenance and use was a challenge. This resulted in the move to use microkernels which used a minimal set of the operating system functions. Systems such as Mach at Carnegie Mellon University and ChorusOS at INRIA were examples of early microkernels. The separation of the operating system into separate components became necessary as supercomputers developed different types of nodes, e.g., compute nodes versus I/O nodes. Thus modern supercomputers usually run different operating systems on different nodes, e.g., using a small and efficient lightweight kernel such as CNK or CNL on compute nodes, but a larger system such as a Linux-derivative on server and I/O nodes. == Early systems == The CDC 6600, generally considered the first supercomputer in the world, ran the Chippewa Operating System, which was then deployed on various other CDC 6000 series computers. The Chippewa was a rather simple job control oriented system derived from the earlier CDC 3000, but it influenced the later KRONOS and SCOPE systems. The first Cray-1 was delivered to the Los Alamos Lab with no operating system, or any other software. Los Alamos developed the application software for it, and the operating system. The main timesharing system for the Cray 1, the Cray Time Sharing System (CTSS), was then developed at the Livermore Labs as a direct descendant of the Livermore Time Sharing System (LTSS) for the CDC 6600 operating system from twenty years earlier. In developing supercomputers, rising software costs soon became dominant, as evidenced by the 1980s cost for software development at Cray growing to equal their cost for hardware. That trend was partly responsible for a move away from the in-house Cray Operating System to UNICOS system based on Unix. In 1985, the Cray-2 was the first system to ship with the UNICOS operating system. Around the same time, the EOS operating system was developed by ETA Systems for use in their ETA10 supercomputers. Written in Cybil, a Pascal-like language from Control Data Corporation, EOS highlighted the stability problems in developing stable operating systems for supercomputers and eventually a Unix-like system was offered on the same machine. The lessons learned from developing ETA system software included the high level of risk associated with developing a new supercomputer operating system, and the advantages of using Unix with its large extant base of system software libraries. By the middle 1990s, despite the extant investment in older operating systems, the trend was toward the use of Unix-based systems, which also facilitated the use of interactive graphical user interfaces (GUIs) for scientific computing across multiple platforms. The move toward a commodity OS had opponents, who cited the fast pace and focus of Linux development as a major obstacle against adoption. As one author wrote "Linux will likely catch up, but we have large-scale systems now". Nevertheless, that trend continued to gain momentum and by 2005, virtually all supercomputers used some Unix-like OS. These variants of Unix included IBM AIX, the open source Linux system, and other adaptations such as UNICOS from Cray. By the end of the 20th century, Linux was estimated to command the highest share of the supercomputing pie. == Modern approaches == The IBM Blue Gene supercomputer uses the CNK operating system on the compute nodes, but uses a modified Linux-based kernel called I/O Node Kernel (INK) on the I/O nodes. CNK is a lightweight kernel that runs on each node and supports a single application running for a single user on that node. For the sake of efficient operation, the design of CNK was kept simple and minimal, with physical memory being statically mapped and the CNK neither needing nor providing scheduling or context switching. CNK does not even implement file I/O on the compute node, but delegates that to dedicated I/O nodes. However, given that on the Blue Gene multiple compute nodes share a single I/O node, the I/O node operating system does require multi-tasking, hence the selection of the Linux-based operating system. While in traditional multi-user computer systems and early supercomputers, job scheduling was in effect a task scheduling problem for processing and peripheral resources, in a massively parallel system, the job management system needs to manage the allocation of both computational and communication resources. It is essential to tune task scheduling, and the operating system, in different configurations of a supercomputer. A typical parallel job scheduler has a master scheduler which instructs some number of slave schedulers to launch, monitor, and control parallel jobs, and periodically receives reports from them about the status of job progress. Some, but not all supercomputer schedulers attempt to maintain locality of job execution. The PBS Pro scheduler used on the Cray XT3 and Cray XT4 systems does not attempt to optimize locality on its three-dimensional torus interconnect, but simply uses the first available processor. On the other hand, IBM's scheduler on the Blue Gene supercomputers aims to exploit locality a
Read more →
Key & See

Key & See is a variation of the TV Key service that forms part of the open, standards-based interactive TV services platform provided by Miniweb Interactive. Key & See allows viewers to access the interactive TV content made available by broadcasters and channel owners while leaving quarter of their screen tuned to the programme they are already watching Like TV Key, Key & See can be used with interactive TV services on UK satellite TV provider Sky Digital (BSkyB) Key & See works in the same way as a TV Key but the numeric shortcut code is associated with a broadcaster and a particular TV channel or programme. Miniweb Interactive offers commercial brands and broadcasters the chance to utilise TV Key and Key & See technology as part of its interactive TV services platform
Read more →
Content reference identifier

A content reference identifier or CRID is a concept from the standardization work done by the TV-Anytime forum. It is or closely matches the concept of the Uniform Resource Locator, or URL, as used on the World-Wide Web: A unit of content, in a broadcast stream, can be referred to by its globally unique CRID in the same way that a webpage can be referred to by its globally unique URL on the web. The concept of CRID permits referencing contents unambiguously, regardless of their location, i.e., without knowing specific broadcast information (time, date and channel) or how to obtain them through a network, for instance, by means of a streaming service or by downloading a file from an Internet server. The receiver must be capable of resolving these unambiguous references, i.e. of translating them into specific data that will allow it to obtain the location of that content in order to acquire it. This makes it possible for recording processes to take place without knowing that information, and even without knowing beforehand the duration of the content to be recorded: a complete series by a simple click, a program that has not been scheduled yet, a set of programs grouped by a specific criterion... This framework allows for the separation between the reference to a given content (the CRID) and the necessary information to acquire it, which is called a “locator”. Each CRID may lead to one or more locators which will represent different copies of the same content. They may be identical copies broadcast in different channels or dates, or cost different prices. They may also be distinct copies with different technical parameters such as format or quality. It may also be the case that the resolution process of a CRID provides another CRID as a result (for example, its reference in a different network, where it has an alternative identifier assigned by a different operator) or a set of CRIDs (for instance, if the original CRID represents a TV series, in which case the resolution process would result in the list of CRIDs representing each episode). From the above it can be concluded that provided that a given content can belong to many groups (each possibly defined by distinctive qualities), it is possible that many CRIDs carry the same content. That is, several CRIDs may be resolved into the same locator. A CRID is not exactly a universal, unique and exclusive identifier for a given content. It is closely related to the authority that creates it, to the resolution service provider, and to the content provider in such a way that the same content may have different CRIDs depending on the field in which they are used (for example, a different one for each television operator that has the rights to broadcast the content). == Format == A CRID is specified much like URLs. In fact, a CRID is a so-called URI. Typically, the content creator, the broadcaster or a third party will use their DNS-names in a combination with a product-specific name to create globally unique CRIDs. That is, the syntax of a CRID is: crid://authority/data The authority field represents the entity that created the CRID and its format is that of a DNS name. The data field represents a string of characters that will unambiguously identify the content within the authority scope (it is a string of characters assigned by the authority itself). As an example, let's assume that BBC wanted to make a CRID for (all the programs of) the Olympics in China. It may have looked something like this crid://bbc.co.uk/olympics/2008/ This would be a group CRID, that is, a CRID representing a group of contents. Then, to refer to a specific event – such as the women's shot-put final – they could have used the following inside their metadata. crid://bbc.co.uk/olympics/2008/final/shotput/women Currently, four types of CRIDs are playing a major role in some unidirectional television networks: programme CRID, series CRID, group CRID, and recommendation CRID. One of the most important applications of CRIDs is the so-called series link recording function (SL) of modern digital video recorders (DVR, PVR). In turn, a locator is a string of characters that contains all the necessary information for a receiver to find and acquire a given content, whether it is received through a transport stream, located in local storage, downloaded as a file from an Internet server, or through a streaming service. For example, a DVB locator will include all the necessary parameters to identify a specific content within a transport stream: network, transport stream, service, table and/or event identifiers. The locators' format, as established in TV-Anytime, is quite generic and simple, and corresponds to: [transport-mechanism]:[specific-data] The first part of the locator's format (the transport mechanism) must be a string of characters that is unique for each mechanism (transport stream, local file, HTTP Internet access...). The second part must be unambiguous only within the scope of a given transport mechanism and will be standardized by the organism in charge of the regulation of the mechanism itself. For instance, a DVB locator to identify a content within the transport stream of networks that follow this standard would be: dvb://112.4a2.5ec;2d22~20121212T220000Z—PT01H30M which would indicate a content (identified by the string “2d22”) that airs on a channel available on a DVB network identified by the address “112.4a2.5ec” (network “112”, transport stream “4a2” and service “5ec”), on 12 December 2012 at 10 p.m. and with a duration of 90 minutes. == The location resolution process == The location resolution process is the procedure by which, starting from the CRID of a given content, one or several locators of that content are obtained. Resolving a CRID can be a direct process, which leads immediately to one or many locators, or it may also happen that in the first place one or many intermediate CRIDs are returned, which must undergo the same procedure to finally obtain one or several locators. This procedure involves some information elements, among which we find two structures named resolving authority record (RAR) and ContentReferencingTable, respectively. Consulting them repeatedly will take the receiver from a CRID to one or many locators that will allow it to acquire the content. The RAR table The RAR table is one or many data structures that provide the receiver, for each authority that submits CRIDs, information on the corresponding resolution service provider. Among other things, it informs about which mechanism is used to provide information to resolve the CRIDs from each authority. That is, one or many RAR records must exist for each authority that indicate the receiver where it has to go to resolve the CRIDs of that particular authority. For example, in the record of the figure (expressed by means of a XML structure, according to the XML Schema defined in the TV-Anytime) there is an authority called “tve.es”, whose resolution service provider is the entity “rtve.es”, available on the URL "http://tva.rtve.es/locres/tve", which means there is resolution information in that URL. These RAR records will have reached the receiver in an indefinite form, unimportant for the TV-Anytime specification, which will depend on the specific transport mechanism of the network to which the receiver is connected. Each family of standards that regulates distribution networks (DVB, ATSC, ISDB, IPTV...) will have previously defined such procedure, which will be used by devices certified according to those standards. The ContentReferencingTable table The second structure involved in the location resolution process is a proper resolution table which, given a content's CRID, returns one or several locators that enable the receiver to access an instance of that content, or one or many CRIDs that allow it to move forward in the resolution process. The figure shows an example of this second structure, an XML document according to the specifications of the XML Schema defined in TV-Anytime. In it, several sections are included ( elements) that structure the information that describes each resolution case. The first one declares how a CRID (crid://tv.com/Friends/all), which corresponds to a group content that encompasses several episodes (two) of the “Friends” series is resolved. The result of the resolution process provides two new CRIDs each of them corresponding to one of the two episodes. The second element resolves the CRID of the first episode of the first season. The result of the resolution process is two DVB locators. The “acquire” attribute with “any” value indicates that any of them are good (the second one is a repetition broadcast a week later). The third element gives information about the second episode. It indicates that it cannot be resolved yet (“status” attribute with the “cannot yet resolve” value), indicating a date on which the request for resolution information must be repeated. The pro
Read more →
Cygwin

Cygwin ( SIG-win) is a free and open-source Unix-like environment and command-line interface (CLI) for Microsoft Windows. The project also provides a software repository containing open-source packages. Cygwin allows source code for Unix-like operating systems to be compiled and run on Windows. Cygwin provides native integration of Windows-based applications. The terminal emulator mintty is the default command-line interface provided to interact with the environment. The Cygwin installation directory layout mimics the root file system of Unix-like systems, with directories such as /bin, /home, /etc, /usr, and /var. Cygwin is released under the GNU Lesser General Public License version 3. It was originally developed by Cygnus Solutions, which was later acquired by Red Hat (now part of IBM), to port the GNU toolchain to Win32, including the GNU Compiler Suite. Rather than rewrite the tools to use the Win32 runtime environment, Cygwin implemented a POSIX-compatible environment in the form of a DLL. The brand motto is "Get that Linux feeling – on Windows", although Cygwin doesn't have Linux in it. == History == Cygwin began in 1995 as a project of Steve Chamberlain, a Cygnus engineer who observed that Windows NT and 95 used COFF as their object file format, and that GNU already included support for x86 and COFF, and the C library newlib. He thought that it would be possible to retarget GCC and produce a cross compiler generating executables that could run on Windows. A prototype was later developed. Chamberlain bootstrapped the compiler on a Windows system, to emulate Unix to let the GNU configure shell script run. Initially, Cygwin was called Cygwin32. When Microsoft registered the trademark Win32, the "32" was dropped to simply become Cygwin. In 1999, Cygnus offered Cygwin 1.0 as a commercial product. Subsequent versions have not been released, instead relying on continued open source releases. Geoffrey Noer was the project lead from 1996 to 1999. Christopher Faylor was lead from 1999 to 2004; he left Red Hat and became co-lead with Corinna Vinschen. Corinna Vinschen has been the project lead from mid-2014 to date (as of September, 2024). From June 23, 2016, the Cygwin library version 2.5.2 was licensed under the GNU Lesser General Public License (LGPL) version 3. == Description == Cygwin is provided in two versions: the full 64-bit version and a stripped-down 32-bit version, whose final version was released in 2022. Cygwin consists of a library that implements the POSIX system call API in terms of Windows system calls to enable the running of a large number of application programs equivalent to those on Unix systems, and a GNU development toolchain (including GCC and GDB). Programmers have ported the X Window System, K Desktop Environment 3, GNOME, Apache, and TeX. Cygwin permits installing inetd, syslogd, sshd, Apache, and other daemons as standard Windows services. Cygwin programs have full access to the Windows API and other Windows libraries. Cygwin programs are installed by running Cygwin's "setup" program, which downloads them from repositories on the Internet. The Cygwin API library is licensed under the GNU Lesser General Public License version 3 (or later), with an exception to allow linking to any free and open-source software whose license conforms to the Open Source Definition. Cygwin consists of two parts: A dynamic-link library in the form of a C standard library that acts as a compatibility layer for the POSIX API and A collection of software tools and applications that provide a Unix-like look and feel. Cygwin supports POSIX symbolic links, representing them as plain-text files with the system attribute set. Cygwin 1.5 represented them as Windows Explorer shortcuts, but this was changed for reasons of performance and POSIX correctness. Cygwin also recognises NTFS junction points and symbolic links and treats them as POSIX symbolic links, but it does not create them. The POSIX API for handling access control lists (ACLs) is supported. === Technical details === A Cygwin-specific version of the Unix mount command allows mounting Windows paths as "filesystems" in the Unix file space. Initial mount points can be configured in /etc/fstab, which has a format very similar to Unix systems, except that Windows paths appear in place of devices. Filesystems can be mounted in binary mode (by default), or in text mode, which enables automatic conversion between LF and CRLF endings (which only affects programs that open files without explicitly specifying text or binary mode). Cygwin 1.7 introduced comprehensive support for POSIX locales, and the UTF-8 Unicode encoding became the default. The fork system call for duplicating a process is fully implemented, but the copy-on-write optimization strategy could not be used. Cygwin's default user interface is the bash shell running in the mintty terminal emulator. The DLL also implements pseudo terminal (pty) devices, and Cygwin ships with a number of terminal emulators that are based on them, including rxvt/urxvt and xterm. The version of GCC that comes with Cygwin has various extensions for creating Windows DLLs, such as specifying whether a program is a windowing or console-mode program. Support for compiling programs that do not require the POSIX compatibility layer provided by the Cygwin DLL used to be included in the default GCC, but as of 2014, it is provided by cross-compilers contributed by the MinGW-w64 project. == Software packages == Cygwin's base package selection is approximately 100MB, containing the bash (interactive user) and dash (installation) shells and the core file and text manipulation utilities. Additional packages are available as optional installs from within the Cygwin "setup" program and package manager ("setup-x86_64.exe" – 64 bit). The Cygwin Ports project provided additional packages that were not available in the Cygwin distribution itself. Examples included GNOME, K Desktop Environment 3, MySQL database, and the PHP scripting language. Most ports have been adopted by volunteer maintainers as Cygwin packages, and Cygwin Ports are no longer maintained. Cygwin ships with GTK+ and Qt. The Cygwin/X project allows graphical Unix programs to display their user interfaces on the Windows desktop for both local and remote programs.
Read more →
Digital data

Digital data or digital information, in information theory and information systems, is data or information represented as a string of discrete symbols, each of which can take on one of only a finite number of values from some alphabet, such as letters or digits. An example is a text document, which consists of a string of alphanumeric characters. The most common form of digital data in modern information systems is binary data, which is represented by a string of binary digits (bits) each of which can have one of two values, either 0 or 1. Digital data can be contrasted with analog data, which is represented by a value from a continuous range of real numbers. Analog data is transmitted by an analog signal, which not only takes on continuous values but can vary continuously with time, a continuous real-valued function of time. An example is the air pressure variation in a sound wave. Data requires interpretation to become information. In modern (post-1960) computer systems, all data is digital. The word digital comes from the same source as the words digit and digitus (the Latin word for finger), as fingers are often used for counting. Mathematician George Stibitz of Bell Telephone Laboratories used the word digital in reference to the fast electric pulses emitted by a device designed to aim and fire anti-aircraft guns in 1942. The term is most commonly used in computing and electronics, especially where real-world information is converted to binary numeric form as in digital audio and digital photography. == Symbol to digital conversion == Since symbols (for example, alphanumeric characters) are not continuous, representing symbols digitally is rather simpler than conversion of continuous or analog information to digital. Instead of sampling and quantization as in analog-to-digital conversion, such techniques as polling and encoding are used. A symbol input device usually consists of a group of switches that are polled at regular intervals to see which switches are switched. Data will be lost if, within a single polling interval, two switches are pressed, or a switch is pressed, released, and pressed again. This polling can be done by a specialized processor in the device to prevent burdening the main CPU. When a new symbol has been entered, the device typically sends an interrupt, in a specialized format, so that the CPU can read it. For devices with only a few switches (such as the buttons on a joystick), the status of each can be encoded as bits (usually 0 for released and 1 for pressed) in a single word. This is useful when combinations of key presses are meaningful, and is sometimes used for passing the status of modifier keys on a keyboard (such as shift and control). But it does not scale to support more keys than the number of bits in a single byte or word. Devices with many switches (such as a computer keyboard) usually arrange these switches in a scan matrix, with the individual switches on the intersections of x and y lines. When a switch is pressed, it connects the corresponding x and y lines together. Polling (often called scanning in this case) is done by activating each x line in sequence and detecting which y lines then have a signal, thus which keys are pressed. When the keyboard processor detects that a key has changed state, it sends a signal to the CPU indicating the scan code of the key and its new state. The symbol is then encoded or converted into a number based on the status of modifier keys and the desired character encoding. A custom encoding can be used for a specific application with no loss of data. However, using a standard encoding such as ASCII is problematic if a symbol such as 'ß' needs to be converted but is not in the standard. It is estimated that in the year 1986, less than 1% of the world's technological capacity to store information was digital and in 2007 it was already 94%. The year 2002 is assumed to be the year when humankind was able to store more information in digital than in analog format (the "beginning of the digital age"). == States == Digital data come in these three states: data at rest, data in transit, and data in use. The confidentiality, integrity, and availability have to be managed during the entire lifecycle from 'birth' to the destruction of the data. === Data at rest === Data at rest in information technology means data that is housed physically on computer data storage in any digital form (e.g. cloud storage, file hosting services, databases, data warehouses, spreadsheets, archives, tapes, off-site or cloud backups, mobile devices etc.). Data at rest includes both structured and unstructured data. This type of data is subject to threats from hackers and other malicious threats to gain access to the data digitally or physical theft of the data storage media. To prevent this data from being accessed, modified or stolen, organizations will often employ security protection measures such as password protection, data encryption, or a combination of both. The security options used for this type of data are broadly referred to as data-at-rest protection (DARP). Definitions include: "...all data in computer storage while excluding data that is traversing a network or temporarily residing in computer memory to be read or updated." "...all data in storage but excludes any data that frequently traverses the network or that which resides in temporary memory. Data at rest includes but is not limited to archived data, data which is not accessed or changed frequently, files stored on hard drives, USB thumb drives, files stored on backup tape and disks, and also files stored off-site or on a storage area network (SAN)." While it is generally accepted that archive data (i.e. which never changes), regardless of its storage medium, is data at rest and active data subject to constant or frequent change is data in use. “Inactive data” could be taken to mean data which may change, but infrequently. The imprecise nature of terms such as “constant” and “frequent” means that some stored data cannot be comprehensively defined as either data at rest or in use. These definitions could be taken to assume that Data at Rest is a superset of data in use; however, data in use, subject to frequent change, has distinct processing requirements from data at rest, whether completely static or subject to occasional change. ==== Security ==== Because of its nature data at rest is of increasing concern to businesses, government agencies and other institutions. Mobile devices are often subject to specific security protocols to protect data at rest from unauthorized access when lost or stolen and there is an increasing recognition that database management systems and file servers should also be considered as at risk; the longer data is left unused in storage, the more likely it might be retrieved by unauthorized individuals outside the network. Data encryption, which prevents data visibility in the event of its unauthorized access or theft, is commonly used to protect data in motion and increasingly promoted for protecting data at rest. The encryption of data at rest should only include strong encryption methods such as AES or RSA. Encrypted data should remain encrypted when access controls such as usernames and password fail. Increasing encryption on multiple levels is recommended. Cryptography can be implemented on the database housing the data and on the physical storage where the databases are stored. Data encryption keys should be updated on a regular basis. Encryption keys should be stored separately from the data. Encryption also enables crypto-shredding at the end of the data or hardware lifecycle. Periodic auditing of sensitive data should be part of policy and should occur on scheduled occurrences. Finally, only store the minimum possible amount of sensitive data. Tokenization is a non-mathematical approach to protecting data at rest that replaces sensitive data with non-sensitive substitutes, referred to as tokens, which have no extrinsic or exploitable meaning or value. This process does not alter the type or length of data, which means it can be processed by legacy systems such as databases that may be sensitive to data length and type. Tokens require significantly less computational resources to process and less storage space in databases than traditionally encrypted data. This is achieved by keeping specific data fully or partially visible for processing and analytics while sensitive information is kept hidden. Lower processing and storage requirements makes tokenization an ideal method of securing data at rest in systems that manage large volumes of data. A further method of preventing unwanted access to data at rest is the use of data federation especially when data is distributed globally (e.g. in off-shore archives). An example of this would be a European organisation which stores its archived data off-site in the US. Under the terms of the USA PATRIOT Act the American authorities can demand
Read more →
Digital content

Digital content is any content that exists in the form of digital data. Digital content is stored on digital media or analog storage in specific formats. Forms of digital content include information that is digitally broadcast, streamed, or contained in computer files. Viewed narrowly, digital content includes popular media types, while a broader approach considers any type of digital information (e. g. digitally updated weather forecasts, GPS maps, and so on) as digital content. Digital content has increased as more households have accessed the Internet. Expanded access has made it easier for people to receive their news and watch TV online, challenging the popularity of traditional platforms. Increased access to the Internet has also led to the mass publication of digital content through individuals in the form of eBooks, blog posts, and even Facebook posts. == History == At the beginning of the Digital Revolution, computers facilitated the discovery, retrieval, and creation of new information in every field of human knowledge. As information became increasingly more accessible, the Digital Revolution also facilitated the creation of digital content. Despite an evolution to digital technology, which occurred somewhere between the late 1970s, distribution of digital content did not begin until the late 1990s with the rise in popularity of the Internet. In the past, digital content was primarily distributed through computers and the Internet. Methods of distribution are rapidly changing as the Digital Revolution brings new channels, such as mobile apps and eBooks. These new technologies will create challenges for content creators, as they determine the best channel to bring content to their consumers. Despite the benefits, new technologies have created new intellectual property issues. Users can easily share, modify, and redistribute content outside of the creator's control. While new technologies have made digital content available to large audiences, managing copyright and limiting content movement will continue to be an issue that digital content creators face in the future. == Types of digital content == Examples include: Video – Types of video content include home videos, music videos, TV shows, and movies. Many of these can be viewed on websites such as YouTube, Hulu, Paramount+, Disney+, HBO Max, and so on, in which people and companies alike can post content. However, many movies and television shows are not available for free legally, but rather can be purchased from sites such as iTunes and Amazon. Audio – Music is the most common form of audio. Spotify has emerged as a popular way for people to listen to music either over the Internet or from their computer desktop. Digital content in the form of music is also available through Pandora and last.fm, both of which allow listeners to listen to music online for no charge. Images – Photo and image sharing is another example of digital content. Popular sites used for this type of digital content includes Imgur, where people share self-created pictures, Flickr, where people share their photo albums, and DeviantArt, where people share their artwork. Popular apps that are used for images include Instagram and Snapchat. Visual Stories - Stories are a new type of digital content that got introduced by Snapchat. Since then, stories as a format has been introduced in a couple of other platforms such as Facebook and Linkedin. In 2018, Google introduced their AMP Stories, which provides content publishers with a mobile-focused format for delivering news and information as visually rich, tap-through stories. Text - Type of digital content which is available in text or written format. Blog websites which store data in form of textual format. === Paid digital content === In order to have access to more premium digital goods, consumers usually have to pay an upfront charge for digital content, or a subscription based fee. Video – Many licensed videos, such as movies and television shows, require money in order to be viewed or downloaded. Popular services used by many include streaming giant Netflix and Amazon's streaming service, as well as recent notice put forth by the online video platform YouTube. Audio – While songs can be streamed for free, generally in order to download most licensed music, consumers need to purchase songs from web stores, such as the popular iTunes. However, Spotify Premium is emerging as a new model for purchasing digital content on the web: consumers pay a monthly fee to unlimited streaming and downloading from Spotify's music library. According to a report done by IHS Inc. in 2013, the global consumer spending on digital content grew to over $57 billion in 2013, which was up almost 30% from $44 billion in 2012. In past years, the US has always been a leader in consumer expenditure on digital content, but as of 2013, many countries have emerged with great consumer expenditure. South Korea's overall digital spend per capita is now greater than the US. ==== Consolidation ==== According to research firm Ampere Analysis, in 2024, a small group of six media conglomerates; Disney, Comcast, Google, Warner Bros. Discovery, Netflix, and Paramount Global—are poised to dominate the global content market. These companies are projected to account for 51% of all global spending on content, a significant increase from 47% in 2020. Disney, in particular, is a major player, with an estimated $35.8 billion investment in television and film content, representing 14% of global spending. This significant increase, fueled by Disney's full ownership of Hulu, highlights the company's strategic focus on streaming services. A substantial portion of the projected $126 billion global content spending is allocated to streaming platforms. === Non-purchasable digital content === Not all digital content is purchasable, and is simply anything published digitally. This would include: News – in recent years newspapers have attempted to expand their readership by creating access to their newspapers digitally. As of 2012, 39% of readers learned about news from online formats, making news a prevalent form of digital content. Advertisements – as media consumers increasingly use digital formats to watch TV, check the weather, and search for content, advertisements have shifted to digital forms to keep up with their viewership. Advertisements are now being made digitally and placed on sites ranging from Facebook to YouTube. Question and Answer sites – these sites are a type of Internet forum where people can post questions they want answered, or provide responses to previous inquiries. With millions of questions posted each day, anyone has the ability to create content on these sites, so the information provided may not be 100% reliable or accurate. Popular sites include Yahoo! Answers, WikiAnswers and Quora. Web mapping – sites such as MapQuest and Google Maps provide users with map content. These sites give people the ability to quickly look up the location of a landmark and create routes to a destination. Online maps are a form of free content provided by companies such as Google and AOL, serving as much more efficient alternatives to the traditional Thomas Guide. == Business implications == === Digital companies === Digital content businesses can include news, information, and entertainment distributed over the Internet and consumed digitally by both consumers and businesses. Based on revenue, the leading digital businesses are ranked Google, China Mobile, Bloomberg, Reed Elsevier, and Apple. The 50 companies with the highest revenue are split between those offering free and paid digital content, but these top 50 companies combined generate revenue of $150 billion. === Educational opportunities === Programs such as CUNY's Macaulay Honors College in their New Media Lab, run by industry professional Robert Small, is set up to train and introduce students to the various disciplines within the digital content industry. The goal is to offer information and access to professional work opportunities. They also explore within an incubator how to create businesses and start ups within the world of digital content. There are many educational events in support of choosing digital content as a career. === Government support === The Irish government adopted a "Strategy for the Digital Content Industry in Ireland" in 2002.
Read more →
Online exhibition

An online exhibition, also referred to as a virtual exhibition, online gallery, cyber-exhibition, is an exhibition whose venue is cyberspace. Museums and other organizations create online exhibitions for many reasons. For example, an online exhibition may: expand on material presented at, or generate interest in, or create a durable online record of, a physical exhibition; save production costs (insurance, shipping, installation); solve conservation/preservation problems (e.g., handling of fragile or rare objects); reach lots more people: "Access to information is no longer restricted to those who can afford travel and museum visits, but is available to anyone who has access to a computer with an Internet connection. Unlike physical exhibitions, online exhibitions are not restricted by time; they are not forced to open and close but may be available 24 hours a day. In the nonprofit world, many museums, libraries, archives, universities, and other cultural organizations create online exhibitions. A database of such exhibitions is Library and Archival Exhibitions on the Web. Online exhibition organizers may use techniques such as marquee text, display advertisements, and in-event emails to engage patrons. Various guides have been published to help organizations create effective online exhibitions. The earliest museum with a physical existence to create a programme of substantial online exhibitions with high resolution images of artefacts was the Museum of the History of Science in Oxford, the first of which, The Measurers: a Flemish Image of Mathematics in the Sixteenth Century and an exhibition of early photographs, were published on 21 August 1995. == Examples of online exhibitions == International Museum of Women is an online-only museum that does not have a physical building and instead offers online exhibitions about women's issues globally as well as an online community. Online exhibitions include "Imagining Ourselves" (launched 2006) about women's identity, "Women, Power and Politics" (2008), and "Economica: Women and the Global Economy" (2009). Tucson LGBTQ Museum is an online-only museum that does not have a physical building and instead offers online exhibitions about LGBTQ history. The online photographic, audio, video, text, and other historical exhibitions include exhibits from the 1700s to the present day. The effort began in the summer of 1967 and spanned almost 50 years. International New Media Gallery (INMG) is an online museum specialising in moving image and screen-based art. The INMG is dedicated to exploring current debates and topics in art history: touching on areas such as migration, war, environmental activism and the internet itself. The gallery publishes extensive academic catalogues alongside its exhibitions. It also hosts spaces for discussion and debate, both online and offline. Virtual Museum of Modern Nigerian Art – the VMMNA is the first of its kind in Africa. Hosted by the Pan-African University, Lagos, Nigeria this virtual museum offers a good view of the development on Nigerian Art in the past fifty years.
Read more →
Sentence embedding

In natural language processing, a sentence embedding is a representation of a sentence as a vector of numbers which encodes meaningful semantic information. State of the art embeddings are based on the learned hidden layer representation of dedicated sentence transformer models. BERT pioneered an approach involving the use of a dedicated [CLS] token prepended to the beginning of each sentence inputted into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks. In practice however, BERT's sentence embedding with the [CLS] token achieves poor performance, often worse than simply averaging non-contextual word embeddings. SBERT later achieved superior sentence embedding performance by fine tuning BERT's [CLS] token embeddings through the usage of a siamese neural network architecture on the SNLI dataset. Other approaches are loosely based on the idea of distributional semantics applied to sentences. Skip-Thought trains an encoder-decoder structure for the task of neighboring sentences predictions; this has been shown to achieve worse performance than approaches such as InferSent or SBERT. An alternative direction is to aggregate word embeddings, such as those returned by Word2vec, into sentence embeddings. The most straightforward approach is to simply compute the average of word vectors, known as continuous bag-of-words (CBOW). However, more elaborate solutions based on word vector quantization have also been proposed. One such approach is the vector of locally aggregated word embeddings (VLAWE), which demonstrated performance improvements in downstream text classification tasks. == Applications == In recent years, sentence embedding has seen a growing level of interest due to its applications in natural language queryable knowledge bases through the usage of vector indexing for semantic search. LangChain for instance utilizes sentence transformers for purposes of indexing documents. In particular, an indexing is generated by generating embeddings for chunks of documents and storing (document chunk, embedding) tuples. Then given a query in natural language, the embedding for the query can be generated. A top k similarity search algorithm is then used between the query embedding and the document chunk embeddings to retrieve the most relevant document chunks as context information for question answering tasks. This approach is also known formally as retrieval-augmented generation. Though not as predominant as BERTScore, sentence embeddings are commonly used for sentence similarity evaluation which sees common use for the task of optimizing a Large language model's generation parameters is often performed via comparing candidate sentences against reference sentences. By using the cosine-similarity of the sentence embeddings of candidate and reference sentences as the evaluation function, a grid-search algorithm can be utilized to automate hyperparameter optimization. == Evaluation == A way of testing sentence encodings is to apply them on Sentences Involving Compositional Knowledge (SICK) corpus for both entailment (SICK-E) and relatedness (SICK-R). In the best results are obtained using a BiLSTM network trained on the Stanford Natural Language Inference (SNLI) Corpus. The Pearson correlation coefficient for SICK-R is 0.885 and the result for SICK-E is 86.3. A slight improvement over previous scores is presented in: SICK-R: 0.888 and SICK-E: 87.8 using a concatenation of bidirectional Gated recurrent unit.
Read more →
Digital goods

Digital goods or e-goods are intangible goods that exist in digital form. Examples are Wikipedia articles; digital media, such as e-books, downloadable music, internet radio, internet television and streaming media; fonts, logos, photos and graphics; digital subscriptions; online ads (as purchased by the advertiser); internet coupons; electronic tickets; electronically treated documentation in many different fields; downloadable software (Digital Distribution) and mobile apps; cloud-based applications and online games; virtual goods used within the virtual economies of online games and communities; community access; workbooks; worksheets; planners; e-learning (online courses); webinars, video tutorials, blog posts; cards; patterns; website themes and templates. == Legal concerns about digital goods == Special legal concerns regarding digital goods include copyright infringement and taxation. Also the question of the ownership (versus licensed use or service only) of purely digital goods is not finally resolved. For instance, the software installers of the digital software distributor gog.com are technically independent to the account but are still subject to the EULA, where a "licensed, not sold" formulation is used. Therefore, it is not clear if the software can be legally used after a hypothetical loss of the account; a question which was also raised before in practice for the similar service Steam. In July 2012, the European Court of Justice ruled in the case UsedSoft GMbH v. Oracle International Corp. that the sale of a software product, either through a physical support or download, constituted a transfer of ownership in EU law, thus the first sale doctrine applies; the ruling thereby breaks the "licensed, not sold" legal theory, but leaves open numerous questions. Therefore, it is also permissible to resell software licenses even if the digital good has been downloaded directly from the Internet, as the first-sale doctrine applied whenever software was originally sold to a customer for an unlimited amount of time, thus prohibiting any software maker from preventing the resale of their software by any of their legitimate owners. The court requires that the previous owner must no longer be able to use the licensed software after the resale, but finds that the practical difficulties in enforcing this clause should not be an obstacle to authorizing resale, as they are also present for software which can be installed from physical supports, where the first-sale doctrine is in force. In several cases, content providers have faced criticism for revoking access to digital goods due to expired licenses or the discontinuation of a product, such as ebooks (which resulted in a lawsuit against Amazon.com, Inc.), digital video (with Sony Interactive Entertainment revoking access to purchased StudioCanal content from its now-defunct PlayStation video store; a similar move involving Warner Bros. Discovery content was averted by an updated license agreement), and video games (such as Ubisoft discontinuing and revoking access to its game The Crew without providing refunds or the ability to redownload the game) In September 2024, the U.S. state of California implemented a consumer protection law that prohibits the use of terms such as "buy" or "purchase" during transactions involving digital goods if there is no way to obtain the purchases in a manner that cannot be revoked by the seller (such as allowing it to be downloaded for permanent, offline access), and requires a disclaimer to be displayed to the customer at the time of purchase.
Read more →
The Morning After (web series)

The Morning After is a Hulu original web series that premiered on January 17, 2011, and ended April 24, 2014. It was produced by Hulu and Jace Hall's HDFilms, streaming Monday through Friday. The show originally featured Brian Kimmet and Ginger Gonzaga as hosts. Later shows used a rotation of hosts including Alison Haislip, Dave Holmes, Damien Fahey, Bradley Hasemeyer, Haley Mancini, Paul Nyhart, and Rachel Perry. The series advertises itself as "a smart, daily shot of pop culture to help Hulu users stay up to date" and typically highlights notable moments from television shows and current news in an entertaining fashion. In keeping with its focus on pop culture, The Morning After will sometimes stream an episode featuring past pop culture titled "From the Archives," such as its April Fools' Day episode. == History == While not the first original series to appear exclusively on Hulu, The Morning After is the company's first self-branded production. It was preceded by If I Can Dream, a reality series co-produced with 19 Entertainment and created by Simon Fuller. Hulu originated the idea in house, based on user feedback and observations from discussion boards hosted by the website. The concept was modeled after The Big Show with Olbermann and Patrick. The company sought out a production partner and ultimately chose Jace Hall and his team at HDFilms to executive produce. Initial stream of the series was held on January 17, 2011, and featured coverage of Piers Morgan, the Golden Globes, and The Bachelor. Senior VP of Content and Distribution Andy Forssell made the announcement for the show the same day. The show aired its last episode April 24, 2014. == Format == A typical episode usually begins with a cold open shared by the varying hosts listing the highlights to be covered. The topics focus on TV and Pop Culture Highlights from the previous night, with the intention of helping Hulu users digest hours of content in a matter of moments. The show has the hosts trade humorous remarks regarding the news and each other, taking turns reviewing the night's TV and injecting their own personality. The Morning After was named as an honoree by the Webbys on April 10, 2012, in the variety section of its online video category.
Read more →
Mortimer Rogoff

Mortimer Alan Rogoff (May 2, 1921 – August 1, 2008) was an American inventor, businessman, and author as well as an amateur photographer and radio operator. He is recognized for his work in spread spectrum technology which is the technology that modern cell phones and GPS systems are based on. He is also considered the grandfather of the electronic navigation chart. == Early life == Rogoff was born in Brooklyn, New York. He earned his B.S.E.E. from Rensselaer Polytechnic Institute in 1943 and his M.S.E.E. from Columbia University in 1948. While at Rensselaer he was a member of Kappa Nu fraternity and the Features Editor for the student newspaper. During World War II, he enlisted in the United States Navy and worked on developing radio communication and aerial navigation systems. One of the techniques he developed was undetectable by Axis forces because its power was below that of the background noise and its frequency varied in random ways. This secure transmission was the beginning of spread spectrum technology which would become the basis for GPS and CDMA cellular telephone systems. Although he was never able to patent the technology because it was a military secret he did get some recognition for it almost forty years later when he received the Institute of Electrical and Electronics Engineers’ Pioneer Award in 1981. == Career == Rogoff worked for twenty-two years (1946 to 1968) for ITT Laboratories in New Jersey. In 1958, he became their deputy director of Engineering. He was Vice President of ITT Laboratories from 1962 to 1963. From 1963 to 1968, he was promoted to the corporate staff where he became head of European operations. In 1968 he left ITT to work for the Diebold Group where he became an Executive Vice President. After leaving the Diebold Group he founded several technology and automation businesses, including his own consulting firm, and Teletext Communications Corporation. Later in the 1970s, he was a Principal with Booz Allen Hamilton. In 1979, his book ‘’Calculator Navigation’’ was published. This book demonstrated practical methods for calculating precise ship locations using radio navigation with a consumer calculator. In 1981, he founded a new company, Navigation Sciences Inc., in Bethesda, Maryland. With this company he patented a method for marine navigation that combined radar maps with electronic charts in 1986. This was a major advancement in field. Today, this system is known as the Electronic Chart Display and Information System (ECDIS). Rogoff had seen the need for a new charting system in 1968 from his apartment at 180 East End Avenue in New York City. From there, he saw a boating accident where a life was lost and decided there had to be a way to automate navigation. Rogoff then became of member of the International Maritime Organization’s (IMO) sub-committee on Safety of Navigation, a representative to the International Electrotechnical Commission, and became the chairman of the Radio Technical Commission for Maritime Services Special Committee 109 on Electronic Charts. He was able to use his influence on these boards to push through a proposal of ECDIS standards in 1989 where none has been before. As his friend Giuseppe Carnevali said, “Although nobody could argue against the need for a standard, no one was ready to endorse one; however, nobody was brave enough to oppose it.” A Test Bed project on these proposals was conducted by the United States Coast Guard. The amended standards were accepted by the IMO in November, 1995. In 2000, he was named as a Fellow of the Institute of Navigation. He was also a Fellow of the Institute of Electrical and Electronics Engineers. During this time, he was also president of the Navigational Electronic Charts System Association. == Personal == In 1979, he moved to Washington, D.C. and bought a home in Nantucket, Massachusetts. He married Sheila Zunser in 1943 and they were together for sixty-five years. They had three daughters: Louisa Thompson, Alice Rogoff, and Julia Peach. His sister was sociologist Natalie Rogoff Ramsøy of the University of Oslo. He was a member of the Cosmos Club and President of The Navigational Electronic Chart System Association (NECSA). He was a very good amateur photographer and liked amateur radio (call sign W2EE). He died in Nantucket from bladder cancer. == Patents == Patent number: 4176316 – Secure Communication System – November 27, 1979 With Louis A. DeRosa Patent number: 4590569 – Electronic Navigation System – May 20, 1986 With Peter M. Winkler and John N. Ackley Patent number: RE34004 – Secure Communication System – July 21, 1992 With Louis A. DeRosa == Publications == Rogoff, Mortimer September 1957. Automatic Analysis of Infrared Spectra. Annals of the New York Academy of Sciences; vol. 69: no. 1: 27–37. Gen. P.C. Sandretto and Mortimer Rogoff. 1958 “A Novel Concept for Application to the Control of Airways Traffic.” NAVIGATION: Journal of The Institute of Navigation; vol. 6: no. 2: 102–107 Rogoff, Mortimer 1979. Calculator Navigation; ISBN 0-393-03192-6. Published by W.W. Norton & Company (New York and London). Rogoff, Mortimer December 1985. Electronic Charting. Yachting; vol. 158: no. 6: 54–57. Rogoff, Mortimer Winter 1990. Electronic Charts in the Nineties. NAVIGATION: Journal of The Institute of Navigation; vol. 37: no. 4: 305–318.
Read more →
Pronunciation assessment

Automatic pronunciation assessment uses computer speech recognition to determine how accurately speech has been pronounced, instead of relying on a human instructor or proctor. It is also called speech verification, pronunciation evaluation, and pronunciation scoring. This technology is used to grade speech quality, for language testing, for computer-aided pronunciation teaching (CAPT) in computer-assisted language learning (CALL), for speaking skill remediation, and for accent reduction. Pronunciation assessment is different from dictation or automatic transcription, because instead of determining unknown speech, it verifies learners' pronunciation of known word(s), often from prior transcription of the same utterance; ideally scoring the intelligibility of the learners' speech. Sometimes pronunciation assessment evaluates the prosody of the learners' speech, such as intonation, pitch, tempo, rhythm, and syllable and word stress, although those are usually not essential for being understood in most languages. Pronunciation assessment is also used in reading tutoring, for example in products from Google, Microsoft, and Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia. == Intelligibility == Intelligibility refers to how well a learner's utterance is understood by a listener, rather than how much it sounds like a native speaker. This is separate from measures of fluency, such as so-called "Goodness of Pronunciation" (GoP) scores, which estimate how closely an utterance aligns with those of native speakers. Intelligibility is widely regarded as the most important communicative goal in pronunciation teaching and assessment. For example, in the Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels. Studies in applied linguistics have shown that accent reduction does not always increase intelligibility because listeners can often comprehend heavily accented speech without difficulty. Pronunciation assessment systems often rely on acoustic methods such as GoP which compare learner speech to reference models to produce phoneme-level scores, which are in turn aggregated to produce word and phrase scores. While these methods are effective for identifying deviations from native speakers' utterances, they do not effectively measure how understandable speech is to human listeners. Intelligibility is influenced by broader linguistic and contextual factors such as stress placement, speech rate, and coarticulation, which are not represented in purely segmental scores. The earliest work on pronunciation assessment avoided measuring genuine listener intelligibility, a shortcoming corrected in 2011 at the Toyohashi University of Technology, and included in the Versant high-stakes English fluency assessment from Pearson and mobile apps from 17zuoye Education & Technology, but still missing in 2023 products from Google Search, Microsoft, Educational Testing Service, Speechace, and ELSA. Assessing authentic listener intelligibility is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments; from words with multiple correct pronunciations; and from phoneme coding errors in machine-readable pronunciation dictionaries. In 2022, researchers found that some newer speech-to-text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores (from 10-25ms audio frame logit aggregation) closely correlated with genuine listener intelligibility. Others have been able to assess intelligibility using Levenshtein or dynamic time warping distance measures from Wav2Vec2 representation of good speech. Further work through 2025 has focused specifically on measuring intelligibility. A 2025 study of 42 pronunciation and speech coaching apps (32 mobile and 10 web) found that none offered intelligibility assessment. Instead, most provided only segmental and accent-focused scoring. About two-thirds of the apps provided some form of specific pronunciation feedback, usually with phonetic transcriptions, but accompanied by visual cues (such as animations of the vocal tract or the lips and tongue from the front) in only about 5% of the apps. Less than a third provided feedback on learner perception of exemplar speech. == Evaluation == Although there are as yet no industry-standard benchmarks for evaluating pronunciation assessment accuracy, researchers occasionally release evaluation speech corpuses for others to use for improving assessment quality. Such evaluation databases often emphasize formally unaccented pronunciation to the exclusion of genuine intelligibility evident from blinded listener transcriptions. As of mid-2025, state of the art approaches for automatically transcribing phonemes typically achieve an error rate of about 10% from known good speech. The International Speech Communication Association (ISCA) 2025 Workshop on Speech and Language Technology in Education (SLaTE) administered a Speak & Improve Challenge: Spoken Language Assessment and Feedback, introducing benchmarks for evaluating pronunciation assessment and remediation systems across languages, accents, and learner populations. The challenge emphasized cross-lingual generalization and alignment with human intelligibility judgments, for more robust and interpretable assessment systems. Ethical issues in pronunciation assessment are present in both human and automatic methods. Authentic validity, fairness, and mitigating bias in evaluation are all crucial. Diverse speech data should be included in automatic pronunciation assessment models. Combining human judgments, especially blinded transcriptions from a wide diversity of listeners, with automated feedback can improve accuracy and fairness. Second language learners benefit substantially from their use of widely available speech recognition systems for dictation, virtual assistants, and AI chatbots. In such systems, users naturally try to correct their own errors evident in speech recognition results that they notice. Such use improves their grammar and vocabulary development along with their pronunciation skills. The extent to which explicit pronunciation assessment and remediation approaches improve on such self-directed interactions remains an open question. Similarly, automatic dictation results have been shown to reflect intelligibility about as well as human scorers. == Recent developments == During 2021–22, a smartphone-based CAPT system was used to sense articulation through both audible and inaudible signals, providing feedback at the phoneme level. Some promising areas for improvement which were being developed in 2024 include articulatory feature extraction and transfer learning to suppress unnecessary corrections. Other interesting advances under development include "augmented reality" interfaces for mobile devices using optical character recognition to provide pronunciation training on text found in user environments. In 2024, audio multimodal large language models were first described as assessing pronunciation. That work has been carried forward by other researchers in 2025 who report positive results. Subsequently, researchers demonstrated pronunciation scoring by providing a language model with textual descriptions of speech, including the speech-to-text transcript, phoneme sequences, pauses, and phoneme sequence matching; this approach can achieve performance similar to multimodal LLMs that analyze raw audio while avoiding their higher computational cost. In 2025, the Duolingo English Test authors published a description of their pronunciation assessment method, purportedly built to measure intelligibility rather than accent imitation. While achieving a correlation of 0.82 with expert human ratings, very close to inter-rater agreement and outperforming alternative methods, the method is nonetheless based on experts' scores along the six-point CEFR common reference levels scale, instead of actual blinded listener transcriptions. Further promising work in 2025 includes assessment feedback aligning learner speech to synthetic utterances using interpretable features, identifying continuous spans of words for remediation feedback; synthesizing corrected speech matching learners' self-perceived voices, which they prefer and imitate more accurately as corrections; and streaming such interactions. On January 21, 2026, Educational Testing Service's TOEFL iBT high-stakes English language test, required by US university admissions and employers from English as a foreign language applicants more often than all other internet-based tests combined, changed its speaking assessments. While official rubrics claim that the new scoring will be based primarily on intelligibility, the new test's technical description indicates that it ju
Read more →
Web developer

A web developer is a programmer who develops World Wide Web applications using a client–server model. The applications typically use HTML, CSS, and JavaScript in the client, and any general-purpose programming language in the server. HTTP is used for communications between client and server. A web developer may specialize in client-side applications (Front-end web development), server-side applications (back-end development), or both (full-stack development). == Prerequisite == There are no formal educational or license requirements to become a web developer. However, many colleges and trade schools offer coursework in web development. There are also many tutorials and articles which teach web development, often freely available on the web - for example, on JavaScript. Even though there are no formal requirements, web development projects require web developers to have knowledge and skills such as: Using HTML, CSS, and JavaScript Programming/coding/scripting in one of the many server-side languages or frameworks Understanding server-side/client-side architecture and communication of the kind mentioned above Ability to utilize a database
Read more →
Dynamic web page

A dynamic web page is a web page constructed at runtime (during software execution), as opposed to a static web page, delivered as it is stored. A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, and including the setting up of more client-side processing. A client-side dynamic web page processes the web page using JavaScript running in the browser as it loads. JavaScript can interact with the page via Document Object Model (DOM), to query page state and modify it. Even though a web page can be dynamic on the client-side, it can still be hosted on a static hosting service such as GitHub Pages or Amazon S3 as long as there is not any server-side code included. A dynamic web page is then reloaded by the user or by a computer program to change some variable content. The updating information could come from the server, or from changes made to that page's DOM. This may or may not truncate the browsing history or create a saved version to go back to, but a dynamic web page update using AJAX technologies will neither create a page to go back to, nor truncate the web browsing history forward of the displayed page. Using AJAX, the end user gets one dynamic page managed as a single page in the web browser while the actual web content rendered on that page can vary. The AJAX engine sits only on the browser requesting parts of its DOM, the DOM, for its client, from an application server. A particular application server could offer a standardized REST style interface to offer services to the web application. DHTML is the umbrella term for technologies and methods used to create web pages that are not static web pages, though it has fallen out of common use since the popularization of AJAX, a term which is now itself rarely used. Client-side-scripting, server-side scripting, or a combination of these make for the dynamic web experience in a browser. == Basic concepts == Classical hypertext navigation, with HTML or XHTML alone, provides "static" content, meaning that the user requests a web page and simply views the page and the information on that page. However, a web page can also provide a "live", "dynamic", or "interactive" user experience. Content (text, images, form fields, etc.) on a web page can change, in response to different contexts or conditions. There are two ways to create this kind of effect: Using client-side scripting to change interface behaviors within a specific web page, in response to mouse or keyboard actions, data received from a web API, websocket or at specified timing events. In this case the dynamic behavior occurs within the presentation. Using server-side scripting to change the supplied page source code between pages, adjusting the sequence or reload of the web pages or web content supplied to the browser. Server responses may be determined by such conditions as data in a posted HTML form, parameters in the URL, the type of browser being used, the passage of time, or a database or server state. Web pages that use client-side scripting must use presentation technology broadly called rich interfaced pages. Client-side scripting languages like JavaScript or ActionScript, used for Dynamic HTML (DHTML) and Flash technologies respectively, are frequently used to orchestrate media types (sound, animations, changing text, etc.) of the presentation. The scripting also allows use of remote scripting, a technique by which the DHTML page requests additional information from a server, using a hidden Frame, XMLHttpRequests, or a web service. It is also possible to use a web framework to create a web API, which the client, via the use of JavaScript, uses to obtain data and alter its appearance or behavior dynamically depending on the data. Web pages that use server-side scripting are often created with the help of server-side languages such as PHP, Perl, ASP, JSP, ColdFusion and other languages. These server-side languages typically use the Common Gateway Interface (CGI) to produce dynamic web pages. These kinds of pages can also use, on the client-side, the first kind (DHTML, etc.). == History == It is difficult to be precise about "dynamic web page beginnings" or chronology because the precise concept makes sense only after the "widespread development of web pages". HTTP has existed since 1989, HTML, publicly standardized since 1996. The web browser's rise in popularity started with Mosaic in 1993. Between 1995 and 1996, multiple dynamic web products were introduced to the market, including Coldfusion, WebObjects, PHP, and Active Server Pages. The introduction of JavaScript (then known as LiveScript) enabled the production of client-side dynamic web pages, with JavaScript code executed in the client's browser. The letter "J" in the term AJAX originally indicated the use of JavaScript, as well as XML. With the rise of server side JavaScript processing, for example, Node.js, originally developed in 2009, JavaScript is also used to dynamically create pages on the server that are sent fully formed to clients. MediaWiki, the content management system that powers Wikipedia, is an example for an originally server-side dynamic web page, interacted with through form submissions and URL parameters. Throughout time, progressively enhancing extensions such as the visual editor have also added elements that are dynamic on the client side, while the original dynamic server-side elements such as the classic edit form remain available to be fallen back on (graceful degradation) in case of error or incompatibility. == Server-side scripting == A program running on a web server is used to generate the web content on various web pages, manage user sessions, and control workflow. Server responses may be determined by such conditions as data in a posted HTML form, parameters in the URL, the type of browser being used, the passage of time, or a database or server state. Such web pages are often created with the help of server-side languages such as ASP, ColdFusion, Java, JavaScript, Perl, PHP, Ruby, Python, and other languages, by a support server that can run on the same hardware as the web server. These server-side languages often use the Common Gateway Interface (CGI) to produce dynamic web pages. Two notable exceptions are ASP.NET, and JSP, which reuse CGI concepts in their APIs but actually dispatch all web requests into a shared virtual machine. The server-side languages are used to embed tags or markers within the source file of the web page on the web server. When a user on a client computer requests that web page, the web server interprets these tags or markers to perform actions on the server. For example, the server may be instructed to insert information from a database or information such as the current date. Dynamic web pages are often cached when there are few or no changes expected and the page is anticipated to receive considerable amount of web traffic that would wastefully strain the server and slow down page loading if it had to generate the pages on the fly for each request. == Client-side scripting == Client-side scripting is changing interface behaviors within a specific web page in response to input device actions, or at specified timing events. In this case, the dynamic behavior occurs within the presentation. The client-side content is generated on the user's local computer system. Such web pages use presentation technology called rich interfaced pages. Client-side scripting languages like JavaScript or ActionScript, used for Dynamic HTML (DHTML) and Flash technologies respectively, are frequently used to orchestrate media types (sound, animations, changing text, etc.) of the presentation. Client-side scripting also allows the use of remote scripting, a technique by which the DHTML page requests additional information from a server, using a hidden frame, XMLHttpRequests, or a Web service. The first public use of JavaScript was in 1995, when the language was implemented in Netscape Navigator 2, standardized as ECMAScript two years later. Example The client-side content is generated on the client's computer. The web browser retrieves a page from the server, then processes the code embedded in the page (typically written in JavaScript) and displays the retrieved page's content to the user. The innerHTML property (or write command) can illustrate the client-side dynamic page generation: two distinct pages, A and B, can be regenerated (by an "event response dynamic") as document.innerHTML = A and document.innerHTML = B; or "on load dynamic" by document.write(A) and document.write(B). == Combination technologies == All of the client and server components that collectively build a dynamic web page are called a web application. Web applications manage user interactions, state, security, and performance. Ajax uses a combination of both client-side script
Read more →