AI For Business Isb

AI For Business Isb — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Automated penetration testing

    Automated penetration testing

    Automated penetration testing (also known as autonomous penetration testing or automated offensive security) is the application of software-driven workflows and orchestration to simulate cyberattack techniques. These methods are used to identify, validate, and exploit security vulnerabilities in IT assets such as networks, applications, and cloud infrastructure. Automated penetration testing is the use of software to simulate cyberattacks in order to rapidly identify exploitable vulnerabilities across systems without relying solely on human testers. In technical literature, the term describes a spectrum of activities ranging from scripted exploit orchestration to experimental systems designed for fully autonomous attack planning. Automated Penetration Testing falls short of testing using manual experts in terms of discovery of deep complex vulnerabilities and contextual business logic vulnerabilities. == Terminology and scope == The label “automated penetration testing” appears frequently in vendor and practitioner writing but lacks a single, neutral, standards-based definition. In the literature the term’s scope varies: some authors use it to mean automation of specific penetration-testing tasks (scanning, exploitation attempts, evidence collection), others to describe integrated, repeatable assessment pipelines, and a smaller body of work investigates autonomous decision-making agents that select attack steps algorithmically. To avoid implying consensus, this article describes common techniques and architectures reported in the literature and industry, and it notes where claims are primarily found in practitioner publications or early-stage research. Its important to note the differences between automated penetration testing and traditional penetration testing using human skill. The most important difference is scope and speed. Automated penetration testing generally fails at discovering exposures and weakness associated with business logic due to a lack of contextual understanding. The benefit of Automated Penetration testing is speed at which it can be conducted. Traditional penetration testing also is expected to be accurate and contain no false positives. This is due to the human validation aspect of the test. Automated approaches are expected to contain mistakes and false positives which need to be validated upon completion of the test. == History == Automated offensive techniques build on decades of tools and scripting that aided vulnerability discovery and exploitation. Early vulnerability scanners and community scripting in the 1990s and 2000s created the first layers of automation. Later, modular exploitation frameworks (notably Metasploit) integrated scanning and exploitation modules and made automated proof-of-concept attacks more accessible. Over the 2010s–2020s, as cloud platforms, APIs and continuous delivery practices increased the need for frequent validation, academic and industry interest in formalizing automated approaches also grew. == Methodologies and architectures == Descriptions in the literature and technical reports cluster automated capabilities into several overlapping models: Scripted/engineered playbooks (task automation): Predefined workflows or playbooks encode common attack paths (for example, web application exploit sequences or lateral-movement chains). These playbooks are designed to reproduce known techniques in a controlled way to validate exploitability and reduce manual repetition. Exploit-oriented orchestration: Automation orchestrates exploitation modules from established frameworks to perform controlled proof-of-concept attacks that confirm exploitability rather than simply flagging potential weaknesses. This approach can reduce false positives versus passive scanning when tests are run in an appropriately controlled environment. Orchestrated multi-tool pipelines: A coordinated toolchain integrates reconnaissance, vulnerability scanning, credential testing, exploitation modules and reporting. Data and state persist across stages so that multi-step workflows (e.g., discover → escalate → pivot) can be executed repeatably, approximating manual penetration-test methodologies at larger scale. Continuous / CI-integrated testing: Automation embedded in build or deployment pipelines (CI/CD) triggers assessments automatically on new builds, configuration changes, or on a schedule, supporting frequent, repeatable validation aligned with DevOps practices. Academic theses and experimental work describe CI/CD-integrated proof-of-concept systems for web applications and internal networks. Research on autonomous planning and learning: Recent academic work explores machine learning and reinforcement-learning approaches to select or prioritise attack steps, generate attack sequences, or optimize the testing path; these approaches are largely experimental and raise distinct validation and safety questions. == Tools and vendors == Automated penetration testing is provided by a mix of open-source projects, commercial platforms, and professional services. These often follow the penetration testing as a service (PTaaS) model, which integrates automated scanning with manual validation by security analysts. Examples of widely known tools and vendors in the space include exploitation frameworks such as Metasploit, commercial automated platforms and PTaaS providers, and specialist vendors that offer breach-and-attack simulation (BAS) or continuous testing capabilities. == Applications and deployment models == In industry practice, some organizations deploy automated techniques through dedicated security validation platforms rather than bespoke toolchains. These platforms are typically used for continuous or scheduled validation in pre-production or controlled environments and are often positioned alongside, rather than in place of, human-led penetration testing. Examples discussed in secondary literature include platforms such as Pentera, which are commonly classified under breach-and-attack simulation or automated security validation rather than as standalone penetration-testing methodologies.

    Read more →
  • CSS HTML Validator

    CSS HTML Validator

    CSS HTML Validator (previously named CSE HTML Validator) is an HTML editor and CSS editor for Microsoft Windows (and Linux and other Unix-like operating systems when used with Wine) that helps web developers create syntactically correct and accessible HTML/HTML5, XHTML, and CSS documents by locating errors, potential problems like browser compatibility issues, and common mistakes. It is also able to check links, check spelling, suggest improvements, alert developers to deprecated, obsolete, or proprietary tags, attributes, and CSS properties, and find issues that can affect search engine optimization. CSS HTML Validator is developed, marketed, and sold by AI Internet Solutions LLC located in the United States. The first version of CSS HTML Validator was released in 1997 for Windows 95. The current version is 2026/v26.02 (as of January 9, 2026) and is for Windows 10 and above, including Windows 11. A native macOS and Linux command-line console tool (called htmlval) became available with version 23. There are currently three main editions of CSS HTML Validator — Pro/Professional, Home/Standard, and Lite. The Enterprise edition was discontinued in 2025/v25. While the application is generally a commercial product (except for the Lite edition), a free version of the Home edition is available for personal/educational, non-commercial use. A free limited version of the htmlval command-line console tool for macOS and Linux is also available. == Features == CSS HTML Validator includes an HTML editor, validator for HTML, XHTML, htmx, polyglot markup, CSS, PHP and JavaScript (using JSLint or JSHint), link checker (to find dead and broken links), spell checker, accessibility checker, and search engine optimization (SEO) checker. An integrated web browser allows developers to browse the web while the pages are automatically validated. Because documents are checked locally and not uploaded over the Internet to a server in order to be checked, validations are performed relatively quickly, and security and privacy are increased. A custom scripting language called TNPL, included in the Pro and Enterprise editions, can be used to customize validations by adding, eliminating, or changing validator messages. TNPL can also be used to integrate customized validation checks to meet the unique requirements of an individual or entity. A Batch Wizard tool, included in the Pro and Enterprise editions, can check entire Web sites, parts of Web sites, or a list of local web documents. The Batch Wizard generates reports in standard HTML or XML format. The reports can be viewed using a normal web browser. The accessibility checker includes support for Section 508 Amendment to the Rehabilitation Act of 1973 and Web Content Accessibility Guidelines (both WCAG 1.0 and WCAG 2.0/2.1/2.2). Using a version of HTML Tidy with HTML5 support and the Pretty Print & Fix Tool, CSS HTML Validator can automatically fix some common problems with HTML and XHTML documents. However, some problems cannot be fixed (or fixed correctly) with automated tools and require manual review and repair. == Version history == Validation of polyglot markup was added in version 12, and mobile development support (for HTML and CSS) was added in version 14 and improved in version 15. Version 15 added built-in syntax checking for JSON and HTML5 cache manifest files. Version 16 added JavaScript linting using JSHint, a static code analysis tool for checking JavaScript, but also continues to support JSLint. Version 17 added support for the Accelerated Mobile Pages Project, which is a type of HTML optimized for mobile web browsing, and support for live DOM validation using Google Chrome CSS HTML Validator 2018/v18 renames the software from CSE HTML Validator to CSS HTML Validator and includes updated HTML5 and CSS support. Version 18 also added a new "By Message" report in the Batch Wizard and dropped support for Windows Vista and below. CSS HTML Validator 2019/v19 includes updated HTML and CSS support, adds WCAG 2.1 support, improves support when running under Wine (software), and is a native 64-bit application (previously releases were 32-bit). CSS HTML Validator 2020/v20, first released in January 2020, includes HTML, CSS, accessibility, and other updates, including improved support for the Accelerated Mobile Pages Project. Also, beginning with version 20, the Standard edition was renamed to the Home edition. CSS HTML Validator 2021/v21, first released in January 2021, includes further HTML, CSS, accessibility, and other updates. CSS HTML Validator 2022/v22, released in January 2022, includes improvements and updates to keep the program up-to-date, a new Microsoft Edge WebView2 rendering engine for the integrated web browser, and three new dark themes. Later updates to version 22 added support for checking JSON Lines and NDJSON documents. CSS HTML Validator 2023/v23, released in January 2023, includes more improvements and updates to keep the program up-to-date. The new release also introduced new command-line macOS and Linux ports of the core validation engine, called htmlval for Mac and Linux. Official support for Windows 7, 8, and 8.1 was dropped in the 2023/v23 version. CSS HTML Validator 2024/v24, released in January 2024, includes updates and improvements. It also adds support for htmx. CSS HTML Validator 2025/v25, released in December 2024, includes further updates and improvements for 2025. Version 25 discontinues the Enterprise edition, moving Enterprise functionality to the Pro edition. CSS HTML Validator 2026/v26, released in January 2026, includes updated support for HTML and CSS. An online edition based on CSS HTML Validator Pro that can check documents via file upload, URL, or snippets (direct text input) was discontinued May 2017 in favor of the desktop version for Microsoft Windows. == Purpose of validation == The purpose of validation and computerized checking of HTML, XHTML, and CSS documents is to help make sure that the documents are syntactically correct and problem-free. Checked HTML, XHTML, and CSS documents are more likely to: be more accessible for people with disabilities (such as blindness), as well as all users in general render faster (user agents don't have to "figure out" and decipher bad syntax) render as intended and with fewer problems on a variety of user agents, including mobile devices cause browsers and user agents to build a more consistent Document Object Model, which is important for CSS and JavaScript be forward-compatible with future versions of user agents and browsers ("future-proof") be compatible with current and future HTML, XHTML, and CSS specifications cause fewer problems for visitors and web indexing not contain dead, broken, or rotting links While automated checking tools are helpful for website development and continued maintenance, they cannot guarantee that a document will display (render) and behave as intended in all browsers. Developers should always test documents in a variety of browsers (including mobile browsers) to locate problems that cannot be detected with a computerized checking tool. == Differences from other HTML validators == CSS HTML Validator is an offline desktop app for Microsoft Windows and a native macOS and Linux command-line console tool that does not require an Internet connection. The offline nature of CSS HTML Validator is in contrast to online web-based services. CSS HTML Validator primarily works offline (except for link checking when it must go online), which has speed and privacy benefits compared to web-based solutions and services like the W3C Markup Validation Service. However, the user must keep the software updated unlike web-based solutions which are typically kept updated by the solution provider. CSS HTML Validator checks HTML/XHTML syntax, CSS, links, spelling, accessibility, JavaScript, SEO, and PHP with one pass, while DTD-based validators are more limited and cannot check HTML5. CSS HTML Validator includes a built-in scripting language (called TNPL) which allows for a high degree of customization via scripting and "user functions". This allows developers to add custom (specialized) validation checks and messages. CSS HTML Validator includes a DTD-based validator which can optionally be used for checking DTD-based versions of HTML (versions prior to HTML5), however one of CSS HTML Validator's primary differences is that its custom validation engine can perform more checks on a document than a DTD-based validator can. This is because DTD-based validators are limited to checking only what can be specified in a Document Type Definition. == Integration == CSS HTML Validator integrates with other third-party software like those listed below. This allows validation using CSS HTML Validator from within the third-party program. EmEditor - includes a special Lite edition build of CSS HTML Validator for built-in checking of HTML and CSS Blumentals Software - several Blumentals software products integrate with CSS H

    Read more →
  • Digital Cinema Package

    Digital Cinema Package

    A Digital Cinema Package (DCP) is a collection of digital files used to store and convey digital cinema (DC) audio, image, and data streams. The term was popularized by Digital Cinema Initiatives, LLC in its original recommendation for packaging DC contents. However, the industry tends to apply the term to the structure more formally known as the composition. A DCP is a container format for compositions, a hierarchical file structure that represents a title version. The DCP may carry a partial composition (e.g. not a complete set of files), a single complete composition, or multiple and complete compositions. The composition consists of a Composition Playlist (in XML format) that defines the playback sequence of a set of Track Files. Track Files carry the essence (audio, image, subtitles), which is wrapped using Material eXchange Format (MXF). Track Files must contain only one essence type. Two track files at a minimum must be present in every composition (see SMPTE ST429-2 D-Cinema Packaging – DCP Constraints, or Cinepedia): a track file carrying picture essence, and a track file carrying audio essence. The composition, consisting of a Composition Playlist (CPL) and associated track files, are distributed as a Digital Cinema Package (DCP). A composition is a complete representation of a title version, while the DCP need not carry a full composition. However, as already noted, it is commonplace in the industry to discuss the title in terms of a DCP, as that is the deliverable to the cinema. The Picture Track File essence is compressed using JPEG 2000 and the Audio Track File carries a 24-bit linear PCM uncompressed multichannel WAV file. Encryption may optionally be applied to the essence of a track file to protect it from unauthorized use. The encryption used is AES 128-bit in CBC mode. In practice, there are two versions of composition in use. The original version is called Interop DCP. In 2009, a specification was published by SMPTE (SMPTE ST 429-2 Digital Cinema Packaging – DCP Constraints) for what is commonly referred to as SMPTE DCP. SMPTE DCP is similar but not backwards compatible with Interop DCP, resulting in an uphill effort to transition the industry from Interop DCP to SMPTE DCP. SMPTE DCP requires significant constraints to ensure success in the field, as shown by ISDCF. While legacy support for Interop DCP is necessary for commercial products, new productions are encouraged to be distributed in SMPTE DCP. == Technical specifications == The DCP root folder (in the storage medium) contains a number of files, some used to store the image and audio contents, and some other used to organize and manage the whole playlist. === Picture MXF files === Picture contents may be stored in one or more reels corresponding to one or more MXF files. Each reel contains pictures as MPEG-2 or JPEG 2000 essence, depending on the adopted codec. MPEG-2 is no longer compliant with the DCI specification. JPEG 2000 is the only accepted compression format. Supported frame rates are: SMPTE (JPEG 2000) 24, 25, 30, 48, 50, and 60 fps @ 2K 24, 25, and 30 fps @ 4K 24 and 48 fps @ 2K stereoscopic MXF Interop (JPEG 2000) – Deprecated 24 and 48 fps @ 2K (MXF Interop can be encoded at 25 frame/s but support is not guaranteed) 24 fps @ 4K 24 fps @ 2K stereoscopic MXF Interop (MPEG-2) – Deprecated 23.976 and 24 fps @ 1920 × 1080 Maximum frame sizes are 2048 × 1080 for 2K DC, and 4096 × 2160 for 4K DC. Common formats are: SMPTE (JPEG 2000) Flat (1998 × 1080 or 3996 × 2160), = 1.85:1 aspect ratio Scope (2048 × 858 or 4096 × 1716), ~2.39:1 aspect ratio HDTV (1920 × 1080 or 3840 × 2160), 16:9 aspect ratio (~1.78:1) (although not specifically defined in the DCI specification, this resolution is DCI compliant per section 8.4.3.2). Full (2048 × 1080 or 4096 × 2160) (~1.9:1 aspect ratio, official name by DCI is Full Container. Not widely accepted in cinemas.) MXF Interop (MPEG-2) – Deprecated Full Frame (1920 × 1080) 12 bits per component precision (36 bits total per pixel) XYZ' colorspace; the prime mark indicates gamma encoding (gamma=2.6) Maximum bit rate is 250 Mbit/s (1.3 MBytes per frame at 24 frame per second) === Sound MXF files === Sound contents are also stored in reels corresponding to picture reels in number and duration. In case of multilingual features, separate reels are required to convey different languages. Each file contains linear PCM essence. Sampling rate is 48,000 or 96,000 samples per second Sample precision of 24 bits Linear mapping (no companding) Up to 16 independent channels === Asset map file === List of all files included in the DCP, in XML format. === Composition playlist file === Defines the playback order during presentation. The order is saved in XML format in this file; each picture and sound reel is identified by its UUID. In the following example, a reel is composed by picture and sound: === Packing list file or package key list (PKL) === All files in the composition are hashed and their hash is stored here, in XML format. This file is generally used during ingestion in a digital cinema server to verify if data have been corrupted or tampered with in some way. For example, an MXF picture reel is identified by the following element: The hash value is the Base64 encoding of the SHA-1 checksum. It can be calculated with the command: openssl sha1 -binary "FILE_NAME" | openssl base64 === Volume index file === A single DCP may be stored in more than one medium (e.g., multiple hard disks). The XML file VOLINDEX is used to identify the volume order in the series. == 3D DCP == The DCP format is also used to store stereoscopic (3D) contents for 3D films. In this case, 48 frames exist for every second – 24 frames for the left eye, 24 frames for the right. Depending on the projection system used, the left eye and right eye pictures are either shown alternately (double or triple flash systems) at 48 fps or, on 4k systems, both left and right eye pictures are shown simultaneously, one above the other, at 24 fps. In triple flash systems, active shutter glasses are required whereas optical filtering such as circular polarisation is used in conjunction with passive glasses on polarized systems. Since the maximum bit rate is always 250 Mbit/s, this results in a net 125 Mbit/s for single frame, but the visual quality decrease is generally unnoticeable. == D-Box == D-Box codes for motion controlled seating (labelled as "Motion Data" in the DCP specification), if present, are stored as a monoaural WAV file on Sound Track channel 13. Motion Data tracks are unencrypted and not watermarked. == Creation == Most film producers and distributors rely on digital cinema encoding facilities to produce and quality control check a digital cinema package before release. Facilities follow strict guidelines set out in the DCI recommendations to ensure compatibility with all digital cinema equipment. For bigger studio release films, the facility will usually create a Digital Cinema Distribution Master (DCDM). A DCDM is the post-production step prior to a DCP. The frames are in XYZ TIFF format and both sound and picture are not yet wrapped into MXF files. A DCP can be encoded directly from a DCDM. A DCDM is useful for archiving purposes and also facilities can share them for international re-versioning purposes. They can easily be turned into alternative version DCPs for foreign territories. For smaller release films, the facility will usually skip the creation of a DCDM and instead encode directly from the Digital Source Master (DSM) the original film supplied to the encoding facility. A DSM can be supplied in a multitude of formats and color spaces. For this reason, the encoding facility needs to have extensive knowledge in color space handling including, on occasion, the use of 3D LUTs to carefully match the look of the finished DCP to a celluloid film print. This can be a highly involved process in which the DCP and the film print are "butterflied" (shown side by side) in a highly calibrated cinema. Less demanding DCPs are encoded from tape formats such as HDCAM SR. Quality control checks are always performed in calibrated cinemas and carefully checked for errors. QC checks are often attended by colorists, directors, sound mixers and other personnel to check for correct picture and sound reproduction in the finished DCP. == Accessibility == === Hearing impaired audio === A Hearing Impaired (HI) audio track is designed for people who are hearing-impaired to better hear dialog. Moviegoers can wear headphones which play this audio track synchronized with the film. Hearing Impaired audio is stored in the DCP on Sound Track channel 7. === Audio description === Audio description is narration for people who are blind or visually impaired. Audio description is stored in the DCP as "Visually Impaired-Native" (VI-N) audio on Sound Track channel 8. === Sign Language Video === A Sign Language Video track can be included in a DCP to allow for display of sign la

    Read more →
  • Bookmarklet

    Bookmarklet

    A bookmarklet is a bookmark stored in a web browser that contains JavaScript commands that add new features to the browser. They are stored as the URL of a bookmark in a web browser or as a hyperlink on a web page. Bookmarklets are usually small snippets of JavaScript executed when a user clicks on them. When clicked, bookmarklets can perform a wide variety of operations, such as running a search query from selected text or extracting data from a table. Another name for bookmarklet is favelet or favlet, derived from favorites (synonym of bookmark). == History == Steve Kangas of bookmarklets.com coined the word bookmarklet when he started to create short scripts based on a suggestion in Netscape's JavaScript guide. Before that, Tantek Çelik called these scripts favelets and used that word as early as on 6 September 2001 (personal email). Brendan Eich, who developed JavaScript at Netscape, gave this account of the origin of bookmarklets: They were a deliberate feature in this sense: I invented the javascript: URL along with JavaScript in 1995, and intended that javascript: URLs could be used as any other kind of URL, including being bookmark-able. In particular, I made it possible to generate a new document by loading, e.g. javascript:'hello, world', but also (key for bookmarklets) to run arbitrary script against the DOM of the current document, e.g. javascript:alert(document.links[0].href). The difference is that the latter kind of URL uses an expression that evaluates to the undefined type in JS. I added the void operator to JS before Netscape 2 shipped to make it easy to discard any non-undefined value in a javascript: URL. The increased implementation of Content Security Policy (CSP) in websites has caused problems with bookmarklet execution and usage (2013–2015), with some suggesting that this hails the end or death of bookmarklets. William Donnelly created a work-around solution for this problem (in the specific instance of loading, referencing and using JavaScript library code) in early 2015 using a Greasemonkey userscript (Firefox / Pale Moon browser add-on extension) and a simple bookmarklet-userscript communication protocol. It allows (library-based) bookmarklets to be executed on any and all websites, including those using CSP and having an https:// URI scheme. However, if/when browsers support disabling/disallowing inline script execution using CSP, and if/when websites begin to implement that feature, it will "break" this "fix". == Concept == Web browsers use URIs for the href attribute of the tag and for bookmarks. The URI scheme, such as http or ftp, and which generally specifies the protocol, determines the format of the rest of the string. Browsers also implement javascript: URIs that to a parser is just like any other URI. The browser recognizes the specified javascript scheme and treats the rest of the string as a JavaScript program which is then executed. The expression result, if any, is treated as the HTML source code for a new page displayed in place of the original. The executing script has access to the current page, which it may inspect and change. If the script returns an undefined type (rather than, for example, a string), the browser will not load a new page, with the result that the script simply runs against the current page content. This permits changes such as in-place font size and color changes without a page reload. An immediately invoked function that returns no value or an expression preceded by the void operator will prevent the browser from attempting to parse the result of the evaluation as a snippet of HTML markup: == Usage == Bookmarklets are saved and used as normal bookmarks. As such, they are simple "one-click" tools which add functionality to the browser. For example, they can: Modify the appearance of a web page within the browser (e.g., change font size, background color, etc.) Extract data from a web page (e.g., hyperlinks, images, text, etc.) Remove redirects from (e.g. Google) search results, to show the actual target URL Submit the current page to a blogging service such as Posterous, link-shortening service such as bit.ly, or bookmarking service such as Delicious Query a search engine or online encyclopedia with highlighted text or by a dialog box Submit the current page to a link validation service or translation service Set commonly chosen configuration options when the page itself provides no way to do this Control HTML5 audio and video playback parameters such as speed, position, toggling looping, and showing/hiding playback controls, the first of which can be adjusted beyond HTML5 players' typical range setting. Installing a bookmarklet follows the same process as adding a normal bookmark; the only difference is that in place of the URL destination field is JavaScript code preceded by javascript:. Once created, bookmarklets can be run by clicking on them.

    Read more →
  • Hyperparameter optimization

    Hyperparameter optimization

    In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, which must be configured before the process starts. Hyperparameter optimization determines the set of hyperparameters that yields an optimal model which minimizes a predefined loss function on a given data set. The objective function takes a set of hyperparameters and returns the associated loss. Cross-validation is often used to estimate this generalization performance, and therefore choose the set of values for hyperparameters that maximize it. == Approaches == === Grid search === The traditional method for hyperparameter optimization has been grid search, or a parameter sweep, which is simply an exhaustive searching through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a hold-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search. For example, a typical soft-margin SVM classifier equipped with an RBF kernel has at least two hyperparameters that need to be tuned for good performance on unseen data: a regularization constant C and a kernel hyperparameter γ. Both parameters are continuous, so to perform grid search, one selects a finite set of "reasonable" values for each, say C ∈ { 10 , 100 , 1000 } {\displaystyle C\in \{10,100,1000\}} γ ∈ { 0.1 , 0.2 , 0.5 , 1.0 } {\displaystyle \gamma \in \{0.1,0.2,0.5,1.0\}} Grid search then trains an SVM with each pair (C, γ) in the Cartesian product of these two sets and evaluates their performance on a held-out validation set (or by internal cross-validation on the training set, in which case multiple SVMs are trained per pair). Finally, the grid search algorithm outputs the settings that achieved the highest score in the validation procedure. Grid search suffers from the curse of dimensionality, but is often embarrassingly parallel because the hyperparameter settings it evaluates are typically independent of each other. === Random search === Random Search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be simply applied to the discrete setting described above, but also generalizes to continuous and mixed spaces. A benefit over grid search is that random search can explore many more values than grid search could for continuous hyperparameters. It can outperform Grid search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm. In this case, the optimization problem is said to have a low intrinsic dimensionality. Random Search is also embarrassingly parallel, and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample. Despite its simplicity, random search remains one of the important base-lines against which to compare the performance of new hyperparameter optimization methods. === Bayesian optimization === Bayesian optimization is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating it, Bayesian optimization aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected close to the optimum). In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run. === Gradient-based optimization === For specific learning algorithms, it is possible to compute the gradient with respect to hyperparameters and then optimize the hyperparameters using gradient descent. The first usage of these techniques was focused on neural networks. Since then, these methods have been extended to other models such as support vector machines or logistic regression. A different approach in order to obtain a gradient with respect to hyperparameters consists in differentiating the steps of an iterative optimization algorithm using automatic differentiation. A more recent work along this direction uses the implicit function theorem to calculate hypergradients and proposes a stable approximation of the inverse Hessian. The method scales to millions of hyperparameters and requires constant memory. In a different approach, a hypernetwork is trained to approximate the best response function. One of the advantages of this method is that it can handle discrete hyperparameters as well. Self-tuning networks offer a memory efficient version of this approach by choosing a compact representation for the hypernetwork. More recently, Δ-STN has improved this method further by a slight reparameterization of the hypernetwork which speeds up training. Δ-STN also yields a better approximation of the best-response Jacobian by linearizing the network in the weights, hence removing unnecessary nonlinear effects of large changes in the weights. Apart from hypernetwork approaches, gradient-based methods can be used to optimize discrete hyperparameters also by adopting a continuous relaxation of the parameters. Such methods have been extensively used for the optimization of architecture hyperparameters in neural architecture search. === Evolutionary optimization === Evolutionary optimization is a methodology for the global optimization of noisy black-box functions. In hyperparameter optimization, evolutionary optimization uses evolutionary algorithms to search the space of hyperparameters for a given algorithm. Evolutionary hyperparameter optimization follows a process inspired by the biological concept of evolution: Create an initial population of random solutions (i.e., randomly generate tuples of hyperparameters, typically 100+) Evaluate the hyperparameter tuples and acquire their fitness function (e.g., 10-fold cross-validation accuracy of the machine learning algorithm with those hyperparameters) Rank the hyperparameter tuples by their relative fitness Replace the worst-performing hyperparameter tuples with new ones generated via crossover and mutation Repeat steps 2-4 until satisfactory algorithm performance is reached or is no longer improving. Evolutionary optimization has been used in hyperparameter optimization for statistical machine learning algorithms, automated machine learning, typical neural network and deep neural network architecture search, as well as training of the weights in deep neural networks. === Population-based === Population Based Training (PBT) learns both hyperparameter values and network weights. Multiple learning processes operate independently, using different hyperparameters. As with evolutionary methods, poorly performing models are iteratively replaced with models that adopt modified hyperparameter values and weights based on the better performers. This replacement model warm starting is the primary differentiator between PBT and other evolutionary methods. PBT thus allows the hyperparameters to evolve and eliminates the need for manual hypertuning. The process makes no assumptions regarding model architecture, loss functions or training procedures. PBT and its variants are adaptive methods: they update hyperparameters during the training of the models. On the contrary, non-adaptive methods have the sub-optimal strategy to assign a constant set of hyperparameters for the whole training. === Early stopping-based === A class of early stopping-based hyperparameter optimization algorithms is purpose-built for large search spaces of continuous and discrete hyperparameters, particularly when the computational cost to evaluate the performance of a set of hyperparameters is high. Irace implements the iterated racing algorithm, that focuses the search around the most promising configurations, using statistical tests to discard the ones that perform poorly. Another early stopping hyperparameter optimization algorithm is successive halving (SHA), which begins as a random search but periodically prunes low-performing models, thereby focusing computational resources on more promising models. Asynchronous successive halving (ASHA) further improves upon SHA's resource utilization profile by removing the need to synchronously evaluate a

    Read more →
  • Motion picture film scanner

    Motion picture film scanner

    A motion picture film scanner is a device used in digital filmmaking to scan original film for storage as high-resolution digital intermediate files. A film scanner scans original film stock: negative or positive print or reversal/IP. Units may scan gauges from 8 mm to 70 mm (8 mm, Super 8, 9.5 mm, 16 mm, Super 16, 35 mm, Super 35, 65 mm and 70 mm) with very high resolution scanning at 2K, 4K, 8K, or 16K resolutions. (2K is approximately 2048×1080 pixels and 4K is approximately 4096×2160 pixels). Some models of film scanner are intermittent pull-down film scanners which scan each frame individually, locked down in a pin-registered film gate, taking roughly a second per frame. Continuous-scan film scanners, where the film frames are scanned as the film is continuously moved past the imaging pick up device, are typically evolved from earlier telecine mechanisms, and can act as such at lower resolutions. The scanner scans the film frames into a file sequence (using high-end computer data storage devices), whose single file contains a digital scan of each still frame; the preferred image file format used as output are usually Cineon, DPX or TIFF, because they can store color information as raw data, preserving the optical characteristics of the film stock. These systems take a lot of storage area network (SAN) disk space. The files can be played back one after each other on high-end workstation non-linear editing system (NLE) or a virtual telecine systems. The playback is at the normal rate of 24 frames per second (or original projection frame rate of: 25, 30 or other speeds). Each year hard disks get larger and are able to hold more hours of movies on SAN systems. The challenge is to archive this massive amount of data on to data storage devices. The scanned footage is edited and composited on work stations then mastered back on film, see film-out and digital intermediate. Scanned film frames may also be used in digital film restoration. The film may also be projected directly on a digital projector in the theater. The data film files may be converted to SDTV (NTSC or PAL) video TV systems. Film recorders are the opposite of film scanners, copying content from a computer system onto film stock, for preservation or for display using film projectors. Telecines are similar to film scanners. == Imaging device == The front end of a motion picture film scanner is similar to a telecine. The imaging system may be either a charge-coupled device (CCD), a complementary metal–oxide–semiconductor (CMOS) or photomultipliers imaging pick up. A lamp is used as the light source in a CCD imaging front end. The CCDs convert the light to the video signals. In a cathode-ray tube (CRT) imaging system the CRT (also called a Flying spot tube) is used as the light source and part of the scanning system. Photomultipliers or avalanche photodiodes are used to convert the light to electrical video signals. A prism and/or dichroic mirrors or color filters are used to separate the light into the three: red, green and blue, imaging pick up devices. == Image processing == The three color signals (RGB) are electronically processed and color graded. A 3D look up table (3D LUT) is usually applied to the RGB values before it is coded into the DPX output files. The DPX files are usually made output through a network port cable or an optical fiber port: HIPPI, Fibre Channel or newer systems like gigabit Ethernet. A computer then stores the files on to hard drives of a storage area network for later processing and use. Modern motion picture film scanners many have an option for an infrared CCD channel for dirt mapping, that can be used to automatically or in post manually remove dirt-dust spots on the film. The IR camera channel can be used with IR dirt and scratch removal system or made output on a four IR channel for downstream dirt and dirt and scratch removal systems. Popular downstream dirt and dirt and scratch removal systems are PF Clean and Digital ICE. HDR or high dynamic range is a new system, using a compositing and tone-mapping of images to extend the dynamic range beyond the native capability of the capturing device. This may be done by using a triple exposure for the film and then compositing the three back together. Compositing can be done in a workstation in none real time or in the scanner in real time. == Models == Bold indicates a currently produced model Single frame intermittent pull-down: ARRI - Arriscan Cintel - diTTo Filmlight - Northlight 1 (up to 6K, 16mm to VistaVision), Northlight 2 (up to 8K, 16mm to VistaVision) Imagica scanner, single frame intermittent scanner. Kodak - Cineon, the first system designed for DI work, included a scanner, tapes drives, workstations and a film recorder. Lasergraphics Director 13.5K, 8mm to 70mm, IMAX & VistaVision) Continuous motion scanning: Arri - ARRISCAN XT (up to 6K, S35 down to 16mm) Cintel's C-Reality/DSX and ITK - Millennium/dataMill. Under ownership of Blackmagic Design, the Cintel Scanner was released, with the current 3rd generation capable of up to 4K scans at 30 fps. DFT - Spirit Classic (up to 2K), Spirit 4k/2k/HD (up to 4K), POLAR HQ (up to 8K, 8mm to S35), OXScan 14K (up to 14K, 16mm to 70mm), Scanity HDR (up to 4K, 16mm to S35) Filmfabriek - HDS+ (up to 4k), Pictor Pro (up to 2.7K), Pictor (up to 1080p). Filmfabriek scanners can only scan 17.5mm or smaller film formats. GE4 - Golden Eye Four - Filmscanner, 38 Mega Pixel camera. LED light source and continuous film transport using Capstan. From Digital Vision. Lasergraphics ScanStation (6.5K, 8mm to 70mm, IMAX & VistaVision) Lasergraphics Archivist (up to 5K) MWA Nova Vario series with patented laser-based, sprocket and claw free transport for 16/35mm for realtime (24/25fps) scanning with sensors for either 2K+ 2236 x 1752, or 2.5K+ HDR High Dynamic Range at 2560 x 2160, direct optical and magnetic sound on and 16 and 35mm. MWA Nova Choice 2K+ patented laser-based, sprocket and claw free transport for 8/Super8, 9.5mm, 16mm realtime (24/25fps) scanning w at 2K+, 2236 x 1752 with direct optical and magnetic sound on 16mm, magnetic from main and balance stripes on 8, Super8. Faster than real time scanning at lower resolution. P+S Technik - SteadyFrame Universal Format Film Scanner Walde - FilmStar 4K UHD 2K @ 25fps, 4K UHD @ 6fps. 35mm/16mm/8mm archive quality, continuous motion capstan driven.

    Read more →
  • Digital edition

    Digital edition

    A digital edition is an online magazine or online newspaper delivered in electronic form which is formatted identically to the print version. Digital editions are often called digital facsimiles to underline the likeness to the print version. Digital editions have the benefit of reduced cost to the publisher and reader by avoiding the time and the expense to print and deliver paper edition. This format is considered more environmentally friendly due to the reduction of paper and energy use. These editions also often feature interactive elements such as hyperlinks both within the publication itself and to other internet resources, search option and bookmarking, and can also incorporate multimedia such as video or animation to enhance articles themselves or for advertisement purposes. Some delivery methods also include animation and sound effects that replicate turning of the page to further enhance the experience of their print counterparts. Magazine publishers have traditionally relied on two revenue sources: selling ads and selling magazines. Additionally some publishers are using other electronic publication methods such as RSS to reach out to readers and inform them when new digital editions are available. Current technologies are generally either reader-based, requiring a download of an application and subsequent download of each edition, or browser-based, often using Macromedia Flash, requiring no application download (such as Adobe Acrobat). Some application-based readers allow users to access editions while not connected to internet. Dedicated hardware such as the Amazon Kindle and the iPad is also available for reading digital editions of select books, popular national magazines such as Time, The Atlantic, and Forbes and popular national newspapers such as the New York Times, Wall Street Journal, and Washington Post. Archives of print newspapers, in some cases dating hundreds of years back, are being digitized and made available online. Google is indexing existing digital archives produced by the newspapers themselves or by third parties. Newspaper and magazine archival began with microform film formats solving the problem of efficiently storing and preserving. This format, however, lacked accessibility. Many libraries, especially state libraries in the United States are archiving their collections digitally and converting existing microfilm to digital format. The Library of Congress provides project planning assistance and the National Endowment for the Humanities procures funding through grants from its National Digital Newspaper Program. Digital magazines, ezines, e-editions and emags are sometimes referred to as digital editions, however some of these formats are published only in digital format unlike digital editions which replicate a printed edition as well. == Digital magazines == Digital-replica magazines number in thousands—consumer and business publications, house magazines for associations, institutions and corporations – and conversion from print to digital was still increasing as of 2009. A 2008 report funded by digital-replica technology providers and auditing agencies counted 1,786 digital-replica editions having more than 7 million circulation among business-to-business publications, of which 230 editions were audited The same report counted 1,470 digital-replica editions of consumer magazines having 5.5 million digital circulation, of which 240 editions were audited. These authors estimated that by year end of 2009 there would be 8,000 digital magazines, having a combined distribution of more than 30 million people. Surveys have shown that, while not all subscribers prefer a digital edition, some do because of the environmental benefit and also because digital magazines are searchable and may easily be passed along or linked to. One such survey funded by a digital publisher reported on inputs from more than 30,000 subscribers to business, consumer and other digital magazines. == Digital magazine business models == === Reduced printing and distribution costs === The publishers' choice to save by moving some or all subscribers from print to digital is widely accepted. Oracle magazine, which has 176,000 of its 516,000 subscribers receiving digital according to its June 2009 BPA circulation statement, is said to be the most widely circulated digital edition of a business-to-business publication. Publishers who do this need to choose whether to make some issues all-digital, move some subscribers to digital edition, add some digital-only subscribers, or send all subscribers the digital edition. === Paid subscription revenue === In 2009, a major consumer magazine, PC Magazine, went all-digital, charging an annual subscription fee for its digital-replica edition. Many consumer magazines and newspapers are already available in eReader formats that are sold through booksellers. === Sponsorship and advertising revenue === Digital editions often carry special "front cover" advertising, or advertising on the email message alerting the subscriber of the digital edition. Publishers also produce special digital-only inserts and rich-media ads or advertorials. === Designed-for-digital issues === Another approach is to fully replace printed issues with digital ones, or to use digital editions for extra issues that would otherwise have to be printed.

    Read more →
  • Content adaptation

    Content adaptation

    Content adaptation is the action of transforming content to adapt to device capabilities. Content adaptation is usually related to mobile devices, which require special handling because of their limited computational power, small screen size, and constrained keyboard functionality. Content adaptation could roughly be divided to two fields: Media content adaptation that adapts media files. Browsing content adaptation that adapts a website to mobile devices. == Browsing content adaptation == Advances in the capabilities of small, mobile devices such as mobile phones (cell phones) and Personal Digital Assistants have led to an explosion in the number of types of device that can now access the Web. Some commentators refer to the Web that can be accessed from mobile devices as the Mobile Web. The sheer number and variety of Web-enabled devices poses significant challenges for authors of websites who want to support access from mobile devices. The W3C Device Independence Working Group described many of the issues in its report Authoring Challenges for Device Independence. Content adaptation is one approach to a solution. Rather than requiring authors to create pages explicitly for each type of device that might request them, content adaptation transforms an author's materials automatically. For example, content might be converted from a device-independent markup language, such as XDIME, an implementation of the W3C's DIAL specification, into a form suitable for the device, such as XHTML Basic, C-HTML, or WML. Similarly, a suitable device-specific CSS style sheet or a set of in-line styles might be generated from abstract style definitions. Likewise, a device specific layout might be generated from abstract layout definitions. Once created, the device-specific materials form the response returned to the device from which the request was made. Another way is to use the latest trend responsive design based on CSS, covered in this article (RWD). Content adaptation requires a processor that performs the selection, modification, and generation of materials to form the device-specific result. IBM's Websphere Everyplace Mobile Portal (WEMP), BEA Systems' WebLogic Mobility Server, Morfeo's MyMobileWeb, and Apache Cocoon are examples of such processors. Wurfl and WALL are popular open source tools for content adaptation. WURFL is an XML-based Device Description Repository with APIs to access the data in Java and PHP (and other popular programming languages). WALL (Wireless Abstraction Library) lets a developer author mobile pages which look like plain HTML, but converts them to WML, C-HTML, or XHTML Mobile Profile, depending on the capabilities of the device from which the HTTP request originates. GreasySpoon lets the developer build plugins for content editing, in JavaScript, Ruby (programming language), and more, just like the Firefox application GreaseMonkey. Alembik (Media Transcoding Server) is a Java (J2EE) application providing transcoding services for variety of clients and for different media types (image, audio, video, etc.). It is fully compliant with OMA's Standard Transcoder Interface specification and is distributed under the LGPL open source license. In 2007, the first large scale carrier-grade deployments of content transformation, on existing mass-market handsets, with no software download required, were deployed by Vodafone in the UK and globally for Yahoo! oneSearch, using the Novarra Vision solution. Novarra's content adaptation solution had been used in enterprise intranet deployments as early as 2003 (at that time, the platform was named "Engines for Wireless Data"). InfoGin, the 9-year-old content-adaptation company with customers like Vodafone, Orange, Telefónica and PCCW. The patented "Web to Mobile adaptation", Mobile Matrix Transcoder, Multimedia and Documents transcoders, Video adaptation supporte. Launched in 2007, Bytemobile's Web Fidelity Service was another carrier-grade, commercial infrastructure solution, which provided wireless content adaptation to mobile subscribers on their existing mass-market handsets, with no client download required.

    Read more →
  • Brill tagger

    Brill tagger

    The Brill tagger is an inductive method for part-of-speech tagging. It was described and invented by Eric Brill in his 1993 PhD thesis. It can be summarized as an "error-driven transformation-based tagger". It is: a form of supervised learning, which aims to minimize error; and, a transformation-based process, in the sense that a tag is assigned to each word and changed using a set of predefined rules. In the transformation process, if the word is known, it first assigns the most frequent tag, or if the word is unknown, it naively assigns the tag "noun" to it. High accuracy is eventually achieved by applying these rules iteratively and changing the incorrect tags. This approach ensures that valuable information such as the morphosyntactic construction of words is employed in an automatic tagging process. == Algorithm == The algorithm starts with initialization, which is the assignment of tags based on their probability for each word (for example, "dog" is more often a noun than a verb). Then "patches" are determined via rules that correct (probable) tagging errors made in the initialization phase: Initialization: Known words (in vocabulary): assigning the most frequent tag associated to a form of the word Unknown word == Rules and processing == The input text is first tokenized, or broken into words. Typically in natural language processing, contractions such as "'s", "n't", and the like are considered separate word tokens, as are punctuation marks. A dictionary and some morphological rules then provide an initial tag for each word token. For example, a simple lookup would reveal that "dog" may be a noun or a verb (the most frequent tag is simply chosen), while an unknown word will be assigned some tag(s) based on capitalization, various prefix or suffix strings, etc. (such morphological analyses, which Brill calls Lexical Rules, may vary between implementations). After all word tokens have (provisional) tags, contextual rules apply iteratively, to correct the tags by examining small amounts of context. This is where the Brill method differs from other part of speech tagging methods such as those using Hidden Markov Models. Rules are reapplied repeatedly, until a threshold is reached, or no more rules can apply. Brill rules are of the general form: tag1 → tag2 IF Condition where the Condition tests the preceding and/or following word tokens, or their tags (the notation for such rules differs between implementations). For example, in Brill's notation: IN NN WDPREVTAG DT while would change the tag of a word from IN (preposition) to NN (common noun), if the preceding word's tag is DT (determiner) and the word itself is "while". This covers cases like "all the while" or "in a while", where "while" should be tagged as a noun rather than its more common use as a conjunction (many rules are more general). Rules should only operate if the tag being changed is also known to be permissible, for the word in question or in principle (for example, most adjectives in English can also be used as nouns). Rules of this kind can be implemented by simple Finite-state machines. See Part of speech tagging for more general information including descriptions of the Penn Treebank and other sets of tags. Typical Brill taggers use a few hundred rules, which may be developed by linguistic intuition or by machine learning on a pre-tagged corpus. == Code == Brill's code pages at Johns Hopkins University are no longer on the web. An archived version of a mirror of the Brill tagger at its latest version as it was available at Plymouth Tech can be found on Archive.org. The software uses the MIT License.

    Read more →
  • User-generated content

    User-generated content

    User-generated content (UGC), alternatively known as user-created content (UCC), is content generated by users of the Internet such as images, videos, audio, text, testimonials, software, and user interactions. Online content aggregation platforms such as social media, discussion forums and wikis by their interactive and social nature, no longer produce multimedia content but provide tools to produce, collaborate, and share a variety of content, which can affect the attitudes and behaviors of the audience in various aspects. This transforms the role of consumers from passive spectators to active participants. User-generated content is used for a wide range of applications, including problem processing, news, entertainment, customer engagement, advertising, gossip, research and more. It is an example of the democratization of content production and the flattening of traditional media hierarchies. The BBC adopted a user-generated content platform for its websites in 2005, and Time magazine named "You" as the Person of the Year in 2006, referring to the rise in the production of UGC on Web 2.0 platforms. CNN also developed a similar user-generated content platform, known as iReport. There are other examples of news channels implementing similar protocols, especially in the immediate aftermath of a catastrophe or terrorist attack. Social media users can provide key eyewitness content and information that may otherwise have been inaccessible. Since 2020, there has been an increasing number of businesses who are utilizing User Generated Content (UGC) to promote their products and services. Several factors significantly influence how UGC is received, including the quality of the content, the credibility of the creator, and viewer engagement. These elements can impact users' perceptions and trust towards the brand, as well as influence the buying intentions of potential customers. UGC has proven to be an effective method for brands to connect with consumers, drawing their attention through the sharing of experiences and information on social media platforms. Due to new media and technology affordances, such as low cost and low barriers to entry, the Internet is an easy platform to create and dispense user-generated content, allowing the dissemination of information at a rapid pace in the wake of an event. == Definition == The advent of user-generated content marked a shift among media organizations from creating online content to providing facilities for amateurs to publish their own content. User-generated content has also been characterized as citizen media as opposed to the "packaged goods media" of the past century. Citizen Media is audience-generated feedback and news coverage. People give their reviews and share stories in the form of user-generated and user-uploaded audio and user-generated video. The former is a two-way process in contrast to the one-way distribution of the latter. Conversational or two-way media is a key characteristic of so-called Web 2.0, which encourages the publishing of one's own content and commenting on other people's content. The role of the passive audience, therefore, has shifted since the birth of new media, and an ever-growing number of participatory users are taking advantage of these interactive opportunities, especially on the Internet, to create independent content. Grassroots experimentation then generated an innovation in sounds, artists, techniques, and associations with audiences, which then are being used in mainstream media. The active, participatory, and creative audience is prevailing today with relatively accessible media, tools, and applications, and its culture is in turn affecting mass media corporations and global audiences. The Organisation for Economic Co-operation and Development (OECD) has defined three core variables for UGC: Accessible Content: User-generated content (UGC) is publicly produced through platforms located on the Internet and is available to any individual browsing such a publicly accessible website or a public social media account. There are other contexts where users must remain in a community or closed group to access and publish on such platforms (for example, wikis). This is a way of differentiating that although the content is accessible to the audience, there are certain restrictions for the users who generates the content. Creative effort: Creative effort was put into creating the work or adapting existing works to construct a new one; i.e. users must add their own value to the work. UGC often also has a collaborative element to it, as is the case with websites that users can edit collaboratively. For example, merely copying a portion of a television show and posting it to an online video website (an activity frequently seen on the UGC sites) would not be considered UGC. However, uploading photographs, expressing one's thoughts in a blog post or creating a new music video could be considered UGC. Yet the minimum amount of creative effort is hard to define and depends on the context. Creation outside of professional routines and practices: User-generated content is generally created outside of professional routines and practices. It often does not have an institutional or a commercial market context. In extreme cases, UGC may be produced by non-professionals without the expectation of profit or remuneration. Motivating factors include connecting with peers, achieving a certain level of fame, notoriety, or prestige, and the desire to express oneself. == Media pluralism == According to Cisco, in 2016 an average of 96,000 petabytes was transferred monthly over the Internet, more than twice as many as in 2012. In 2016, the number of active websites surpassed 1 billion, up from approximately 700 million in 2012. Reaching 1.66 billion daily active users in Q4 2019, Facebook has emerged as the most popular social media platform globally. Other social media platforms are also dominant at the regional level such as: Twitter in Japan, Naver in the Republic of Korea, Instagram (owned by Facebook) and LinkedIn (owned by Microsoft) in Africa, VKontakte (VK) and Odnoklassniki (eng. Classmates) in Russia and other countries in Central and Eastern Europe, WeChat and QQ in China. However, a concentration phenomenon is occurring globally giving dominance to a few online platforms that become popular for some unique features they provide, most commonly for the added privacy they offer users through disappearing messages or end-to-end encryption (e.g. Jami, Signal, Snapchat, Telegram, Viber, and WhatsApp), but they have tended to occupy niches and to facilitate the exchanges of information that remain rather invisible to larger audiences. Production of freely accessible information has been increasing since 2012. In January 2017, Wikipedia had more than 43 million articles, almost twice as many as in January 2012. This corresponded to a progressive diversification of content and an increase in contributions in languages other than English. In 2017, less than 12 percent of Wikipedia content was in English, down from 18 percent in 2012. Graham, Straumann, and Hogan say that the increase in the availability and diversity of content has not radically changed the structures and processes for the production of knowledge. For example, while content on Africa has dramatically increased, a significant portion of this content has continued to be produced by contributors operating from North America and Europe, rather than from Africa itself. == History == The massive, multi-volume Oxford English Dictionary was exclusively composed of user-generated content. In 1857, Richard Chenevix Trench of the London Philological Society sought public contributions throughout the English-speaking world for the creation of the first edition of the OED. As Simon Winchester recounts: So what we're going to do, if I have your agreement that we're going to produce such a dictionary, is that we're going to send out invitations, were going to send these invitations to every library, every school, every university, every book shop that we can identify throughout the English-speaking world... everywhere where English is spoken or read with any degree of enthusiasm, people will be invited to contribute words. And the point is, the way they do it, the way they will be asked and instructed to do it, is to read voraciously and whenever they see a word, whether it's a preposition or a sesquipedalian monster, they are to... if it interests them and if where they read it, they see it in a sentence that illustrates the way that that word is used, offers the meaning of the day to that word, then they are to write it on a slip of paper... the top left-hand side you write the word, the chosen word, the catchword, which in this case is 'twilight'. Then the quotation, the quotation illustrates the meaning of the word. And underneath it, the citation, where it came from, whether it was printed or whether it was in manuscri

    Read more →
  • Media Auxiliary Memory

    Media Auxiliary Memory

    Media Auxiliary Memory or Medium Auxiliary Memory (MAM) refers to a chip embedded into a digital media device (usually a tape cartridge) that stores a small amount of data or metadata that a computer can read without having to read the actual tape. MAMs can be used by the tape driver to increase efficiency, or by custom software to store & retrieve custom data. Some examples of MAM's are Cartridge Memory (HP/Seagate/IBM LTO) and MIC (Sony AIT).

    Read more →
  • Line splice

    Line splice

    In electrical engineering and telecommunications, a line splice is a joint directly connecting lengths of electrical cables (electrical splice) or optical fibers (optical splice). The splices are often protected by sleeves. == Splicing of copper wires == The splicing of copper wires happens in the following steps: The cores are laid one above the other at the junction. The core insulation is removed. The wires are wrapped two to three times around each other (twisting). The bare veins on a length of about 3 cm "strangle" or "twist". In some cases, the strangulation is soldered. To isolate the splice, an insulating sleeve made of paper or plastic is pushed over it. The splicing of copper wires is mainly used on paper insulated wires. LSA techniques (LSA: soldering, screwing and stripping free) are used to connect copper wires, making the copper wires faster and easier to connect. LSA techniques include: Wire connection sleeves (AVH = Adernverbindungshülsen) and other crimp connectors. The two wires to be connected are inserted into the AVH without being stripped, which is then compressed with special pliers. The about 2 cm long AVH consist of contact, pressure and insulation. For wire connection strips (AVL = Adernverbindungsleisten) several pairs of wires (10 = AVL10 or 20 = AVL20) are inserted, the strip is then closed with a lid and pressed together with a hydraulic press, which ensures the connection. == Splicing of glass fibers == Fiber-optic cables are spliced using a special arc-splicer, with installation cables connected at their ends to respective "pigtails" - short individual fibers with fiber-optic connectors at one end. The splicer precisely adjusts the light-guiding cores of the two ends of the glass fibers to be spliced. The adjustment is done fully automatically in modern devices, whereas in older models this is carried out manually by means of micrometer screws and microscope. An experienced splicer can precisely position the fiber ends within a few seconds. Subsequently, the fibers are fused together (welded) with an electric arc. Since no additional material is added, such as gas welding or soldering, this is called a "fusion splice". Depending on the quality of the splicing process, attenuation values at the splice points are achieved by 0.3 dB, with good splices also below 0.02 dB. For newer generation devices, alignment is done automatically by motors. Here one differentiates core and jacket centering. At core centering (usually single-mode fibers), the fiber cores are aligned. A possible core offset with respect to the jacket is corrected. In the jacket centering (usually in multimode fibers), the fibers are adjusted to each other by means of electronic image processing in front of the splice. When working with good equipment, the damping value is according to experience at max. 0.1 dB. Measurements are made by means of special measuring devices including optical time-domain reflectometry (OTDR). A good splice should have an attenuation of less than 0.3 dB over the entire distance. Finished fiber optic splices are housed in splice boxes. One differentiates: Fusion splice Adhesive splicing Crimp splice or NENP (no-epoxy no-polish), mechanical splice

    Read more →
  • Outline of deep learning

    Outline of deep learning

    The following outline is provided as an overview of, and topical guide to, deep learning: Deep learning is a subfield of machine learning and artificial intelligence based on artificial neural networks with multiple processing layers. It emphasizes representation learning and is widely used in areas such as computer vision, natural language processing, speech recognition, recommender systems, robotics, and generative artificial intelligence. == Ways to categorize deep learning == A field of study A branch of artificial intelligence A subfield of machine learning A subfield of computer science A form of representation learning A class of methods based on artificial neural networks An approach used in computational statistics == History == === Precursors === Cybernetics Perceptron Connectionism Neocognitron Backpropagation === Milestones === LeNet Long short-term memory Deep belief network AlexNet Sequence to sequence learning Generative adversarial network Residual neural network Transformer BERT Generative pre-trained transformer Diffusion model === Related histories === History of artificial intelligence History of machine learning Timeline of machine learning == Core concepts == == Learning settings == Supervised learning Unsupervised learning Self-supervised learning Semi-supervised learning Reinforcement learning Transfer learning Multitask learning Multimodal learning Online machine learning Continual learning == Common tasks == Image classification Object detection Image segmentation Automatic speech recognition Neural machine translation Question answering Automatic summarization Text-to-image model Protein structure prediction == Architectures == === Feedforward and convolutional architectures === Feedforward neural network Multilayer perceptron Convolutional neural network Radial basis function network Residual neural network U-Net === Recurrent and sequence architectures === Recurrent neural network Long short-term memory Gated recurrent unit Sequence to sequence learning Recursive neural network === Representation-learning architectures === Autoencoder Denoising autoencoder Sparse autoencoder Variational autoencoder Restricted Boltzmann machine Deep belief network === Attention and transformer architectures === Attention (machine learning) Transformer BERT Generative pre-trained transformer Vision transformer === Generative and probabilistic architectures === Autoregressive model Diffusion model Energy-based model Generative adversarial network Mixture of experts === Graph and memory architectures === Graph neural network Graph convolutional network Siamese network Neural Turing machine Memory network Echo state network Capsule neural network == Neural network components and techniques == Artificial neuron Activation function Rectified linear unit Sigmoid function Softmax function Embedding Convolution Pooling layer Attention Batch normalization Layer normalization Residual connections == Training and optimization == Backpropagation Gradient descent Stochastic gradient descent Adam optimization Learning rate Loss function Cross-entropy Mean squared error Regularization Dropout Early stopping Batch normalization Data augmentation Transfer learning Knowledge distillation Ensemble learning Curriculum learning == Datasets and benchmarks == CIFAR-10 ImageNet MNIST database Common Objects in Context (COCO) General Language Understanding Evaluation (GLUE) benchmark LibriSpeech SQuAD == Applications == === Computer vision === Computer vision Facial recognition system Image classification Image segmentation Medical imaging Object detection Optical character recognition === Natural language processing === Automatic summarization Chatbot Information retrieval Large language model Natural language processing Neural machine translation Question answering Sentiment analysis === Speech and audio === Automatic speech recognition Music information retrieval Speaker recognition Speech synthesis === Science and medicine === Bioinformatics Computational biology Drug discovery Medical diagnosis Protein structure prediction === Robotics and control === Autonomous car Computer game bot Control theory Robotics === Recommendation, search, and forecasting === Anomaly detection Forecasting Fraud detection Recommender system Search engine === Generative artificial intelligence === Deepfake Generative artificial intelligence Large language model Speech synthesis Text-to-image model === Computer graphics and video games === Deep Learning Anti-Aliasing (DLAA) Deep Learning Super Sampling (DLSS) == Hardware == AMD Instinct AMD XDNA Application-specific integrated circuit Deep learning processor, Neural processing unit (NPU), or Neural Engine Field-programmable gate array General-purpose computing on graphics processing units (GPGPU) Graphics processing unit NVIDIA Deep Learning Accelerator (NVDLA) Tensor processing unit Vision processing unit Wafer-scale integration === Supporting software platforms === CUDA Metal ROCm == Software == === Open-source frameworks and libraries === === Neural network software === EDLUT Emergent Encog JOONE Neuroph NeuroSolutions OpenNN Peltarion Synapse SNNS === Platforms, tools, and deployment === Amazon SageMaker Google Colab Hugging Face Kaggle Kubeflow MLflow ONNX OpenVINO TensorFlow Hub == Algorithms for deep learning and neural networks == Backpropagation Conjugate gradient method Generalized Hebbian algorithm Gradient descent Levenberg–Marquardt algorithm Perceptron Quasi-Newton method Wake-sleep algorithm == Methods and related topics == === Representation and metric learning === Contrastive learning Embedding Feature learning Manifold learning Metric learning === Generative modeling === Autoregressive model Diffusion model Generative adversarial network Generative model Variational inference === Efficient and scalable deep learning === Knowledge distillation Low-rank approximation Mixture of experts Quantization Sparsity === Reliability, safety, and interpretability === Adversarial machine learning AI alignment Algorithmic bias Catastrophic forgetting Differential privacy Explainable artificial intelligence Federated learning Hallucination (artificial intelligence) == Conferences and workshops == Annual Meeting of the Association for Computational Linguistics Conference on Computer Vision and Pattern Recognition Conference on Neural Information Processing Systems International Conference on Computer Vision International Conference on Learning Representations International Conference on Machine Learning == Organizations == === Research laboratories and institutions === Allen Institute for AI Alberta Machine Intelligence Institute European Laboratory for Learning and Intelligent Systems Google DeepMind Meta AI Mila Microsoft Research Vector Institute === Companies === Anthropic Cerebras Cohere DeepSeek Mistral AI OpenAI Stability AI xAI == Publications == === Books === Deep Learning – Ian Goodfellow and Yoshua Bengio Neural Networks and Deep Learning – Michael Nielsen Perceptrons – Marvin Minsky and Seymour Papert === Journals === IEEE Transactions on Neural Networks and Learning Systems Neural Networks Neural Computation == Influential persons ==

    Read more →
  • Digital scrapbooking

    Digital scrapbooking

    Digital scrapbooking is the term for the creation of a new 2D artwork by re-combining various graphic elements. It is a form of scrapbooking that is done using a personal computer, digital or scanned photos and computer graphics software. It is a relatively new form of the traditional print scrapbooking. Recent advances in technology now enable the craft to be pursued on tablets and smart devices utilising imaging apps as well as hobby specific apps, some of which have been created specifically by brands for use with their own image products. Digital scrapbooking kits are available to purchase and download at many websites that specialize in the craft. Kits contain graphics and word-art and are usually themed and color-coordinated. They usually consist of a mix of background images and "cut out" [extracted] images containing alpha channels. Once a kit has been downloaded to the computer or device, it can then be used over and over again to make new scrapbook pages (scrapbook layouts) within the software program that one chooses to use, often in combination with the users's own family photographs, scanned keepsakes and other unique personal elements scanned on a flatbed scanner. Scanning is usually done at 300dpi, to make the resulting images suitable for print. == Licensing and Copyright == Kits are sometimes licensed differently from other forms of traditional royalty-free stock images that may be purchased per-item or in sets at online stock photography sites. Some kit packs will be wholly royalty-free, but some kit makers may restrict usage to non-commercial work only. Some may specifically forbid the use of their work in projects for commercial gain, for example greetings cards and gift tags that may be made with their kits. Licensing often varies from kit to kit, even from the same maker. Some kits include derivative works of public domain material. In contrast to stock, creators of digital scrapbooking kits often require a credit or byline to indicate that their image elements have been used in a new creation. == Uses == Some artistic individuals combine digital scrapbooking with traditional scrapbooking to create what's known as hybrid scrapbooking projects. Hybrid scrapbooking involves creating layouts on the computer using digital supplies that will then be printed and combined with traditional supplies such as buttons, ribbons and other elements. Conversely, a hybrid scrapbook project may also be created using traditional paper supplies and augmented with digital elements that have been printed and cut out specifically for use on the project. Journaling may be done within the software programs to accompany images and to create digital storybooks, or scrapbooks, which are then published in photo books via various popular print-on-demand services, printed and added to traditional scrapbooks, burned to CDs or posted on the Web. Digital Scrapbooking may also be done online by uploading photos to a specialist scrapbooking website and utilising their custom built platforms and decorative image elements to complete the projects for print to finished products, for example photo books and holiday greeting cards. == Market Size == The traditional scrapbooking market appeared to decline somewhat in the USA since 2010, probably due to the 2008 financial crisis, and the digital scrapbooking market (being potentially a much cheaper form of scrapbooking) may have increased accordingly. Both markets currently appear to have recovered lost ground and expanded since the beginning of the COVID-19 pandemic as many people sought to productively fill their time during lockdowns, quarantines and self-isolation / stay at home directions. == Digital scrapbooking software == The main software programs that are typically used are Adobe Photoshop, Adobe Photoshop Elements, paint.net (freeware), Filter Forge, Corel Paintshop Pro, and GIMP. Additionally Adobe offer the Photoshop iOS product using the same code base as the desktop version to drive the app version. == Digital scrapbooking supplies == Digital scrapbooking supplies are downloaded from the Internet and then stored on a computer or external hardrive, DVD or CD media, SD cards, or in the cloud, to be used as needed. Both paid and free digital scrapbooking supplies available from numerous designers on their blogs or in e-commerce stores either as solo designers or as part of a wide cohort of designers working cooperatively in large full service e-commerce websites. Usually designed at 300ppi image resolution, digital scrapbooking product offerings and supplies often include: Full coordinated kits containing digital background “papers”, decorative alphabets, and diverse embellishments generally containing a mixture of .JPG and .PNG files; "Quick pages", flattened files containing a completed page layout with transparent photo windows in .PNG file format; Digital templates, fully layered layouts i.e. pages that have had the composition pre-designed ready for use in an imaging program or app, fully customizable for color schemes, kit choices, photographs and other embellishments, generally supplied in either .PSD or .TIF file format; Hybrid “quick pages”, i.e. layouts that are both fully designed and fully layered for customization, generally supplied in either .PSD or .TIF file format; Adobe Photoshop actions, brushes, custom shapes, paths and styles, saved in their respective native Photoshop file formats; and Corel PaintShop Pro equivalent tools.

    Read more →
  • Distributed operating system

    Distributed operating system

    A distributed operating system is system software over a collection of independent software, networked, communicating, and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs. Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners. The first is a ubiquitous minimal kernel, or microkernel, that directly controls that node's hardware. Second is a higher-level collection of system management components that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications. The microkernel and the management components collection work together. They support the system's goal of integrating multiple resources and processing functionality into an efficient and stable system. This seamless integration of individual nodes into a global system is referred to as transparency, or single system image; describing the illusion provided to users of the global system's appearance as a single computational entity. == Description == A distributed OS provides the essential services and functionality required of an OS but adds attributes and particular configurations to allow it to support additional requirements such as increased scale and availability. To a user, a distributed OS works in a manner similar to a single-node, monolithic operating system. That is, although it consists of multiple nodes, it appears to users and applications as a single-node. Separating minimal system-level functionality from additional user-level modular services provides a "separation of mechanism and policy". Mechanism and policy can be simply interpreted as "what something is done" versus "how something is done," respectively. This separation increases flexibility and scalability. == Overview == === The kernel === At each locale (typically a node), the kernel provides a minimally complete set of node-level utilities necessary for operating a node's underlying hardware and resources. These mechanisms include allocation, management, and disposition of a node's resources, processes, communication, and input/output management support functions. Within the kernel, the communications sub-system is of foremost importance for a distributed OS. In a distributed OS, the kernel often supports a minimal set of functions, including low-level address space management, thread management, and inter-process communication (IPC). A kernel of this design is referred to as a microkernel. Its modular nature enhances reliability and security, essential features for a distributed OS. === System management === System management components are software processes that define the node's policies. These components are the part of the OS outside the kernel. These components provide higher-level communication, process and resource management, reliability, performance and security. The components match the functions of a single-entity system, adding the transparency required in a distributed environment. The distributed nature of the OS requires additional services to support a node's responsibilities to the global system. In addition, the system management components accept the "defensive" responsibilities of reliability, availability, and persistence. These responsibilities can conflict with each other. A consistent approach, balanced perspective, and a deep understanding of the overall system can assist in identifying diminishing returns. Separation of policy and mechanism mitigates such conflicts. === Working together as an operating system === The architecture and design of a distributed operating system must realize both individual node and global system goals. Architecture and design must be approached in a manner consistent with separating policy and mechanism. In doing so, a distributed operating system attempts to provide an efficient and reliable distributed computing framework allowing for an absolute minimal user awareness of the underlying command and control efforts. The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed operating system is the functional challenge of the distributed operating system. This is the point in the system that must maintain a perfect harmony of purpose, and simultaneously maintain a complete disconnect of intent from implementation. This challenge is the distributed operating system's opportunity to produce the foundation and framework for a reliable, efficient, available, robust, extensible, and scalable system. However, this opportunity comes at a very high cost in complexity. === The price of complexity === In a distributed operating system, the exceptional degree of inherent complexity could easily render the entire system an anathema to any user. As such, the logical price of realizing a distributed operation system must be calculated in terms of overcoming vast amounts of complexity in many areas, and on many levels. This calculation includes the depth, breadth, and range of design investment and architectural planning required in achieving even the most modest implementation. These design and development considerations are critical and unforgiving. For instance, a deep understanding of a distributed operating system's overall architectural and design detail is required at an exceptionally early point. An exhausting array of design considerations are inherent in the development of a distributed operating system. Each of these design considerations can potentially affect many of the others to a significant degree. This leads to a massive effort in balanced approach, in terms of the individual design considerations, and many of their permutations. As an aid in this effort, most rely on documented experience and research in distributed computing power. == History == Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success. Fundamental and pioneering implementations of primitive distributed operating system component concepts date to the early 1950s. Some of these individual steps were not focused directly on distributed computing, and at the time, many may not have realized their important impact. These pioneering efforts laid important groundwork, and inspired continued research in areas related to distributed computing. In the mid-1970s, research produced important advances in distributed computing. These breakthroughs provided a solid, stable foundation for efforts that continued through the 1990s. The accelerating proliferation of multi-processor and multi-core processor systems research led to a resurgence of the distributed OS concept. === The DYSEAC === One of the first efforts was the DYSEAC, a general-purpose synchronous computer. In one of the earliest publications of the Association for Computing Machinery, in April 1954, a researcher at the National Bureau of Standards – now the National Institute of Standards and Technology (NIST) – presented a detailed specification of the DYSEAC. The introduction focused upon the requirements of the intended applications, including flexible communications, but also mentioned other computers: Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation. The specification discussed the architecture of multi-computer systems, preferring peer-to-peer rather than master-slave. Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship. This is one of the earliest examples of a computer with distributed control. The Dept. of the Army reports certified it reliable and that it passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. This was a "portable comput

    Read more →