AI Data Bias

AI Data Bias — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Video imprint (computer vision)

    Video imprint (computer vision)

    Proposed as an extension of image epitomes in the field of video content analysis, video imprint is obtained by recasting video contents into a fixed-sized tensor representation regardless of video resolution or duration. Specifically, statistical characteristics are retained to some degrees so that common video recognition tasks can be carried out directly on such imprints, e.g., event retrieval, temporal action localization. It is claimed that both spatio-temporal interdependences are accounted for and redundancies are mitigated during the computation of video imprints. The option of computing video imprints exploiting the epitome model has the advantage of more flexible input feature formats and more efficient training stage for video content analysis.

    Read more →
  • Transparent decryption

    Transparent decryption

    Transparent decryption is a method of decrypting data which unavoidably produces evidence that the decryption operation has taken place. The idea is to prevent the covert decryption of data. In particular, transparent decryption protocols allow a user Alice to share with Bob the right to access data, in such a way that Bob may decrypt at a time of his choosing, but only while simultaneously leaving evidence for Alice of the fact that decryption occurred. Transparent decryption supports privacy, because this evidence alerts data subjects to the fact that information about them has been decrypted and disincentivises data misuse. Recent work further formalizes transparent decryption and explores practical implementations based on cryptographic protocols and blockchain systems. == Applications == Transparent decryption has been proposed for several systems where there is a need to simultaneously achieve accountability and secrecy. For example: In lawful interception, law enforcement agencies can access private messages and emails. Transparent decryption can make such accesses accountable, giving citizens guarantees about how their private information is accessed. Data arising from vehicles and IoT devices may contain personal information about the vehicle or device owners and their activities. Nevertheless, the data is typically processed in order to provide user functionality and also to investigate and fight crime. Transparent decryption can be used to help users monitor when and how data about them is being accessed and used. == Implementation == In transparent decryption, the decryption key is distributed among a set of agents (called trustees); they use their key share only if the required transparency conditions have been satisfied. Typically, the transparency condition can be formulated as the presence of the decryption request in a distributed ledger. == Alternative solutions == Besides transparent decryption, some other techniques have been proposed for achieving law enforcement while preserving privacy. Solutions that allow competing parties to unify their data access policies. Attribute-based encryption with oblivious attribute translation (OTABE) is an extension of attribute-based encryption that allows translation between proprietary attributes belonging to different organisations, and it has been applied to the problem of law-enforcement access to phone call metadata. Solutions that rely on sophisticated cryptography, such as zero-knowledge proofs that the actions of law enforcement is consistent with judge rulings and the actions of companies, and multi-party computation to compute results.

    Read more →
  • Backup

    Backup

    In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. == Storage == A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. === 3-2-1 Backup Rule === The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. === Backup methods === ==== Unstructured ==== An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. ==== Full only/System imaging ==== A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. ==== Incremental ==== An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. ==== Near-CDP ==== Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications—often marketed as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended ".bak" extension to the file name. ==== Reverse incremental ==== A Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links—as Apple Time Machine does, or using binary diffs. ==== Differential ==== A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. === Storage media === Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. ==== Magnetic tape ==== Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Tape media are generally rotated on a schedule so at least one set is off-site in case something should happe

    Read more →
  • Business continuity and disaster recovery auditing

    Business continuity and disaster recovery auditing

    Given organizations' increasing dependency on information technology (IT) to run their operations, business continuity planning (and its subset IT service continuity planning) covers the entire organization, while disaster recovery focuses on IT. Auditing documents covering an organization's business continuity and disaster recovery (BCDR) plans provides a third-party validation to stakeholders that the documentation is complete and does not contain material misrepresentations. == Overview == Often used together, the terms business continuity (BC) and disaster recovery (DR) are very different. BC refers to the ability of a business to continue critical functions and business processes after the occurrence of a disaster, whereas DR refers specifically to the IT functions of the business, albeit a subset of BC. == Metrics == The primary objective is to protect the organization in the event that all or part of its operations and/or computer services are rendered partially or completely unusable. === DR metrics === Minimizing downtime and data loss during disaster recovery is typically measured in terms of two key concepts: Recovery time objective (RTO), time until a system is completely up and running Recovery point objective (RPO), a measure of the ability to recover files by specifying a point in time the backup copy will restore to. == The auditor's role == Role of the Internal Auditor in Auditing a Disaster Recovery Plan (DRP): 1. Governance & Oversight - Confirm roles, responsibilities, and oversight are defined, and DRP aligns with risk appetite and continuity strategy. 2. Risk Assessment & BIA - Verify risk and impact assessments identify critical systems and define RTO/RPO. 3. Plan Design & Documentation - Ensure the DRP is current, complete, and includes key recovery procedures. 4. Testing & Validation - Confirm regular DRP testing occurs and results are used to improve the plan. 5. Backup & Recovery - Assess backup frequency and recovery capabilities against RTO/RPO targets. 6. Communication & Training - Verify staff are trained and communication protocols are in place for crises. 7. Maintenance & Improvement - Ensure the DRP is regularly updated and lessons learned are integrated. == Documentation == === Disaster recovery plan === A disaster recovery plan (DRP) is a documented process or set of procedures to execute an organization's disaster recovery processes and recover and protect a business IT infrastructure in the event of a disaster. It is "a comprehensive statement of consistent actions to be taken before, during and after a disaster". The disaster could be natural, environmental or man-made. Man-made disasters could be intentional (for example, an act of a terrorist) or unintentional (that is, accidental, such as the breakage of a man-made dam or even "fat fingers" - or errant commands entered - on a computer system). ==== Types of plans ==== Although there is no one-size-fits-all plan, there are three basic strategies: prevention, including proper backups, having surge protectors and generators detection, a byproduct of routine inspections, which may discover new (potential) threats correction The latter may include securing proper insurance policies, and holding a "lessons learned" brainstorming session. ==== Best practices ==== To maximize their effectiveness, DRPs are most effective when updated frequently, and should: be an integral part of all business analysis processes, be revisited at every major corporate acquisition, at every new product launch and at every new system development milestone. be thoroughly tested, not just unpracticed bureaucratic documentation Adequate records need to be retained by the organization. The auditor examines records, billings, and contracts to verify that records are being kept. One such record is a current list of the organization's hardware and software vendors. Such list is made and periodically updated to reflect changing business practices and as part of an IT asset management system. Copies of it are stored on and off site and are made available or accessible to those who require them. An auditor tests the procedures used to meet this objective and determine their effectiveness. === Relationship to BCPs === Disaster recovery is a subset of business continuity. Where DRP encompasses the policies, tools and procedures to enable recovery of data following a catastrophic event, BCP involves keeping all aspects of a business functioning regardless of potential disruptive events. As such, a business continuity plan is a comprehensive organizational strategy that includes the DRP as well as threat prevention, detection, recovery, and resumption of operations should a data breach or other disaster event occur. Therefore, BCP consists of five component plans: Business resumption plan Occupant emergency plan Continuity of operations plan Incident management plan Disaster recovery plan The first three components (business resumption, occupant emergency, and continuity of operations plans) do not deal with the IT infrastructure. The incident management plan (IMP) does deal with the IT infrastructure, but since it establishes structure and procedures to address cyber attacks against an organization's IT systems, it generally does not represent an agent for activating the DRP; thus DRP is the only BCP component of active interest to IT. == Testing == The overall categorization of tests are functional- and discussion-based. Types of tests include: tabletop exercises, checklists, simulations, parallel processing (testing recovery site while primary site is in operation), and full interruption (fail over) tests. These apply to both BC and DR. == Benefits == Like every insurance plan, there are benefits that can be obtained from proper business continuity planning, including: Studies have shown a correlation between higher spending on auditing fees and lower rates of Incidents. Minimizing risk of delays Guaranteeing the reliability of standby systems (even automating the failure detection and recovery in certain scenarios) Providing a standard for testing the plan Minimizing decision-making during a disaster Reducing potential legal liabilities Lowering unnecessarily stressful work environment === Planning and testing methodology === According to Geoffrey H. Wold of the Disaster Recovery Journal, the entire process involved in developing a Disaster Recovery Plan consists of 10 steps: Performing a risk assessment: The planning committee prepares a risk analysis and a business impact analysis (BIA) that includes a range of possible disasters. Each functional area of the organization is analyzed to determine potential consequences. Traditionally, fire has posed the greatest threat. A thorough plan provides for "worst case" situations, such as destruction of the main building. Establishing priorities for processing and operations: Critical needs of each department are evaluated and prioritized. Written agreements for alternatives selected are prepared, with details specifying duration, termination conditions, system testing, cost, any special security procedures, procedure for the notification of system changes, hours of operation, the specific hardware and other equipment required for processing, personnel requirements, definition of the circumstances constituting an emergency, process to negotiate service extensions, guarantee of compatibility, availability, non-mainframe resource requirements, priorities, and other contractual issues. Collecting data: This includes various lists (employee backup position listing, critical telephone numbers list, master call list, master vendor list, notification checklist), inventories (communications equipment, documentation, office equipment, forms, insurance policies, workgroup and data center computer hardware, microcomputer hardware and software, office supply, off-site storage location equipment, telephones, etc.), distribution register, software and data files backup/retention schedules, temporary location specifications, any other such lists, materials, inventories, and documentation. Pre-formatted forms are often used to facilitate the data gathering process. Organizing and documenting a written plan Developing testing criteria and procedures: reasons for testing include Determining the feasibility and compatibility of backup facilities and procedures. Identifying areas in the plan that need modification. Providing training to the team managers and team members. Demonstrating the ability of the organization to recover. Providing motivation for maintaining and updating the disaster recovery plan. Testing the plan: An initial "dry run" of the plan is performed by conducting a structured walk-through test. An actual test-run must be performed. Problems are corrected. Initial testing can be plan is done in sections and after normal business hours to minimize disruptions. Subsequent tests occur during normal business hours. === Caveats/controversie

    Read more →
  • Super-resolution imaging

    Super-resolution imaging

    Super-resolution imaging (SR) is a class of techniques that improve the resolution of an imaging system. In optical SR the diffraction limit of systems is transcended, while in geometrical SR the resolution of digital imaging sensors is enhanced. In some radar and sonar imaging applications (e.g. magnetic resonance imaging (MRI), high-resolution computed tomography), subspace decomposition-based methods (e.g. MUSIC) and compressed sensing-based algorithms (e.g., SAMV) are employed to achieve SR over standard periodogram algorithm. Super-resolution imaging techniques are used in general image processing and in super-resolution microscopy. == Super-resolution principles == Several concepts are fundamental to super-resolution imaging: Diffraction limit: the capacity of an optical instrument to reproduce the details of an object in an image has limits that are imposed by laws of physics: the diffraction equations in the wave theory of light, or the uncertainty principle for photons in quantum mechanics. Information transfer can never be increased beyond this boundary, but packets outside the limits can be cleverly swapped for (or multiplexed with) some inside it. Super-resolution microscopy does not so much “break” as “circumvent” the diffraction limit. New procedures probing electro-magnetic disturbances at the molecular level (in the so-called near field) remain fully consistent with Maxwell's equations. Spatial frequency domain: A succinct expression of the diffraction limit is given in the spatial frequency domain. In Fourier optics light distributions are expressed as superpositions of a series of grating light patterns in a range of fringe widths - these widths represent the spatial frequencies. It is generally taught that diffraction theory stipulates an upper limit, the cut-off spatial-frequency, beyond which pattern elements fail to be transferred into the optical image, i.e., are not resolved. But in fact what is set by diffraction theory is the width of the passband, not a fixed upper limit. No laws of physics are broken when a spatial frequency band beyond the cut-off spatial frequency is swapped for one inside it: this has long been implemented in dark-field microscopy. Nor are information-theoretical rules broken when superimposing several bands, disentangling them in the received image needs assumptions of object invariance during multiple exposures, i.e., the substitution of one kind of uncertainty for another. Information: When the term super-resolution is used in techniques based on the inference of object details using a statistical treatment of the image within standard resolution limits (for example, averaging multiple exposures), it involves an exchange of one kind of information (extracting signal from noise) for another (the assumption that the target has remained invariant). Recent breakthroughs incorporate quantum-transformer hybrids into super-resolution, such as QUIET‑SR, a 2025 model that employs shifted quantum window attention within a transformer to enhance image detail while respecting diffraction and information-theory limits Similarly, frequency-integrated transformers (e.g., FIT) enrich super-resolution by explicitly combining spatial and frequency-domain information via FFT-based attention, improving reconstruction across scales Resolution and localization: True resolution involves the distinction of whether a target, e.g. a star or a spectral line, is single or double, ordinarily requiring separable peaks in the image. When a target is known to be single, its location can be determined with higher precision than the image width by finding the centroid (center of gravity) of its image light distribution. The word ultra-resolution had been proposed for this process but it did not catch on, and the high-precision localization procedure is typically referred to as super-resolution. == Techniques == === Optical or diffractive super-resolution === Substituting spatial-frequency bands: Though the bandwidth allowable by diffraction is fixed, it can be positioned anywhere in the spatial-frequency spectrum. Dark-field illumination in microscopy is an example. See also aperture synthesis. ==== Multiplexing spatial-frequency bands ==== An image is formed using the normal passband of the optical device. Then, some known light structure (for example, a set of light fringes) is superimposed on the target. The image now contains components resulting from the combination of the target and the superimposed light structure, e.g. moiré fringes, and carries information about target detail which simple unstructured illumination does not. The “superresolved” components, however, need disentangling to be revealed. For an example, see structured illumination (figure to left). ==== Multiple parameter use within traditional diffraction limit ==== If a target has no special polarization or wavelength properties, two polarization states or non-overlapping wavelength regions can be used to encode target details, one in a spatial-frequency band inside the cut-off limit the other beyond it. Both would use normal passband transmission but are then separately decoded to reconstitute target structure with extended resolution. ==== Probing near-field electromagnetic disturbance ==== Super-resolution microscopy is generally discussed within the realm of conventional optical imagery. However, modern technology allows the probing of electromagnetic disturbance within molecular distances of the source, which has superior resolution properties. See also evanescent waves and the development of the new super lens. === Geometrical or image-processing super-resolution === ==== Multi-exposure image noise reduction ==== When an image is degraded by noise, the resolution may be improved by averaging multiple exposures. See example on the right. ==== Single-frame deblurring ==== Known defects in a given imaging situation, such as defocus or aberrations, can sometimes be mitigated in whole or in part by suitable spatial-frequency filtering of even a single image. Such procedures all stay within the diffraction-mandated passband, and do not extend it. ==== Sub-pixel image localization ==== The location of a single source can be determined by computing the "center of gravity" (centroid) of the light distribution extending over several adjacent pixels (see figure on the left). Provided that there is enough light, this can be achieved with arbitrary precision, very much better than pixel width of the detecting apparatus and the resolution limit for the decision of whether the source is single or double. This technique, which requires the presupposition that all the light comes from a single source, is at the basis of what has become known as super-resolution microscopy, e.g. stochastic optical reconstruction microscopy (STORM), where fluorescent probes attached to molecules give nanoscale distance information. It is also the mechanism underlying visual hyperacuity. ==== Bayesian induction beyond traditional diffraction limit ==== Some object features, though beyond the diffraction limit, may be known to be associated with other object features that are within the limits and hence contained in the image. Then conclusions can be drawn, using statistical methods, from the available image data about the presence of the full object. The classical example is Toraldo di Francia's proposition of judging whether an image is that of a single or double star by determining whether its width exceeds the spread from a single star. This can be achieved at separations well below the classical resolution bounds, and requires the prior limitation to the choice "single or double?" The approach can take the form of extrapolating the image in the frequency domain, by assuming that the object is an analytic function, and that we can exactly know the function values in some interval. This method is severely limited by the ever-present noise in digital imaging systems, but it can work for radar, astronomy, microscopy or magnetic resonance imaging. More recently, a fast single image super-resolution algorithm based on a closed-form solution to ℓ 2 − ℓ 2 {\displaystyle \ell _{2}-\ell _{2}} problems has been proposed and demonstrated to accelerate most of the existing Bayesian super-resolution methods significantly. == Aliasing == Geometrical SR reconstruction algorithms are possible if and only if the input low resolution images have been under-sampled and therefore contain aliasing. Because of this aliasing, the high-frequency content of the desired reconstruction image is embedded in the low-frequency content of each of the observed images. Given a sufficient number of observation images, and if the set of observations vary in their phase (i.e. if the images of the scene are shifted by a sub-pixel amount), then the phase information can be used to separate the aliased high-frequency content from the true low-frequency content, and the full-resolution image can be accurate

    Read more →
  • Cipher device

    Cipher device

    A cipher device was a term used by the US military in the first half of the 20th century to describe a manually operated cipher equipment that converted the plaintext into ciphertext or vice versa. A similar term, cipher machine, was used to describe the cipher equipment that required external power for operation. Cipher box or crypto box is a physical cryptographic device used to encrypt and decrypt messages between plaintext (unencrypted) and ciphertext (encrypted or secret) forms. The ciphertext is suitable for transmission over a channel, such as radio, that might be observed by an adversary the communicating parties wish to conceal the plaintext from.

    Read more →
  • Brooklyn Bridge (software)

    Brooklyn Bridge (software)

    The Brooklyn Bridge from White Crane Systems was a data transfer enabler. Although it came with some hardware, it was the software which was the basis of the product. It also could transform the data's format. == Overview == The New York Times described its category as being among "communications packages used to transfer files." In an era of 300 baud, Brooklyn Bridge operated at "115,200 baud" so that a transfer which "at 300 baud took 4 minutes and 36 seconds" only needed 5 seconds. Unlike some communications packages, this one retains the original version-date, so as not to alarm people when they seem to have what looks like an update, when it's not. == Description == Once the software is installed, users comfortable with typing the word "COPY" can do so as readily as they sneakernet. An earlier review described it as "less cumbersome than conventional communications software" The use of neither specialized hardware nor specialized software is ideal in an era when this can be done using online or other "outside" services.

    Read more →
  • Why We Post

    Why We Post

    Why We Post is a research project funded by the European Research Council and launched in 2012 by Daniel Miller with the objective of examining the global impact of new social media. The study is based on ethnographic data collected through the course of 15 months in China, India, Turkey, Italy, United Kingdom, Trinidad, Chile and Brazil. The results of this project were released on 29 February 2016. This included the first three of eleven Open Access books (available via UCL Press), a five-week e-course (MOOC) on FutureLearn in English, also available in Chinese, Portuguese, Hindi, Tamil, Italian, Turkish, and Spanish on UCLeXtend. In addition a website containing key discoveries, stories and over 100 films is available in the same 8 languages.

    Read more →
  • Neural style transfer

    Neural style transfer

    Neural style transfer (NST) software algorithms are able to manipulate digital images, or videos, in order to adopt the appearance or visual style of another image. NST algorithms are characterized by their use of deep neural networks for the sake of image transformation. Common uses for NST are the creation of artificial artwork from photographs, for example by transferring the appearance of famous paintings to user-supplied photographs. Several notable mobile apps use NST techniques for this purpose, including DeepArt and Prisma. This method has been used by artists and designers around the globe to develop new artwork based on existent style(s). == History == NST is an example of image stylization, a problem studied for over two decades within the field of non-photorealistic rendering. The first two example-based style transfer algorithms were image analogies and image quilting. Both of these methods were based on patch-based texture synthesis algorithms. Given a training pair of images–a photo and an artwork depicting that photo–a transformation could be learned and then applied to create new artwork from a new photo, by analogy. If no training photo was available, it would need to be produced by processing the input artwork; image quilting did not require this processing step, though it was demonstrated on only one style. NST was first published in the paper "A Neural Algorithm of Artistic Style" by Leon Gatys et al., originally released to ArXiv 2015, and subsequently accepted by the peer-reviewed CVPR conference in 2016. The original paper used a VGG-19 architecture that has been pre-trained to perform object recognition using the ImageNet dataset. In 2017, Google AI introduced a method that allows a single deep convolutional style transfer network to learn multiple styles at the same time. This algorithm permits style interpolation in real-time, even when done on video media. == Mathematics == This section closely follows the original paper. === Overview === The idea of Neural Style Transfer (NST) is to take two images—a content image p → {\displaystyle {\vec {p}}} and a style image a → {\displaystyle {\vec {a}}} —and generate a third image x → {\displaystyle {\vec {x}}} that minimizes a weighted combination of two loss functions: a content loss L content ( p → , x → ) {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}})} and a style loss L style ( a → , x → ) {\displaystyle {\mathcal {L}}_{\text{style }}({\vec {a}},{\vec {x}})} . The total loss is a linear sum of the two: L NST ( p → , a → , x → ) = α L content ( p → , x → ) + β L style ( a → , x → ) {\displaystyle {\mathcal {L}}_{\text{NST}}({\vec {p}},{\vec {a}},{\vec {x}})=\alpha {\mathcal {L}}_{\text{content}}({\vec {p}},{\vec {x}})+\beta {\mathcal {L}}_{\text{style}}({\vec {a}},{\vec {x}})} By jointly minimizing the content and style losses, NST generates an image that blends the content of the content image with the style of the style image. Both the content loss and the style loss measures the similarity of two images. The content similarity is the weighted sum of squared-differences between the neural activations of a single convolutional neural network (CNN) on two images. The style similarity is the weighted sum of Gram matrices within each layer (see below for details). The original paper used a VGG-19 CNN, but the method works for any CNN. === Symbols === Let x → {\textstyle {\vec {x}}} be an image input to a CNN. Let F l ∈ R N l × M l {\textstyle F^{l}\in \mathbb {R} ^{N_{l}\times M_{l}}} be the matrix of filter responses in layer l {\textstyle l} to the image x → {\textstyle {\vec {x}}} , where: N l {\textstyle N_{l}} is the number of filters in layer l {\textstyle l} ; M l {\textstyle M_{l}} is the height times the width (i.e. number of pixels) of each filter in layer l {\textstyle l} ; F i j l ( x → ) {\textstyle F_{ij}^{l}({\vec {x}})} is the activation of the i th {\textstyle i^{\text{th}}} filter at position j {\textstyle j} in layer l {\textstyle l} . A given input image x → {\textstyle {\vec {x}}} is encoded in each layer of the CNN by the filter responses to that image, with higher layers encoding more global features, but losing details on local features. === Content loss === Let p → {\textstyle {\vec {p}}} be an original image. Let x → {\textstyle {\vec {x}}} be an image that is generated to match the content of p → {\textstyle {\vec {p}}} . Let P l {\textstyle P^{l}} be the matrix of filter responses in layer l {\textstyle l} to the image p → {\textstyle {\vec {p}}} . The content loss is defined as the squared-error loss between the feature representations of the generated image and the content image at a chosen layer l {\displaystyle l} of a CNN: L content ( p → , x → , l ) = 1 2 ∑ i , j ( A i j l ( x → ) − A i j l ( p → ) ) 2 {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}},l)={\frac {1}{2}}\sum _{i,j}\left(A_{ij}^{l}({\vec {x}})-A_{ij}^{l}({\vec {p}})\right)^{2}} where A i j l ( x → ) {\displaystyle A_{ij}^{l}({\vec {x}})} and A i j l ( p → ) {\displaystyle A_{ij}^{l}({\vec {p}})} are the activations of the i th {\displaystyle i^{\text{th}}} filter at position j {\displaystyle j} in layer l {\displaystyle l} for the generated and content images, respectively. Minimizing this loss encourages the generated image to have similar content to the content image, as captured by the feature activations in the chosen layer. The total content loss is a linear sum of the content losses of each layer: L content ( p → , x → ) = ∑ l v l L content ( p → , x → , l ) {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}})=\sum _{l}v_{l}{\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}},l)} , where the v l {\displaystyle v_{l}} are positive real numbers chosen as hyperparameters. === Style loss === The style loss is based on the Gram matrices of the generated and style images, which capture the correlations between different filter responses at different layers of the CNN: L style ( a → , x → ) = ∑ l = 0 L w l E l , {\displaystyle {\mathcal {L}}_{\text{style }}({\vec {a}},{\vec {x}})=\sum _{l=0}^{L}w_{l}E_{l},} where E l = 1 4 N l 2 M l 2 ∑ i , j ( G i j l ( x → ) − G i j l ( a → ) ) 2 . {\displaystyle E_{l}={\frac {1}{4N_{l}^{2}M_{l}^{2}}}\sum _{i,j}\left(G_{ij}^{l}({\vec {x}})-G_{ij}^{l}({\vec {a}})\right)^{2}.} Here, G i j l ( x → ) {\displaystyle G_{ij}^{l}({\vec {x}})} and G i j l ( a → ) {\displaystyle G_{ij}^{l}({\vec {a}})} are the entries of the Gram matrices for the generated and style images at layer l {\displaystyle l} . Explicitly, G i j l ( x → ) = ∑ k F i k l ( x → ) F j k l ( x → ) {\displaystyle G_{ij}^{l}({\vec {x}})=\sum _{k}F_{ik}^{l}({\vec {x}})F_{jk}^{l}({\vec {x}})} Minimizing this loss encourages the generated image to have similar style characteristics to the style image, as captured by the correlations between feature responses in each layer. The idea is that activation pattern correlations between filters in a single layer captures the "style" on the order of the receptive fields at that layer. Similarly to the previous case, the w l {\displaystyle w_{l}} are positive real numbers chosen as hyperparameters. === Hyperparameters === In the original paper, they used a particular choice of hyperparameters. The style loss is computed by w l = 0.2 {\displaystyle w_{l}=0.2} for the outputs of layers conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 in the VGG-19 network, and zero otherwise. The content loss is computed by w l = 1 {\displaystyle w_{l}=1} for conv4_2, and zero otherwise. The ratio α / β ∈ [ 5 , 50 ] × 10 − 4 {\displaystyle \alpha /\beta \in [5,50]\times 10^{-4}} . === Training === Image x → {\displaystyle {\vec {x}}} is initially approximated by adding a small amount of white noise to input image p → {\displaystyle {\vec {p}}} and feeding it through the CNN. Then we successively backpropagate this loss through the network with the CNN weights fixed in order to update the pixels of x → {\displaystyle {\vec {x}}} . After several thousand epochs of training, an x → {\displaystyle {\vec {x}}} (hopefully) emerges that matches the style of a → {\displaystyle {\vec {a}}} and the content of p → {\displaystyle {\vec {p}}} . As of 2017, when implemented on a GPU, it takes a few minutes to converge. == Extensions == In some practical implementations, it is noted that the resulting image has too much high-frequency artifact, which can be suppressed by adding the total variation to the total loss. Compared to VGGNet, AlexNet does not work well for neural style transfer. NST has also been extended to videos. Subsequent work improved the speed of NST for images by using special-purpose normalizations. In a paper by Fei-Fei Li et al. adopted a different regularized loss metric and accelerated method for training to produce results in real-time (three orders of magnitude faster than Gatys). Their idea was to use not the pixel-based loss defined above but rather a 'perceptual loss' measuring t

    Read more →
  • Feistel cipher

    Feistel cipher

    In cryptography, a Feistel cipher (also known as Luby–Rackoff block cipher) is a symmetric structure used in the construction of block ciphers, named after the German-born physicist and cryptographer Horst Feistel, who did pioneering research while working for IBM; it is also commonly known as a Feistel network. A large number of block ciphers use the scheme, including the US Data Encryption Standard, the Soviet/Russian GOST (aka Magma) and the more recent Blowfish and Twofish ciphers. In a Feistel cipher, encryption and decryption are very similar operations, and both consist of iteratively running a function called a "round function" a fixed number of times. == History == Many modern symmetric block ciphers are based on Feistel networks. Feistel networks were first seen commercially in IBM's Lucifer cipher, designed by Horst Feistel and Don Coppersmith in 1973. Feistel networks gained respectability when the U.S. Federal Government adopted the DES (a cipher based on Lucifer, with changes made by the NSA) in 1976. Like other components of the DES, the iterative nature of the Feistel construction makes implementing the cryptosystem in hardware easier (particularly on the hardware available at the time of DES's design). == Design == A Feistel network uses a round function, a function which takes two inputs – a data block and a subkey – and returns one output of the same size as the data block. In each round, the round function is run on half of the data to be encrypted, and its output is XORed with the other half of the data. This is repeated a fixed number of times, and the final output is the encrypted data. An important advantage of Feistel networks compared to other cipher designs such as substitution–permutation networks (SP-networks) is that the entire operation is guaranteed to be invertible (that is, encrypted data can be decrypted), even if the round function is not itself invertible. The round function can be made arbitrarily complicated, since it does not need to be designed to be invertible. Furthermore, the encryption and decryption operations are very similar, even identical in some cases, requiring only a reversal of the key schedule. Therefore, the size of the code or circuitry required to implement such a cipher is nearly halved. Unlike SP-networks, Feistel networks also do not depend on a substitution box that could cause timing side-channels in software implementations. == Theoretical work == The structure and properties of Feistel ciphers have been extensively analyzed by cryptographers. Michael Luby and Charles Rackoff analyzed the Feistel cipher construction and proved that if the round function is a cryptographically secure pseudorandom function, with Ki used as the seed, then 3 rounds are sufficient to make the block cipher a pseudorandom permutation, while 4 rounds are sufficient to make it a "strong" pseudorandom permutation (which means that it remains pseudorandom even to an adversary who gets oracle access to its inverse permutation). Because of this very important result of Luby and Rackoff, Feistel ciphers are sometimes called Luby–Rackoff block ciphers. Further theoretical work has generalized the construction somewhat and given more precise bounds for security. == Construction details == Let F {\displaystyle \mathrm {F} } be the round function and let K 0 , K 1 , … , K n {\displaystyle K_{0},K_{1},\ldots ,K_{n}} be the sub-keys for the rounds 0 , 1 , … , n {\displaystyle 0,1,\ldots ,n} respectively. Then the basic operation is as follows: Split the plaintext block into two equal pieces: ( L 0 {\displaystyle L_{0}} , R 0 {\displaystyle R_{0}} ). For each round i = 0 , 1 , … , n {\displaystyle i=0,1,\dots ,n} , compute L i + 1 = R i , {\displaystyle L_{i+1}=R_{i},} R i + 1 = L i ⊕ F ( R i , K i ) , {\displaystyle R_{i+1}=L_{i}\oplus \mathrm {F} (R_{i},K_{i}),} where ⊕ {\displaystyle \oplus } means XOR. Then the ciphertext is ( R n + 1 , L n + 1 ) {\displaystyle (R_{n+1},L_{n+1})} . Decryption of a ciphertext ( R n + 1 , L n + 1 ) {\displaystyle (R_{n+1},L_{n+1})} is accomplished by computing for i = n , n − 1 , … , 0 {\displaystyle i=n,n-1,\ldots ,0} R i = L i + 1 , {\displaystyle R_{i}=L_{i+1},} L i = R i + 1 ⊕ F ⁡ ( L i + 1 , K i ) . {\displaystyle L_{i}=R_{i+1}\oplus \operatorname {F} (L_{i+1},K_{i}).} Then ( L 0 , R 0 ) {\displaystyle (L_{0},R_{0})} is the plaintext again. The diagram illustrates both encryption and decryption. Note the reversal of the subkey order for decryption; this is the only difference between encryption and decryption. === Unbalanced Feistel cipher === Unbalanced Feistel ciphers use a modified structure where L 0 {\displaystyle L_{0}} and R 0 {\displaystyle R_{0}} are not of equal lengths. The Skipjack cipher is an example of such a cipher. The Texas Instruments digital signature transponder uses a proprietary unbalanced Feistel cipher to perform challenge–response authentication. The Thorp shuffle is an extreme case of an unbalanced Feistel cipher in which one side is a single bit. This has better provable security than a balanced Feistel cipher but requires more rounds. There exists Type-1, Type-2, and Type-3 Feistel networks, where the Feistel function is one fourth the size of the block but operates a varying number of times within one round. === Other uses === The Feistel construction is also used in cryptographic algorithms other than block ciphers. For example, the optimal asymmetric encryption padding (OAEP) scheme uses a simple Feistel network to randomize ciphertexts in certain asymmetric-key encryption schemes. A generalized Feistel algorithm can be used to create strong permutations on small domains of size not a power of two (see format-preserving encryption). === Feistel networks as a design component === Whether the entire cipher is a Feistel cipher or not, Feistel-like networks can be used as a component of a cipher's design. For example, MISTY1 is a Feistel cipher using a three-round Feistel network in its round function, Skipjack is a modified Feistel cipher using a Feistel network in its G permutation, and Threefish (part of Skein) is a non-Feistel block cipher that uses a Feistel-like MIX function. == List of Feistel ciphers == Feistel or modified Feistel: Generalised Feistel: CAST-256 CLEFIA MacGuffin RC2 RC6 Skipjack SMS4

    Read more →
  • Feistel cipher

    Feistel cipher

    In cryptography, a Feistel cipher (also known as Luby–Rackoff block cipher) is a symmetric structure used in the construction of block ciphers, named after the German-born physicist and cryptographer Horst Feistel, who did pioneering research while working for IBM; it is also commonly known as a Feistel network. A large number of block ciphers use the scheme, including the US Data Encryption Standard, the Soviet/Russian GOST (aka Magma) and the more recent Blowfish and Twofish ciphers. In a Feistel cipher, encryption and decryption are very similar operations, and both consist of iteratively running a function called a "round function" a fixed number of times. == History == Many modern symmetric block ciphers are based on Feistel networks. Feistel networks were first seen commercially in IBM's Lucifer cipher, designed by Horst Feistel and Don Coppersmith in 1973. Feistel networks gained respectability when the U.S. Federal Government adopted the DES (a cipher based on Lucifer, with changes made by the NSA) in 1976. Like other components of the DES, the iterative nature of the Feistel construction makes implementing the cryptosystem in hardware easier (particularly on the hardware available at the time of DES's design). == Design == A Feistel network uses a round function, a function which takes two inputs – a data block and a subkey – and returns one output of the same size as the data block. In each round, the round function is run on half of the data to be encrypted, and its output is XORed with the other half of the data. This is repeated a fixed number of times, and the final output is the encrypted data. An important advantage of Feistel networks compared to other cipher designs such as substitution–permutation networks (SP-networks) is that the entire operation is guaranteed to be invertible (that is, encrypted data can be decrypted), even if the round function is not itself invertible. The round function can be made arbitrarily complicated, since it does not need to be designed to be invertible. Furthermore, the encryption and decryption operations are very similar, even identical in some cases, requiring only a reversal of the key schedule. Therefore, the size of the code or circuitry required to implement such a cipher is nearly halved. Unlike SP-networks, Feistel networks also do not depend on a substitution box that could cause timing side-channels in software implementations. == Theoretical work == The structure and properties of Feistel ciphers have been extensively analyzed by cryptographers. Michael Luby and Charles Rackoff analyzed the Feistel cipher construction and proved that if the round function is a cryptographically secure pseudorandom function, with Ki used as the seed, then 3 rounds are sufficient to make the block cipher a pseudorandom permutation, while 4 rounds are sufficient to make it a "strong" pseudorandom permutation (which means that it remains pseudorandom even to an adversary who gets oracle access to its inverse permutation). Because of this very important result of Luby and Rackoff, Feistel ciphers are sometimes called Luby–Rackoff block ciphers. Further theoretical work has generalized the construction somewhat and given more precise bounds for security. == Construction details == Let F {\displaystyle \mathrm {F} } be the round function and let K 0 , K 1 , … , K n {\displaystyle K_{0},K_{1},\ldots ,K_{n}} be the sub-keys for the rounds 0 , 1 , … , n {\displaystyle 0,1,\ldots ,n} respectively. Then the basic operation is as follows: Split the plaintext block into two equal pieces: ( L 0 {\displaystyle L_{0}} , R 0 {\displaystyle R_{0}} ). For each round i = 0 , 1 , … , n {\displaystyle i=0,1,\dots ,n} , compute L i + 1 = R i , {\displaystyle L_{i+1}=R_{i},} R i + 1 = L i ⊕ F ( R i , K i ) , {\displaystyle R_{i+1}=L_{i}\oplus \mathrm {F} (R_{i},K_{i}),} where ⊕ {\displaystyle \oplus } means XOR. Then the ciphertext is ( R n + 1 , L n + 1 ) {\displaystyle (R_{n+1},L_{n+1})} . Decryption of a ciphertext ( R n + 1 , L n + 1 ) {\displaystyle (R_{n+1},L_{n+1})} is accomplished by computing for i = n , n − 1 , … , 0 {\displaystyle i=n,n-1,\ldots ,0} R i = L i + 1 , {\displaystyle R_{i}=L_{i+1},} L i = R i + 1 ⊕ F ⁡ ( L i + 1 , K i ) . {\displaystyle L_{i}=R_{i+1}\oplus \operatorname {F} (L_{i+1},K_{i}).} Then ( L 0 , R 0 ) {\displaystyle (L_{0},R_{0})} is the plaintext again. The diagram illustrates both encryption and decryption. Note the reversal of the subkey order for decryption; this is the only difference between encryption and decryption. === Unbalanced Feistel cipher === Unbalanced Feistel ciphers use a modified structure where L 0 {\displaystyle L_{0}} and R 0 {\displaystyle R_{0}} are not of equal lengths. The Skipjack cipher is an example of such a cipher. The Texas Instruments digital signature transponder uses a proprietary unbalanced Feistel cipher to perform challenge–response authentication. The Thorp shuffle is an extreme case of an unbalanced Feistel cipher in which one side is a single bit. This has better provable security than a balanced Feistel cipher but requires more rounds. There exists Type-1, Type-2, and Type-3 Feistel networks, where the Feistel function is one fourth the size of the block but operates a varying number of times within one round. === Other uses === The Feistel construction is also used in cryptographic algorithms other than block ciphers. For example, the optimal asymmetric encryption padding (OAEP) scheme uses a simple Feistel network to randomize ciphertexts in certain asymmetric-key encryption schemes. A generalized Feistel algorithm can be used to create strong permutations on small domains of size not a power of two (see format-preserving encryption). === Feistel networks as a design component === Whether the entire cipher is a Feistel cipher or not, Feistel-like networks can be used as a component of a cipher's design. For example, MISTY1 is a Feistel cipher using a three-round Feistel network in its round function, Skipjack is a modified Feistel cipher using a Feistel network in its G permutation, and Threefish (part of Skein) is a non-Feistel block cipher that uses a Feistel-like MIX function. == List of Feistel ciphers == Feistel or modified Feistel: Generalised Feistel: CAST-256 CLEFIA MacGuffin RC2 RC6 Skipjack SMS4

    Read more →
  • Commit (data management)

    Commit (data management)

    In computer science and data management, a commit is a behavior that marks the end of a transaction and provides Atomicity, Consistency, Isolation, and Durability (ACID) in transactions. The submission records are stored in the submission log for recovery and consistency in case of failure. In terms of transactions, the opposite of committing is giving up tentative changes to the transaction, which is rolled back. Due to the rise of distributed computing and the need to ensure data consistency across multiple systems, commit protocols have been evolving since their emergence in the 1970s. The main developments include the Two-Phase Commit (2PC) first proposed by Jim Gray, which is the fundamental core of distributed transaction management. Subsequently, the Three-phase Commit (3PC), Hypothesis Commit (PC), Hypothesis Abort (PA), and Optimistic Commit protocols gradually emerged, solving the problems of blocking and fault recovery. Today, new fields such as e-commerce payment and blockchain technology are emerging, and submission protocols play a significant role in various business areas. By effectively handling transactions, resolving faults and recovering problems, the commit protocol becomes crucial in ensuring the reliability and consistency of data management. == History == The concept of Commit originated in the late 1960s and early 1970s, when computer technology was rapidly advancing and data management was becoming an important requirement in business and finance. Enterprises have gradually replaced the traditional paper records with computers, which has fully improved the work efficiency. The reliability and consistency of data have become a necessary requirement. Transaction management at this stage is relatively simple, limited to using a single computer for processing. It merely effectively records the changes in data to ensure that the data remains stable after the transaction is completed or terminated. In the late 1970s, as database systems moved from a single calculator operation to multiple distributed collaborations, ensuring data consistency and reliability became a new challenge. In 1978, computer scientist Jim Gray proposed the famous two-phase Commit Protocol (2PC), which became an effective solution for distributed transaction management, successfully managing data synchronization problems between multiple nodes. However, this commit protocol has some potential transaction blocking problems when nodes fail. In the early 1980s, researchers discovered that although the two-step commit protocol was effective at synchronizing data, there could be long waits and even system crashes, with limitations. To improve this problem, people have begun to explore new and effective methods, including enhancing efficiency by reducing message communication during the protocol process. IBM's R database introduced the Assumed Commit and Assumed abort protocols, which contributed significantly to transaction management efficiency. These two protocols have greatly improved the processing efficiency of distributed transactions by reducing communication overhead and have become an important breakthrough in the technology of transaction commit protocols. By the early 1990s, with the increase in business demands and the complexity of transactions, enterprises required higher efficiency in distributed transaction processing. In order to adapt to the needs of different environments, the scientific community has gradually developed various variants of commit protocols to provide more flexible transaction management options for different needs. For example, the three-phase commit protocol promotes the commit of transactions more effectively and reduces the occurrence of blocking problems by adding a pre-commit protocol and a timeout mechanism. In the 21st century, with the popularization of mobile Internet and wireless technology, the commit protocol has been further developed, and researchers have begun to pay attention to how to reduce the blocking in the transaction process to solve the problem of broadband limitation, battery life and network instability in the mobile environment. The proposal of optimistic commit protocol marks the extension of commit technology from traditional database to the emerging mobile data field. This protocol allows transactions to temporarily use unconfirmed data, improving the user experience in cases of poor network conditions. In recent years, with the rise of blockchain and decentralized technologies, submission protocols and consensus mechanisms have gradually merged. These consensus algorithms play a role in tamper-proofing and preventing malicious attacks on node pairs in a decentralized environment. This enables commit to no longer be confined to the scope of traditional database management, but to become the core technology of trust computing and distributed ledgers, further expanding the application field of commit in the digital age. This integration has brought about extensive application impacts. Each transaction can achieve the effect of tracking global submissions through the verification of the consensus mechanism, becoming an important technical foundation for promoting the circulation of digital assets, the operation of cryptocurrencies and decentralized applications. == Commit Protocol Types == In the world of data management, a transaction is a series of database operations, such as bank transfers and order submission. In order to ensure the accuracy, consistency, and security of the data, transactions are usually completed completely, or cancelled completely, leaving no partially completed results. Commit protocol is the method used to coordinate this process. Different protocols are applicable to different submission scenarios and have their own advantages and disadvantages. There are four major commit protocols. === Two-Phase Commit (2PC) === The two-phase commit protocol is the most classic and broadest approach to distributed transactions, which includes both a preparation phase and a commit phase. This commit protocol is designed to allow the database coordinator to determine if all participating nodes agree. The preparation phase is the phase in which the coordination node sends a ready to commit request to all nodes participating in the transaction. The commit phase is a global commit after all participating nodes are ready, and if no agreement is reached, all nodes roll back the transaction and undo all previous operations. Although the two-phase commit protocol is the easiest to operate and widely used, its obvious drawback is that it can cause transactions to be blocked for a long time when nodes fail, resulting in a decline in system performance and making it difficult to terminate or continue immediately. === Three-Phase Commit (3PC) === The three-phase commit protocol is an improved non-blocking protocol based on 2PC, which is divided into three stages: preparation, pre-commit and commit. Firstly, each node sends a "preparation" request. After confirmation, a "pre-submission" stage is added. At this point, each node has completed most of the preparatory work and is waiting for the final confirmation. Finally, in the formal commit stage, after all nodes send the "commit" request, the transaction is completed and committed. Compared with 2PC, it increases the timeout mechanism, avoids the blocking problem caused by single point of failure, and improves the reliability of the system. The three-phase commit protocol significantly optimizes transaction reliability, but adds additional overhead for message transmission and state maintenance. It is more suitable for distributed application scenarios with high transaction sensitivity and no acceptance of long waiting times. === Presumed Commit (PC) and Presumed Abort (PA) === Presumed Commit (PC) is the default that the transaction will be committed successfully and rollback will be notified unless an anomaly is encountered. This commit reduces the message overhead and logging costs of a normal commits. Presumed Abort (PA) is assumed that the default state of the transaction is a rollback and will only be committed when all nodes have explicitly agreed. This commit is applicable to transactions that are not updated frequently or have a low probability of successful commit. The IBM R Distributed Database management System was the first to propose and practice the PC and PA protocols, handling distributed transaction management very efficiently and becoming a classic case in the field of database transaction management. === Optimistic Commit Protocol === With the rise of the Internet, the previous commit protocols are facing new challenges, especially in mobile scenarios with unstable networks. Excessively long transaction waiting times can affect the user experience. The Optimistic Commit Protocol allows a transaction to temporarily access uncommitted data before committing to avoid wait times. This type of commit is suitable f

    Read more →
  • Butler in a Box

    Butler in a Box

    Butler in a Box was an early voice-controlled home automation device developed in 1983 by magician Gus Searcy and programmer Franz Kavan. The device allowed users to control various home electronics, such as lights and phones, using voice commands. It predated modern smart speakers and virtual assistants by several decades. == History == The idea for the Butler in a Box originated in 1983 when Searcy was asked by friends why he couldn't simply command lights to turn on and off if he could pull rabbits out of hats, given his background as a professional magician. Searcy partnered with former IBM programmer Kavan to develop the device, with their first prototype being named "Sidney". The Butler in a Box combined remote control technology with voice recognition to enable control of home devices. However, it faced challenges due to the technological limitations of the era and its high price point of nearly $1,500 (equivalent to around $3,700 in 2021). == Features and functionality == Users could activate the Butler in a Box by speaking a wake word, typically a traditional butler name, and the device would address the user as "boss". It was capable of performing tasks such as: Turning lights on and off, controlling individual zones if lights were connected to remote control modules Making and receiving phone calls Setting timers Pairing with sensors to function as a security alarm system However, the device required extensive voice training for each user, a time-consuming process compared to modern voice recognition. Additionally, settings and trained commands would be lost if power was out for over 3 hours due to the volatile memory technology used at the time. == Reception and legacy == While innovative for its time, the Butler in a Box did not achieve widespread commercial success due to its high price and the technical limitations of the 1980s. Nevertheless, it served as an important early step in the development of home automation and showcased the potential for voice-controlled technology to enhance accessibility and convenience in the home. Decades later, products like Amazon Alexa, Google Home, and Apple's Siri would make voice-controlled smart home devices commonplace and affordable, building on the groundwork laid by early attempts like the Butler in a Box.

    Read more →
  • Social influence bias

    Social influence bias

    The social influence bias is an asymmetric herding effect on online social media platforms which makes users overcompensate for negative ratings but amplify positive ones. Driven by the desire to be accepted within a specific group, it surrounds the idea that people alter certain behaviors to be like those of the people within a group. Therefore, it is a subgroup term for various types of cognitive biases. Some social influence bias types include the bandwagon effect, authority bias, groupthinking effect, social comparison bias, social media bias and more. Understanding these biases helps us understand the term overall. However, the composition of the term "social influence bias" requires critical examination to understand the way that it affects individuals' and groups' lives. The term "influence" has 2 different types of stigma. For one, it surrounds the idea that people show their true inner selves when "under the influence". On the other end, it also proposes the idea that people are not their own selves when "under the influence". These tend to be constructions made by people, which also tend to fit the situation based on their own perspectives. So, even in social terms, it requires both sides to be examined to understand whether we truly are affected by context, or we remain to be and behave in terms of our own selves. The term "influence" doesn't necessarily say that there lies greater strength in our inner self's desires and decisions, nor does it say that external factors have the greater power. In a similar manner, both social and non-social judgments are to be associated with anxiety, but the same can't necessarily be said in the case of social conformity. So, the gray areas within this topic beg the question, "What does social influence bias say about us, and does it affect us all in the same way?" == Social media bias == Media bias is reflected in search systems in social media. Kulshrestha and her team found through research in 2018 that the top-ranked results returned by these search engines can influence users' perceptions when they conduct searches for events or people, which is particularly reflected in political bias and polarizing topics. Fueled by confirmation bias, online echo chambers allow users to be steeped within their own ideology. Because social media is tailored to your interests and your selected friends, it is an easy outlet for political echo chambers. Social media bias is also reflected in hostile media effect. Social media has a place in disseminating news in modern society, where viewers are exposed to other people's comments while reading news articles. In their 2020 study, Gearhart and her team showed that viewers' perceptions of bias increased and perceptions of credibility decreased after seeing comments with which they held different opinions. == In research context == In observational data, how social influence affects collected judgment is challenging to fully understand. Positive social influence can accumulate and result in a rating bubble, while negative social influence is neutralized by crowd correction. This phenomenon was first described in a paper written by Lev Muchnik, Sinan Aral and Sean J. Taylor in 2014, then the question was revisited by Cicognani et al., whose experiment reinforced Munchnik's and his co-authors' results. == Relevance == Online customer reviews are trusted sources of information in various contexts such as online marketplaces, dining, accommodation, movies, or digital products. However, these online ratings are not immune to herd behavior, which means that subsequent reviews are not independent from each other. As on many such sites, preceding opinions are visible to a new reviewer, he or she can be heavily influenced by the antecedent evaluations in his or her decision about the certain product, service or online content. This form of herding behavior inspired Muchnik, Aral and Taylor to conduct their experiment on influence in social contexts. == Experimental design == Muchnik, Aral, and Taylor designed a large-scale randomized experiment to measure social influence on user reviews. The experiment was conducted on social news aggregation website like Reddit. The study lasted for 5 months, the authors randomly assigned 101 281 comments to one of the following treatment groups: up-treated (4049), down-treated (1942), or control (the proportions reflect the observed ratio of up-and down-votes. Comments which fell to the first group were given an up-vote upon the creation of the comment, the second group got a down-vote upon creation, the comments in the control group remained untouched. A vote is equivalent to a single rating (+1 or -1). As other users are unable to trace a user’s votes, they were unaware of the experiment. Due to randomization, comments in the control and the treatment group were not different in terms of expected rating. The treated comments were viewed more than 10 million times and rated 308 515 times by successive users. == Results == The up-vote treatment increased the probability of up-voting by the first viewer by 32% over the control group, while the probability of down-voting did not change compared to the control group, which means that users did not correct the random positive rating. The upward bias remained inplace for the observed 5-month period. The accumulating herding effect increased the comment’s mean rating by 25% compared to the control group comments. Positively manipulated comments did receive higher ratings at all parts of the distribution, which means that they were also more likely to collect extremely high scores. The negative manipulation created an asymmetric herd effect: although the probability of subsequent down-votes was increased by the negative treatment, the probability of up-voting also grew for these comments. The community performed a correction which neutralized the negative treatment and resulted non-different final mean ratings from the control group. The authors also compared the final mean scores of comments across the most active topic categories on the website. The observed positive herding effect was present in the "politics," "culture and society," and "business" subreddits, but was not applicable for "economics," "IT," "fun," and "general news".- == Implications == The skewed nature of online ratings makes review outcomes different to what it would be without the social influence bias. In a 2009 experiment by Hu, Zhang and Pavlou showed that the distribution of reviews of a certain product made by unconnected individuals is approximately normal, however, the rating of the same product on Amazon followed a J-Shaped distribution with twice as much five-star ratings than others. Cicognani, Figini and Magnani came to similar conclusions after their experiment conducted on a tourism services website: positive preceding ratings influenced raters' behavior more than mediocre ones. Positive crowd correction makes community-based opinions upward-biased.

    Read more →
  • Control-flow diagram

    Control-flow diagram

    A control-flow diagram (CFD) is a diagram to describe the control flow of a business process, process or review. Control-flow diagrams were developed in the 1950s, and are widely used in multiple engineering disciplines. They are one of the classic business process modeling methodologies, along with flow charts, drakon-charts, data flow diagrams, functional flow block diagram, Gantt charts, PERT diagrams, and IDEF. == Overview == A control-flow diagram can consist of a subdivision to show sequential steps, with if-then-else conditions, repetition, and/or case conditions. Suitably annotated geometrical figures are used to represent operations, data, or equipment, and arrows are used to indicate the sequential flow from one to another. There are several types of control-flow diagrams, for example: Change-control-flow diagram, used in project management Configuration-decision control-flow diagram, used in configuration management Process-control-flow diagram, used in process management Quality-control-flow diagram, used in quality control. In software and systems development, control-flow diagrams can be used in control-flow analysis, data-flow analysis, algorithm analysis, and simulation. Control and data are most applicable for real time and data-driven systems. These flow analyses transform logic and data requirements text into graphic flows which are easier to analyze than the text. PERT, state transition, and transaction diagrams are examples of control-flow diagrams. == Types of control-flow diagrams == === Process-control-flow diagram === A flow diagram can be developed for the process [control system] for each critical activity. Process control is normally a closed cycle in which a sensor. The application determines if the sensor information is within the predetermined (or calculated) data parameters and constraints. The results of this comparison, which controls the critical component. This [feedback] may control the component electronically or may indicate the need for a manual action. This closed-cycle process has many checks and balances to ensure that it stays safe. It may be fully computer controlled and automated, or it may be a hybrid in which only the sensor is automated and the action requires manual intervention. Further, some process control systems may use prior generations of hardware and software, while others are state of the art. === Performance-seeking control-flow diagram === The figure presents an example of a performance-seeking control-flow diagram of the algorithm. The control law consists of estimation, modeling, and optimization processes. In the Kalman filter estimator, the inputs, outputs, and residuals were recorded. At the compact propulsion-system-modeling stage, all the estimated inlet and engine parameters were recorded. In addition to temperatures, pressures, and control positions, such estimated parameters as stall margins, thrust, and drag components were recorded. In the optimization phase, the operating-condition constraints, optimal solution, and linear-programming health-status condition codes were recorded. Finally, the actual commands that were sent to the engine through the DEEC were recorded.

    Read more →