AI Email Management

AI Email Management — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Voice activity detection

    Voice activity detection

    Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speaker diarization, speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech section of an audio session: it can avoid unnecessary coding/transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving on computation and on network bandwidth. VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language. It was first investigated for use on time-assignment speech interpolation (TASI) systems. == Algorithm overview == The typical design of a VAD algorithm is as follows: There may first be a noise reduction stage, e.g. via spectral subtraction. Then some features or quantities are calculated from a section of the input signal. A classification rule is applied to classify the section as speech or non-speech – often this classification rule finds when a value exceeds a certain threshold. There may be some feedback in this sequence, in which the VAD decision is used to improve the noise estimate in the noise reduction stage, or to adaptively vary the threshold(s). These feedback operations improve the VAD performance in non-stationary noise (i.e. when the noise varies a lot). A representative set of recently published VAD methods formulates the decision rule on a frame by frame basis using instantaneous measures of the divergence distance between speech and noise. The different measures which are used in VAD methods include spectral slope, correlation coefficients, log likelihood ratio, cepstral, weighted cepstral, and modified distance measures. Independently from the choice of VAD algorithm, a compromise must be made between having voice detected as noise, or noise detected as voice (between false positive and false negative). A VAD operating in a mobile phone must be able to detect speech in the presence of a range of very diverse types of acoustic background noise. In these difficult detection conditions it is often preferable that a VAD should fail-safe, indicating speech detected when the decision is in doubt, to lower the chance of losing speech segments. The biggest difficulty in the detection of speech in this environment is the very low signal-to-noise ratios (SNRs) that are encountered. It may be impossible to distinguish between speech and noise using simple level detection techniques when parts of the speech utterance are buried below the noise. == Applications == VAD is an integral part of different speech communication systems such as audio conferencing, echo cancellation, speech recognition, speech encoding, speaker recognition and hands-free telephony. In the field of multimedia applications, VAD allows simultaneous voice and data applications. Similarly, in Universal Mobile Telecommunications Systems (UMTS), it controls and reduces the average bit rate and enhances overall coding quality of speech. In cellular radio systems (for instance GSM and CDMA systems) based on Discontinuous Transmission (DTX) mode, VAD is essential for enhancing system capacity by reducing co-channel interference and power consumption in portable digital devices. In speech processing applications, voice activity detection plays an important role since non-speech frames are often discarded. For a wide range of applications such as digital mobile radio, Digital Simultaneous Voice and Data (DSVD) or speech storage, it is desirable to provide a discontinuous transmission of speech-coding parameters. Advantages can include lower average power consumption in mobile handsets, higher average bit rate for simultaneous services like data transmission, or a higher capacity on storage chips. However, the improvement depends mainly on the percentage of pauses during speech and the reliability of the VAD used to detect these intervals. On the one hand, it is advantageous to have a low percentage of speech activity. On the other hand, clipping, that is the loss of milliseconds of active speech, should be minimized to preserve quality. This is the crucial problem for a VAD algorithm under heavy noise conditions. === Use in telemarketing === One controversial application of VAD is in conjunction with predictive dialers used by telemarketing firms. In order to maximize agent productivity, telemarketing firms set up predictive dialers to call more numbers than they have agents available, knowing most calls will end up in either "Ring – No Answer" or answering machines. When a person answers, they typically speak briefly ("Hello", "Good evening", etc.) and then there is a brief period of silence. Answering machine messages are usually 3–15 seconds of continuous speech. By setting VAD parameters correctly, dialers can determine whether a person or a machine answered the call and, if it's a person, transfer the call to an available agent. If it detects an answering machine message, the dialer hangs up. Often, even when the system correctly detects a person answering the call, no agent may be available, resulting in a "silent call". Call screening with a multi-second message like "please say who you are, and I may pick up the phone" will frustrate such automated calls. == Performance evaluation == To evaluate a VAD, its output using test recordings is compared with those of an "ideal" VAD – created by hand-annotating the presence or absence of voice in the recordings. The performance of a VAD is commonly evaluated on the basis of the following four parameters: FEC (Front End Clipping): clipping introduced in passing from noise to speech activity; MSC (Mid Speech Clipping): clipping due to speech misclassified as noise; OVER: noise interpreted as speech due to the VAD flag remaining active in passing from speech activity to noise; NDS (Noise Detected as Speech): noise interpreted as speech within a silence period. Although the method described above provides useful objective information concerning the performance of a VAD, it is only an approximate measure of the subjective effect. For example, the effects of speech signal clipping can at times be hidden by the presence of background noise, depending on the model chosen for the comfort noise synthesis, so some of the clipping measured with objective tests is in reality not audible. It is therefore important to carry out subjective tests on VADs, the main aim of which is to ensure that the clipping perceived is acceptable. In VoIP applications, front-end clipping can be reduced by rewinding to shortly before the detection and sending very slightly delayed data. This kind of test requires a certain number of listeners to judge recordings containing the processing results of the VADs being tested, giving marks to several speech sequences on the following features: Quality; Comprehension difficulty; Audibility of clipping. These marks are then used to calculate average results for each of the features listed above, thus providing a global estimate of the behavior of the VAD being tested. To conclude, whereas objective methods are very useful in an initial stage to evaluate the quality of a VAD, subjective methods are more significant. As they require the participation of several people for a few days, increasing cost, they are generally only used when a proposal is about to be standardized. == Implementations == One early standard VAD is that developed by British Telecom for use in the Pan-European digital cellular mobile telephone service in 1991. It uses inverse filtering trained on non-speech segments to filter out background noise, so that it can then more reliably use a simple power-threshold to decide if a voice is present. The G.729 standard calculates the following features for its VAD: line spectral frequencies, full-band energy, low-band energy (<1 kHz), and zero-crossing rate. It applies a simple classification using a fixed decision boundary in the space defined by these features, and then applies smoothing and adaptive correction to improve the estimate. The GSM standard includes two VAD options developed by ETSI. Option 1 computes the SNR in nine bands and applies a threshold to these values. Option 2 calculates different parameters: channel power, voice metrics, and noise power. It then thresholds the voice metrics using a threshold that varies according to the estimated SNR. The Speex audio compression library uses a procedure named Improved Minima Controlled Recursive Averaging, which uses a smoothed representation of spectral pow

    Read more →
  • Screen space directional occlusion

    Screen space directional occlusion

    Screen space directional occlusion (SSDO) is a computer graphics technique enhancing screen space ambient occlusion (SSAO) by taking direction into account to sample the ambient light (both the light coming directly at an object, as well as the light reflected off of the object directly behind it), to better approximate global illumination. SSDO was introduced by Tobias Ritschel, Thorsten Grosch, and Hans-Peter Seidel in their 2009 ACM Symposium on Interactive 3D Graphics and Games paper Approximating dynamic global illumination in image space, which describes it as extending SSAO to directional occlusion with one diffuse indirect bounce of light; later literature notes that SSDO still suffers from common screen-space artifacts such as noise and banding. == Method == The original SSDO paper describes a two-pass screen-space approach, with one pass for direct lighting and a second pass for indirect bounces. Later literature describes SSDO as assuming a general shadowing direction that allows color bleeding and a single light bounce.

    Read more →
  • Secure environment

    Secure environment

    In computing, a secure environment is any system which implements the controlled storage and use of information. In the event of computing data loss, a secure environment is used to protect personal or confidential data. It may also be known as a trusted execution environment (TEE). Often, secure environments employ cryptography as a means to protect information. This is typically used for processing confidential or restricted information. Some secure environments employ cryptographic hashing, simply to verify that the information has not been altered since it was last modified.

    Read more →
  • Hit-testing

    Hit-testing

    In computer graphics programming, hit-testing (hit detection, picking, or pick correlation) is the process of determining whether a user-controlled cursor (such as a mouse cursor or touch-point on a touch-screen interface) intersects a given graphical object (such as a shape, line, or curve) drawn on the screen. Hit-testing may be performed on the movement or activation of a mouse or other pointing device. Hit-testing is used by GUI environments to respond to user actions, such as selecting a menu item or a target in a game based on its visual location. In web programming languages such as HTML, SVG, and CSS, this is associated with the concept of pointer-events (e.g. user-initiated cursor movement or object selection). Collision detection is a related concept for detecting intersections of two or more different graphical objects, rather than intersection of a cursor with one or more graphical objects. == Algorithm == There are many different algorithms that may be used to perform hit-testing, with different performance or accuracy outcomes. One common hit-test algorithm for axis aligned bounding boxes. A key idea is that the box being tested must be either entirely above, entirely below, entirely to the right or left of the current box. If this is not possible, they are colliding. Example logic is presented in the pseudo-code below: In Python:

    Read more →
  • Lingua Libre

    Lingua Libre

    Lingua Libre is an online collaborative project and tool by the Wikimédia France association, which aims to build a collaborative, multilingual, audiovisual speech corpus under a free license. It mostly consists of a rapid recording online service which allows the user to chain hundreds of recordings. Contributors have produced content in 310+ languages. == Description == Lingua Libre enables the recording of words, phrases or sentences of any language, oral (audio recording) or signed (video recording). Words are presented to the speaker in the form of a list, created on the spot, in advance, or by reusing an existing Wikimedia category. The speaker simply reads the word displayed on the screen, and the software moves on to the next word when it detects a silence after the read word. This principle, borrowed from the open source software Shtooka recorder with the help of its creator, Nicolas Vion, makes it possible to record several hundreds of words per hour. The recordings are then uploaded automatically from the web client to the Wikimedia Commons media library. In spring 2021, Lingua Libre was offline due to a fire in Strasbourg, but no audio recordings were lost. === Use of the recordings === The recordings can be consulted either on Lingua Libre or on Commons. They are mainly used on other Wikimedia projects, for example to illustrate entries on Wiktionaries or proper nouns in Wikipedia articles. The re-use of the recordings in a language teaching context is envisaged. Language learners can freely download pronunciations and use them on GoldenDict, a popular dictionary software. Thus, audio recordings can be used as “Pronunciation Dictionaries” on GoldenDict without needing internet connection. The recordings are also reused in Natural Language Processing projects, for example to drive Mozilla's DeepSpeech speech recognition engines. == Versions == Lingua Libre was initiated on January 23, 2015 and has had three successive versions: === Lingua Libre v.1 (2016) === As part of the Languages of France project, which aims to document and promote the regional languages of France on Wikimedia and Internet projects in general, the conception of Lingua Libre started in November 2015, partly funded by the DGLFLF (General Delegation for the French language and the languages of France). The first version of the project was launched in August 2016. Only suitable for audio recording, Lingua Libre was shown during a workshop on Occitan language in December 2016, and then presented to the online Wikimedia community and at international events in 2017. === Lingua Libre v.2 (2018) === A complete rebuilding was launched at the end of 2017. The new version of Lingua Libre is based on MediaWiki, uses Wikibase and OAuth to better integrate into the Wikimedia environment. The interface is translated via Translatewiki.net so that the project can be used by a large number of communities. The new version of the site was ready in June 2018 and opened to the public in August 2018. === Lingua Libre v.2.2 (2020) === In 2020, important changes were made to the platform; a new look was developed especially for the site, the .org domain replaced the .fr domain used until then, and added support for sign languages through video recording. == Statistics == In the first two years of the project's launch, approximately 10,000 recordings were made. The transition to v.2 was accompanied by a sharp increase in the contributions. The number of recordings multiplied by 10 in less than a year, exceeding the 100,000 threshold in May 2019. These recordings were made by 127 speakers in almost 50 languages. By September 2020, the platform had more than 300,000 recordings in 90 languages with more than 350 speakers. The 500,000 recordings milestone was reached in June 2021, thanks to 540 speakers of 120 languages.

    Read more →
  • Confidential computing

    Confidential computing

    Confidential computing is a security and privacy-enhancing computational technique focused on protecting data in use. Confidential computing can be used in conjunction with storage and network encryption, which protect data at rest and data in transit respectively. It is designed to address software, protocol, cryptographic, and basic physical and supply-chain attacks, although some critics have demonstrated architectural and side-channel attacks effective against the technology. The technology protects data in use by performing computations in a hardware-based trusted execution environment (TEE). Confidential data is released to the TEE only once it is assessed to be trustworthy. Different types of confidential computing define the level of data isolation used, whether virtual machine, application, or function, and the technology can be deployed in on-premise data centers, edge locations, or the public cloud. It is often compared with other privacy-enhancing computational techniques such as fully homomorphic encryption, secure multi-party computation, and Trusted Computing. Confidential computing is promoted by the Confidential Computing Consortium (CCC) industry group, whose membership includes major providers of the technology. == Properties == Trusted execution environments (TEEs) "prevent unauthorized access or modification of applications and data while they are in use, thereby increasing the security level of organizations that manage sensitive and regulated data". Trusted execution environments can be instantiated on a computer's processing components such as a central processing unit (CPU) or a graphics processing unit (GPU). In their various implementations, TEEs can provide different levels of isolation including virtual machine, individual application, or compute functions. Typically, data in use in a computer's compute components and memory exists in a decrypted state and can be vulnerable to examination or tampering by unauthorized software or administrators. According to the CCC, confidential computing protects data in use through a minimum of three properties: Data confidentiality: "Unauthorized entities cannot view data while it is in use within the TEE". Data integrity: "Unauthorized entities cannot add, remove, or alter data while it is in use within the TEE". Code integrity: "Unauthorized entities cannot add, remove, or alter code executing in the TEE". In addition to trusted execution environments, remote cryptographic attestation is an essential part of confidential computing. The attestation process assesses the trustworthiness of a system and helps ensure that confidential data is released to a TEE only after it presents verifiable evidence that it is genuine and operating with an acceptable security posture. It allows the verifying party to assess the trustworthiness of a confidential computing environment through an "authentic, accurate, and timely report about the software and data state" of that environment. "Hardware-based attestation schemes rely on a trusted hardware component and associated firmware to execute attestation routines in a secure environment". Without attestation, a compromised system could deceive others into trusting it, claim it is running certain software in a TEE, and potentially compromise the confidentiality or integrity of the data being processed or the integrity of the trusted code. == Technical approaches == Technical approaches to confidential computing may vary in which software, infrastructure and administrator elements are allowed to access confidential data. The "trust boundary," which circumscribes a trusted computing base (TCB), defines which elements have the potential to access confidential data, whether they are acting benignly or maliciously. Confidential computing implementations enforce the defined trust boundary at a specific level of data isolation. The three main types of confidential computing are: Virtual machine isolation Application isolation, also known as process isolation Function isolation, also known as library isolation Virtual machine isolation removes the elements controlled by the computer infrastructure or cloud provider, but allows potential data access by elements inside a virtual machine running on the infrastructure. Application or process isolation permits data access only by authorized software applications or processes. Function or library isolation is designed to permit data access only by authorized subroutines or modules within a larger application, blocking access by any other system element, including unauthorized code in the larger application. == Threat model == As confidential computing is concerned with the protection of data in use, only certain threat models can be addressed by this technique. Other types of attacks are better addressed by other privacy-enhancing technologies. === In scope === The following threat vectors are generally considered in scope for confidential computing: Software attacks: including attacks on the host’s software and firmware. This may include the operating system, hypervisor, BIOS, other software and workloads. Protocol attacks: including "attacks on protocols associated with attestation as well as workload and data transport". This includes vulnerabilities in the "provisioning or placement of the workload" or data that could cause a compromise. Cryptographic attacks: including "vulnerabilities found in ciphers and algorithms due to a number of factors, including mathematical breakthroughs, availability of computing power and new computing approaches such as quantum computing". The CCC notes several caveats in this threat vector, including relative difficulty of upgrading cryptographic algorithms in hardware and recommendations that software and firmware be kept up-to-date. A multi-faceted, defense-in-depth strategy is recommended as a best practice. Basic physical attacks: including cold boot attacks, bus and cache snooping and plugging attack devices into an existing port, such as a PCI Express slot or USB port. Basic upstream supply-chain attacks: including attacks that would compromise TEEs through changes such as added debugging ports. The degree and mechanism of protection against these threats varies with specific confidential computing implementations. === Out of scope === Threats generally defined as out of scope for confidential computing include: Sophisticated physical attacks: including physical attacks that "require long-term and/or invasive access to hardware" such as chip scraping techniques and electron microscope probes. Upstream hardware supply-chain attacks: including attacks on the CPU manufacturing process, CPU supply chain in key injection/generation during manufacture. Attacks on components of a host system that are not directly providing the capabilities of the trusted execution environment are also generally out-of-scope. Availability attacks: confidential computing is designed to protect the confidentiality and integrity of protected data and code. It does not address availability attacks such as Denial of Service or Distributed Denial of Service attacks. == Use cases == Confidential computing can be deployed in the public cloud, on-premise data centers, or distributed "edge" locations, including network nodes, branch offices, industrial systems and others. === Data privacy and security === Confidential computing protects the confidentiality and integrity of data and code from the infrastructure provider, unauthorized or malicious software and system administrators, and other cloud tenants, which may be a concern for organizations seeking control over sensitive or regulated data. The additional security capabilities offered by confidential computing can help accelerate the transition of more sensitive workloads to the cloud or edge locations. === Multi-party analytics === Confidential computing can enable multiple parties to engage in joint analysis using confidential or regulated data inside a TEE while preserving privacy and regulatory compliance. In this case, all parties benefit from the shared analysis, but no party's sensitive data or confidential code is exposed to the other parties or system host. Examples include multiple healthcare organizations contributing data to medical research, or multiple banks collaborating to identify financial fraud or money laundering. Oxford University researchers proposed the alternative paradigm called "Confidential Remote Computing" (CRC), which supports confidential operations in Trusted Execution Environments across endpoint computers considering multiple stakeholders as mutually distrustful data, algorithm and hardware providers. === Confidential generative AI === Confidential computing technologies can be applied to various stages of a generative AI deployments to help increase data or model privacy, security, and regulatory compliance. TEEs and remote attestation can protect the integrity of data during AI model training, keep

    Read more →
  • Sample (graphics)

    Sample (graphics)

    In computer graphics, a sample is an intersection of a channel and a pixel. The diagram below depicts a 24-bit pixel, consisting of 3 samples for Red, Green, and Blue. In this particular diagram, the Red sample occupies 9 bits, the Green sample occupies 7 bits and the Blue sample occupies 8 bits, totaling 24 bits per pixel. Note that the samples do not have to be equal size and not all samples are mandatory in a pixel. Also, a pixel can consist of more than 3 samples (e.g. 4 samples of the RGBA color space). A sample is related to a subpixel on a physical display.

    Read more →
  • Immediate mode (computer graphics)

    Immediate mode (computer graphics)

    Immediate mode is an API design pattern in computer graphics libraries, in which the client calls directly cause rendering of graphics objects to the display, or in which the data to describe rendering primitives is inserted frame by frame directly from the client into a command list (in the case of immediate mode primitive rendering), without the use of extensive indirection – thus immediate – to retained resources. It does not preclude the use of double-buffering. Retained mode is an alternative approach. Historically, retained mode has been the dominant style in GUI libraries; however, both can coexist in the same library and are not necessarily exclusive in practice. == Overview == In immediate mode, the scene (complete object model of the rendering primitives) is retained in the memory space of the client, instead of the graphics library. This implies that in an immediate mode application, the lists of graphical objects to be rendered are kept by the client and are not saved by the graphics library API. The application must re-issue all drawing commands required to describe the entire scene each time a new frame is required, regardless of actual changes. This method provides on the one hand a maximum of control and flexibility to the application program, but on the other hand it also generates continuous work load on the CPU. Examples of immediate mode rendering systems include Direct2D, OpenGL and Quartz. There are some immediate mode GUIs that are particularly suitable when used in conjunction with immediate mode rendering systems. == Immediate mode primitive rendering == Primitive vertex attribute data may be inserted frame by frame into a command buffer by a rendering API. This involves significant bandwidth and processor time (especially if the graphics processing unit is on a separate bus), but may be advantageous for data generated dynamically by the CPU. It is less common since the advent of increasingly versatile shaders, with which a graphics processing unit may generate increasingly complex effects without the need for CPU intervention. == Immediate mode rendering with vertex buffers == Although drawing commands have to be re-issued for each new frame, modern systems using this method are generally able to avoid the unnecessary duplication of more memory-intensive display data by referring to that unchanging data (via indirection) (e.g. textures and vertex buffers) in the drawing commands. == Immediate mode GUI == Graphical user interfaces traditionally use retained mode-style API design, but immediate mode GUIs instead use an immediate mode-style API design, in which user code directly specifies the GUI elements to draw in the user input loop. For example, rather than having a CreateButton() function that a user would call once to instantiate a button, an immediate-mode GUI API may have a DoButton() function which should be called whenever the button should be on screen. The technique was developed by Casey Muratori in 2002. Prominent implementations include Omar Cornut's Dear ImGui in C++, Nic Barker's Clay in C and Micha Mettke's Nuklear in C.

    Read more →
  • Software configuration management

    Software configuration management

    Software configuration management (SCM), a.k.a. software change and configuration management (SCCM), is the software engineering practice of tracking and controlling changes to a software system. It is part of the larger cross-disciplinary field of configuration management (CM). SCM includes version control and the establishment of baselines. == Goals == The goals of SCM include: Configuration identification - Identifying configurations, configuration items and baselines. Configuration control - Implementing a controlled change process. This is usually achieved by setting up a change control board whose primary function is to approve or reject all change requests that are sent against any baseline. Configuration status accounting - Recording and reporting all the necessary information on the status of the development process. Configuration auditing - Ensuring that configurations contain all their intended parts and are sound with respect to their specifying documents, including requirements, architectural specifications and user manuals. Build management - Managing the process and tools used for builds. Process management - Ensuring adherence to the organization's development process. Environment management - Managing the software and hardware that host the system. Teamwork - Facilitate team interactions related to the process. Defect tracking - Making sure every defect has traceability back to the source. With the introduction of cloud computing and DevOps the purposes of SCM tools have become merged in some cases. The SCM tools themselves have become virtual appliances that can be instantiated as virtual machines and saved with state and version. The tools can model and manage cloud-based virtual resources, including virtual appliances, storage units, and software bundles. The roles and responsibilities of the actors have become merged as well with developers now being able to dynamically instantiate virtual servers and related resources. == History == == Examples == Ansible – Open-source software platform for remote configuring and managing computers CFEngine – Configuration management software Chef – Configuration management toolPages displaying short descriptions of redirect targets LCFG – Computer configuration management system NixOS – Linux distribution OpenMake Software – DevOps company Otter Puppet – Open source configuration management software Salt – Configuration management software Rex – Open source software

    Read more →
  • Site Security Handbook

    Site Security Handbook

    The Site Security Handbook, RFC 2196, is a guide on setting computer security policies and procedures for sites that have systems on the Internet (however, the information provided should also be useful to sites not yet connected to the Internet). The guide lists issues and factors that a site must consider when setting their own policies. It makes a number of recommendations and provides discussions of relevant areas. This guide is only a framework for setting security policies and procedures. In order to have an effective set of policies and procedures, a site will have to make many decisions, gain agreement, and then communicate and implement these policies. The guide is a product of the IETF SSH working group, and was published in 1997, obsoleting the earlier RFC 1244 from 1991.

    Read more →
  • Record sealing

    Record sealing

    Record sealing is the process of making public records inaccessible to the public. In many cases, a person with a sealed record gains the legal right to deny or not acknowledge anything to do with the arrest and the legal proceedings from the case itself. Records are commonly sealed in a number of situations: Sealed birth records (typically after adoption or determination of paternity) Juvenile criminal records may be sealed Other types of cases involving juveniles may be sealed, anonymized, or pseudonymized ("impounded"); e.g., child sex offense or custody cases Cases using witness protection information may be partly sealed Cases involving trade secrets Cases involving state secrets == Filing under seal in US court == Normally, records should not be filed under seal without a court permission. However, FRCP 5.2 requires that sensitive text – like Social Security number, Taxpayer Identification Number, birthday, bank accounts, and children’s names – should be redacted off the filings made with the court and accompanying exhibits. A person making a redacted filing can file an unredacted copy under seal, or the Court can choose to order later that an additional filing be made under seal without redaction. Alternately, the filing party may ask the court’s permission to file some exhibits completely under seal. When the document is filed "under seal", it should have a clear indication for the court clerk to file it separately – most often by stamping words "Filed Under Seal" on the bottom of each page. Person making filing should also provide instructions to the court clerk that the document needs to be filed "under seal". Courts often have specific requirements to these filings in their Local Rules. == Difference from expungement == Expungement, which is a physical destruction, namely a complete erasure of one's criminal records, and therefore usually carries a higher standard, differs from record sealing, which is only to restrict the public's access to records, so that only certain law enforcement agencies or courts, under special circumstances, will have access to them. A record seal will greatly improve the chance of employment, as employers will not have access to damning records. There are occasions, like expungement, where one can truthfully state under oath that they have never been convicted before. Most of the time, a record seal has more relaxed requirements than an expungement. If an expungement is not allowed with a case, then sealing a record may be the best bet. Different states have different terms for what constitutes sealing of a record. == Cybersecurity incidents involving sealed records == Several cybersecurity incidents have demonstrated that sealed court documents are not always secure in practice, with vulnerabilities and data breaches exposing sensitive information. In January 2021, following the SolarWinds cyber attack, the U.S. Bankruptcy Court United States District Court for the District of Nevada announced that its Case Management/Electronic Case Files CM/ECF system had been potentially compromised. The judiciary stated that additional safeguards were being implemented to protect filings, and that the review of the incident and its impact was ongoing. Reports noted that the breach raised concerns about exposure of highly sensitive and sealed documents submitted through the CM/ECF system. In 2023, security researcher Jason Parker, following a tip from an activist, identified flaws in online court systems that exposed sealed records including confidential testimony and medical records through publicly accessible portals. In 2024, a cyber intrusion targeting attorneys in a civil case involving Representative Matt Gaetz led to the unauthorized access and leak of sealed depositions and related records. The breach exposed confidential testimony and financial records, some of which were later reported by news outlets, raising concerns about the security of electronically stored legal materials and the handling of sealed filings. In 2025, multiple reports confirmed that the federal judiciary's CM/ECF and PACER (law) filing system was compromised, exposing sealed indictments, confidential informant information, and other sensitive filings. Some courts temporarily reverted to paper-based filing to mitigate the risks of further disclosure. The FBI later confirmed that the breach had exposed sealed records, and investigators suspected foreign state actors were involved. == GAO publications referencing sealed records == Closed Criminal Plea and Sentencing Proceedings (1983) – Reviewed Department of Justice policies on closing plea and sentencing hearings. GAO noted that sealed transcripts should be unsealed once the reasons for closure no longer applied. Information on Plea Agreements and Settlements in Defense Procurement Fraud Cases (1992) – Examined outcomes of procurement fraud prosecutions. GAO observed that in some instances the results were sealed from public access. Military Recruiting: More Needs to Be Done to Better Screen Applicants and Detect Fraud (1999) – Investigated fraudulent enlistments in the armed forces. The report highlighted that sealed juvenile records often prevented recruiters from discovering prior offenses. Social Security Numbers: Governments Could Do More to Reduce Display in Public Records (2004) – Analyzed risks associated with SSN availability in state and local records. GAO pointed out that some categories of records, such as adoption proceedings, were sealed and less likely to expose identifiers. Social Security Numbers: Stronger Safeguards Needed to Protect Privacy (2005 testimony) – Testimony before Congress reiterating concerns over SSN exposure in public records, while noting that sealed categories (e.g., adoption) were exceptions. U.S. Supreme Court: Policies and Perspectives on Video and Audio Coverage of Appellate Court Proceedings (2016) – Surveyed appellate court policies on courtroom media coverage. The report acknowledged distinctions between public filings, confidential submissions, and sealed materials. Evictions: National Data Are Limited and Challenging to Collect (2024) – Examined nationwide eviction data. GAO reported that in some states eviction records may be sealed or expunged, limiting researchers' ability to compile datasets. DOD Fraud Risk Management: Enhanced Data and Collaboration Could Improve Efforts (2024) – Reviewed Department of Defense fraud-risk management. GAO noted that some adjudicative records in its dataset were sealed, restricting completeness of oversight data.

    Read more →
  • Site Security Handbook

    Site Security Handbook

    The Site Security Handbook, RFC 2196, is a guide on setting computer security policies and procedures for sites that have systems on the Internet (however, the information provided should also be useful to sites not yet connected to the Internet). The guide lists issues and factors that a site must consider when setting their own policies. It makes a number of recommendations and provides discussions of relevant areas. This guide is only a framework for setting security policies and procedures. In order to have an effective set of policies and procedures, a site will have to make many decisions, gain agreement, and then communicate and implement these policies. The guide is a product of the IETF SSH working group, and was published in 1997, obsoleting the earlier RFC 1244 from 1991.

    Read more →
  • Exposure Notification

    Exposure Notification

    The (Google/Apple) Exposure Notification System (GAEN) is a framework and protocol specification developed by Apple Inc. and Google to facilitate digital contact tracing during the COVID-19 pandemic. When used by health authorities, it augments more traditional contact tracing techniques by automatically logging close approaches among notification system users using Android or iOS smartphones. Exposure Notification is a decentralized reporting protocol built on a combination of Bluetooth Low Energy technology and privacy-preserving cryptography. It is an opt-in feature within COVID-19 apps developed and published by authorized health authorities. Unveiled on April 10, 2020, it was made available on iOS on May 20, 2020, as part of the iOS 13.5 update and on December 14, 2020, as part of the iOS 12.5 update for older iPhones. On Android, it was added to devices via a Google Play Services update, supporting all versions since Android Marshmallow. The Apple/Google protocol is similar to the Decentralized Privacy-Preserving Proximity Tracing (DP-3T) protocol created by the European DP-3T consortium and the Temporary Contact Number (TCN) protocol by Covid Watch, but is implemented at the operating system level, which allows for more efficient operation as a background process. Since May 2020, a variant of the DP-3T protocol is supported by the Exposure Notification Interface. Other protocols are constrained in operation because they are not privileged over normal apps. This leads to issues, particularly on iOS devices where digital contact tracing apps running in the background experience significantly degraded performance. The joint approach is also designed to maintain interoperability between Android and iOS devices, which constitute nearly all of the market. The ACLU stated the approach "appears to mitigate the worst privacy and centralization risks, but there is still room for improvement". In late April, Google and Apple shifted the emphasis of the naming of the system, describing it as an "exposure notification service", rather than "contact tracing" system. == Technical specification == Digital contact tracing protocols typically have two major responsibilities: encounter logging and infection reporting. Exposure Notification only involves encounter logging which is a decentralized architecture. The majority of infection reporting is centralized in individual app implementations. To handle encounter logging, the system uses Bluetooth Low Energy to send tracking messages to nearby devices running the protocol to discover encounters with other people. The tracking messages contain unique identifiers that are encrypted with a secret daily key held by the sending device. These identifiers change every 15–20 minutes as well as Bluetooth MAC address in order to prevent tracking of clients by malicious third parties through observing static identifiers over time. The sender's daily encryption keys are generated using a random number generator. Devices record received messages, retaining them locally for 14 days. If a user tests positive for infection, the last 14 days of their daily encryption keys can be uploaded to a central server, where it is then broadcast to all devices on the network. The method through which daily encryption keys are transmitted to the central server and broadcast is defined by individual app developers. The Google-developed reference implementation calls for a health official to request a one-time verification code (VC) from a verification server, which the user enters into the encounter logging app. This causes the app to obtain a cryptographically signed certificate, which is used to authorize the submission of keys to the central reporting server. The received keys are then provided to the protocol, where each client individually searches for matches in their local encounter history. If a match meeting certain risk parameters is found, the app notifies the user of potential exposure to the infection. Google and Apple intend to use the received signal strength (RSSI) of the beacon messages as a source to infer proximity. RSSI and other signal metadata will also be encrypted to resist deanonymization attacks. === Version 1.0 === To generate encounter identifiers, first a persistent 32-byte private Tracing Key ( t k {\displaystyle tk} ) is generated by a client. From this a 16 byte Daily Tracing Key is derived using the algorithm d t k i = H K D F ( t k , N U L L , 'CT-DTK' | | D i , 16 ) {\displaystyle dtk_{i}=HKDF(tk,NULL,{\text{'CT-DTK'}}||D_{i},16)} , where H K D F ( Key, Salt, Data, OutputLength ) {\displaystyle HKDF({\text{Key, Salt, Data, OutputLength}})} is a HKDF function using SHA-256, and D i {\displaystyle D_{i}} is the day number for the 24-hour window the broadcast is in starting from Unix Epoch Time. These generated keys are later sent to the central reporting server should a user become infected. From the daily tracing key a 16-byte temporary Rolling Proximity Identifier is generated every 10 minutes with the algorithm R P I i , j = Truncate ( H M A C ( d t k i , 'CT-RPI' | | T I N j ) , 16 ) {\displaystyle RPI_{i,j}={\text{Truncate}}(HMAC(dtk_{i},{\text{'CT-RPI'}}||TIN_{j}),16)} , where H M A C ( Key, Data ) {\displaystyle HMAC({\text{Key, Data}})} is a HMAC function using SHA-256, and T I N j {\displaystyle TIN_{j}} is the time interval number, representing a unique index for every 10 minute period in a 24-hour day. The Truncate function returns the first 16 bytes of the HMAC value. When two clients come within proximity of each other they exchange and locally store the current R P I i , j {\displaystyle RPI_{i,j}} as the encounter identifier. Once a registered health authority has confirmed the infection of a user, the user's Daily Tracing Key for the past 14 days is uploaded to the central reporting server. Clients then download this report and individually recalculate every Rolling Proximity Identifier used in the report period, matching it against the user's local encounter log. If a matching entry is found, then contact has been established and the app presents a notification to the user warning them of potential infection. === Version 1.1 === Unlike version 1.0 of the protocol, version 1.1 does not use a persistent tracing key, rather every day a new random 16-byte Temporary Exposure Key ( t e k i {\displaystyle tek_{i}} ) is generated. This is analogous to the daily tracing key from version 1.0. Here i {\displaystyle i} denotes the time is discretized in 10 minute intervals starting from Unix Epoch Time. From this two 128-bit keys are calculated, the Rolling Proximity Identifier Key ( R P I K i {\displaystyle RPIK_{i}} ) and the Associated Encrypted Metadata Key ( A E M K i {\displaystyle AEMK_{i}} ). R P I K i {\displaystyle RPIK_{i}} is calculated with the algorithm R P I K i = H K D F ( t e k i , N U L L , 'EN-RPIK' , 16 ) {\displaystyle RPIK_{i}=HKDF(tek_{i},NULL,{\text{'EN-RPIK'}},16)} , and A E M K i {\displaystyle AEMK_{i}} using the algorithm A E M K i = H K D F ( t e k i , N U L L , 'EN-AEMK' , 16 ) {\displaystyle AEMK_{i}=HKDF(tek_{i},NULL,{\text{'EN-AEMK'}},16)} . From these values a temporary Rolling Proximity Identifier ( R P I i , j {\displaystyle RPI_{i,j}} ) is generated every time the BLE MAC address changes, roughly every 15–20 minutes. The following algorithm is used: R P I i , j = A E S 128 ( R P I K i , 'EN-RPI' | | 0 x 000000000000 | | E N I N j ) {\displaystyle RPI_{i,j}=AES128(RPIK_{i},{\text{'EN-RPI'}}||{\mathtt {0x000000000000}}||ENIN_{j})} , where A E S 128 ( Key, Data ) {\displaystyle AES128({\text{Key, Data}})} is an AES cryptography function with a 128-bit key, the data is one 16-byte block, j {\displaystyle j} denotes the Unix Epoch Time at the moment the roll occurs, and E N I N j {\displaystyle ENIN_{j}} is the corresponding 10-minute interval number. Next, additional Associated Encrypted Metadata is encrypted. What the metadata represents is not specified, likely to allow the later expansion of the protocol. The following algorithm is used: Associated Encrypted Metadata i , j = A E S 128 _ C T R ( A E M K i , R P I i , j , Metadata ) {\displaystyle {\text{Associated Encrypted Metadata}}_{i,j}=AES128\_CTR(AEMK_{i},RPI_{i,j},{\text{Metadata}})} , where A E S 128 _ C T R ( Key, IV, Data ) {\displaystyle AES128\_CTR({\text{Key, IV, Data}})} denotes AES encryption with a 128-bit key in CTR mode. The Rolling Proximity Identifier and the Associated Encrypted Metadata are then combined and broadcast using BLE. Clients exchange and log these payloads. Once a registered health authority has confirmed the infection of a user, the user's Temporary Exposure Keys t e k i {\displaystyle tek_{i}} and their respective interval numbers i {\displaystyle i} for the past 14 days are uploaded to the central reporting server. Clients then download this report and individually recalculate every Rolling Proximity Identifier starting from interval number i {\displaystyle i} ,

    Read more →
  • Unspent transaction output

    Unspent transaction output

    In cryptocurrencies, an unspent transaction output (UTXO, often capitalized as UTxO) is a distinctive element in a subset of digital currency models. A UTXO represents a certain amount of cryptocurrency that has been authorized by a sender and is available to be spent by a recipient. The utilization of UTXOs in transaction processes is a key feature of many cryptocurrencies, but it primarily characterizes those implementing the UTXO model. UTXOs employ public key cryptography to ascertain and transfer ownership. More specifically, the recipient's public key is formatted into the UTXO, thereby limiting the capability to spend the UTXO to the account that can demonstrate ownership of the corresponding private key. A valid digital signature associated with the public key must be included for the UTXO to be spent. In the UTXO model, each unit of currency is treated as a discrete object. The history of a UTXO is documented only within the blocks where it is transferred. To ascertain the total balance of an account, one must scan each block to find the latest UTXOs linked to that account. While all nodes within a blockchain network must consent to the block history, the blocks relevant to an account's balance are unique to that account. UTXOs constitute a chain of ownership depicted as a series of digital signatures dating back to the coin's inception, regardless of whether the coin was minted via mining, staking, or another procedure determined by the cryptocurrency protocol. The UTXO model was invented for Bitcoin. Cardano uses an extended version of the UTXO model known as EUTXO. == Origins == The conceptual framework of the UTXO model can be traced back to Hal Finney's Reusable Proofs of Work proposal, which itself was based on Adam Back's 1997 Hashcash proposal. Bitcoin, released in 2009, was the first widespread implementation of the UTXO model in practice. == UTXO model vs. account Model == Cryptocurrencies that utilize the UTXO model function differently compared to those using the account model. In the UTXO model, individual units of cryptocurrency, termed as unspent transaction outputs (UTXOs), are transferred between users, analogous to the exchange of physical cash. This model impacts how transactions and ownership are recorded and verified within the blockchain network. The account model preserves a record of each account and its corresponding balance for every block added to the network. This setup enables quicker balance verification without the need to scan historical blocks, but it increases the raw size of each block (though data compression techniques can be utilized to alleviate this). However, both models necessitate the inspection of past blocks to fully authenticate the origin of coins. In the UTXO model, each object is immutable - units of coins cannot be 'edited' in the same way an account balance is modified when a transaction occurs. Rather, the balance is computed from the transaction history dating back to when the coins were first minted. This simplicity enhances security, as a UTXO either exists in its anticipated form or it does not. In contrast, the account model requires meticulous verification of the account's status during transactions, which can lead to oversights if not conducted correctly. In valid blockchain transactions, only unspent outputs (UTXOs) are permissible for funding subsequent transactions. This requirement is critical to prevent double-spending and fraud. Accordingly, inputs in a transaction are removed from the UTXO set, while outputs create new UTXOs that are added to the set. The holders of private keys, such as those with cryptocurrency wallets, can utilize these UTXOs for future transactions.

    Read more →
  • Variable data publishing

    Variable data publishing

    Variable-data publishing (VDP) (also known as database publishing) is a term referring to the output of a variable composition system. While these systems can produce both electronically viewable and hard-copy (print) output, the "variable-data publishing" term today often distinguishes output destined for electronic viewing, rather than that which is destined for hard-copy print (e.g. variable data printing). Essentially the same techniques are employed to perform variable-data publishing, as those utilized with variable data printing. The difference is in the interpretation for output. While variable-data printing may be interpreted to produce various print streams or page-description files (e.g. AFP/IPDS, PostScript, PCL), variable-data publishing produces electronically viewable files, most commonly seen in the forms of PDF, HTML, or XML. Variable-data composition involves the use of data to conditionally: exhibit text (static blocks and/or variable content) exhibit images select fonts select colors format page layouts & flows Variable-data may be as simple as an address block or salutation. However, it can be any or all of the document's textual content—including words, sentences, paragraphs, pages, or the entire document. In other words, it can make up as little or as much of the document as the composer desires. Variable data may also be used to exhibit various images, such as logos, products, or membership photos. Further, variable-data can be used to build rule-based design schemes, including fonts, colors, and page formats. The possibilities are vast. The variable-data tools available today, make it possible to perform variable-data composition at nearly every stage of document production. However, the level of control that can be achieved varies, based upon how far into the document production process a variable-data tool is deployed. For example, if variable-data insertion occurs just prior to output...it's not likely that the text flow or layout can be altered with nearly as much control as would be available at the time of initial document composition. Many organizations will produce multiple forms of output (aka: multi-channel output), for the same document. This ensures that the published content is available to recipients via any form of access method they might require. When multi-channel output is utilized, integrity between those output channels often becomes important. Variable-data publishing may be performed on everything from a personal computer to a mainframe system. However, the speed and practical output volumes which can be achieved are directly affected by the computer power utilized. == Origin of the concept == The term variable-data publishing was likely an offshoot of the term "variable-data printing", first introduced to the printing industry by Frank Romano, Professor Emeritus, School of Print Media, at the College of Imaging Arts and Sciences at Rochester Institute of Technology. However, the concept of merging static document elements and variable document elements predates the term and has seen various implementations ranging from simple desktop 'mail merge', to complex mainframe applications in the financial and banking industry. In the past, the term VDP has been most closely associated with digital printing machines. However, in the past 3 years the application of this technology has spread to web pages, emails, and mobile messaging.

    Read more →