AI Face Fusion

AI Face Fusion — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • ALL-IN-1

    ALL-IN-1

    ALL-IN-1 was an office automation product developed and sold by Digital Equipment Corporation in the 1980s. It was one of the first purchasable off the shelf electronic mail products. It was later known as Office Server V3.2 for OpenVMS Alpha and OpenVMS VAX systems before being discontinued. == Overview == ALL-IN-1 was advertised as an office automation system including functionality in Electronic Messaging, Word Processing and Time Management. It offered an application development platform and customization capabilities that ranged from scripting to code-level integration. ALL-IN-1 was designed and developed by Skip Walter, John Churin and Marty Skinner from Digital Equipment Corporation who began work in 1977. Sheila Chance was hired as the software engineering manager in 1981. The first version of the software, called CP/OSS, the Charlotte Package of Office System Services, named after the location of the developers, was released in May 1982. In 1983, the product was renamed ALL-IN-1 and the Charlotte group continued to develop versions 1.1 through 1.3. Digital then made the decision to move most of the development activity to its central engineering facility in Reading, United Kingdom, where a group there took responsibility for the product from version 2.0 (released in field test in 1984 and to customers in 1985) onward. The Charlotte group continued to work on the Time Management subsystem until version 2.3 and other contributions were made from groups based in Sophia Antipolis, France (System for Customization Management and the integration with VAX Notes), Reading (Message Router and MAILbus), and Nashua, New Hampshire (FMS). ALL-IN-1 V3.0 introduced shared file cabinets and the File Cabinet Server (FCS) to lay the foundation for an eventual integration with TeamLinks, Digital's PC office client. Previous integrations with PCs included PC ALL-IN-1, a DOS-based product introduced in 1989 that never proved popular with customers. Bob Wyman was the first product manager. He oversaw the growth of the product culminating in over $2 billion per year in revenue and market leadership in the proprietary office automation sector. Other consultants from Digital Equipment Corporation involved include Frank Nicodem, Donald Vickers and Tony Redmond.

    Read more →
  • Superincreasing sequence

    Superincreasing sequence

    In mathematics, a sequence of positive real numbers ( s 1 , s 2 , . . . ) {\displaystyle (s_{1},s_{2},...)} is called superincreasing if every element of the sequence is greater than the sum of all previous elements in the sequence. Formally, this condition can be written as s n + 1 > ∑ j = 1 n s j {\displaystyle s_{n+1}>\sum _{j=1}^{n}s_{j}} for all n ≥ 1. == Program == The following Python source code tests a sequence of numbers to determine if it is superincreasing: This produces the following output: Sum: 0 Element: 1 Sum: 1 Element: 3 Sum: 4 Element: 6 Sum: 10 Element: 13 Sum: 23 Element: 27 Sum: 50 Element: 52 Is it a superincreasing sequence? True == Examples == (1, 3, 6, 13, 27, 52) is a superincreasing sequence, but (1, 3, 4, 9, 15, 25) is not. The series a^x for a>=2 == Properties == Multiplying a superincreasing sequence by a positive real constant keeps it superincreasing.

    Read more →
  • Data definition specification

    Data definition specification

    In computing, a data definition specification (DDS) is a guideline to ensure comprehensive and consistent data definition. It represents the attributes required to quantify data definition. A comprehensive data definition specification encompasses enterprise data, the hierarchy of data management, prescribed guidance enforcement and criteria to determine compliance. == Overview == A data definition specification may be developed for any organization or specialized field, improving the quality of its products through consistency and transparency. It eliminates redundancy (since all contributing areas are referencing the same specification) and provides standardization and degrees of compliance, making it easier and more efficient to create, modify, verify, analyze and share information across the enterprise. To understand how a data definition specification works in an enterprise, we must look at the elements of a DDS. Writing data definitions, defining business terms (or rules) in the context of a particular environment, provides structure for an organization's data architecture. In developing these definitions, the words used must be traceable to clearly defined data. A data definition specification may be used in the following activities: Business intelligence Business process modeling Business rules management Data analysis and modeling Information architecture Metadata modeling Data mastering Report generation == Criteria == A data definition specification requires data definitions to be: Atomic – singular, describing only one concept. Commonly used and ambiguous terms should be defined. While a term refers to one concept, several words may be used in a term: File – A concept identifiable with one word File extension – A concept identifiable with more than one word Traceable – Mapped to a specific data element. In business, a term may be traced to an entity (for example, a customer) or an attribute (such as a customer's name). A term may be a value in a data set (such as gender), or designate the data set itself. Traceability indicates relationships in the data hierarchy. Consistent - Used in a standard syntax; if used in a specific context, the context is noted Accurate - Precise, correct and unambiguous, stating what the term is and is not Clear - Readily understood by the reader Complete - With the term, its description and contextual references Concise - To avoid circular references == Applications == === Enterprise data === A data definition specification was produced by the Open Mobile Alliance to document charging data. The document, the centralized catalog of data elements defined for interfaces, specifies the mapping of these data elements to protocol fields in the interfaces. Created for the exchange of financial data, Market Data Definition Language (MDDL) is an XML specification designed to enable the interchange of information necessary to account, to analyze, and to trade financial instruments of the world's markets. It defines an XML-based interchange format and common data dictionary on the fields needed to describe: (1) financial instruments, (2) corporate events affecting value and tradability, and (3) market-related, economic and industrial indicators. The principal function of MDDL is to allow entities to exchange market data by standardizing formats and definitions. MDDL provides a common format for market data so that it can be efficiently passed from one processing system to another and provides a common understanding of market data content by standardizing terminology and by normalizing the relationships of various data elements to one another ... From the user perspective, the goal of MDDL is to enable users to integrate data from multiple sources by standardizing both the input feeds used for data warehousing (i.e., define what's being provided by vendors) and the output methods by which client applications request the data (i.e., ensure compatibility on how to get data in and out of applications)." === Clinical submissions === The Clinical Data Interchange Standards Consortium, a global, multidisciplinary, non-profit organization, has established standards to support the acquisition, exchange, submission and archiving of clinical research data and metadata. CDISC standards are vendor-neutral, platform-independent and freely available from the CDISC website. The Case Report Tabulation Data Definition Specification (define.xml) draft version 2.0, the oldest data definition specification, is part of the evolution from the 1999 FDA electronic submission (eSub) guidance and electronic Common Technical Document (eCTD) documents specifying that a document describing the content and structure of included data be included in a submission. Define.xml was developed to automate the review process by generating a machine-readable data-definition document. Define.xml has standardized submissions to the Food and Drug Administration, reducing review times from over two years to several months. === Archival data === A data definition specification is the foundation of metadata for scientific data archiving. The Metadata Encoding and Transmission Standard (METS) uses one principle of a DDS: consistent use of key terms to catalog digital objects for global use. The METS schema is a flexible mechanism for encoding descriptive, administrative and structural metadata for a digital library object and expressing complex links between metadata, and can provide a useful standard for the exchange of digital-library objects between repositories. A similar effort is underway to preserve complex data associated with video-game archiving. Preserving Virtual Worlds attempted to address archival-format deficiencies, citing the lack of suitable documentation for interactive fiction and games at the bit level: specifically, the absence of "representation information" needed to map raw bits into higher-level data constructs. Preserving Virtual Worlds 2 is a research project expanding on initial efforts in this field.

    Read more →
  • IWARP

    IWARP

    iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts, iWARP is not an acronym. Because iWARP is layered on Internet Engineering Task Force (IETF)-standard congestion-aware protocols such as Transmission Control Protocol (TCP) and Stream Control Transmission Protocol (SCTP), it makes few requirements on the network, and can be successfully deployed in a broad range of environments. == History == In 2007, the IETF published five Request for Comments (RFCs) that define iWARP: RFC 5040 A Remote Direct Memory Access Protocol Specification is layered over Direct Data Placement Protocol (DDP). It defines how RDMA Send, Read, and Write operations are encoded using DDP into headers on the network. RFC 5041 Direct Data Placement over Reliable Transports is layered over MPA/TCP or SCTP. It defines how received data can be directly placed into an upper layer protocols receive buffer without intermediate buffers. RFC 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security analyzes security issues related to iWARP DDP and RDMAP protocol layers. RFC 5043 Stream Control Transmission Protocol (SCTP) Direct Data Placement (DDP) Adaptation defines an adaptation layer that enables DDP over SCTP. RFC 5044 Marker PDU Aligned Framing for TCP Specification defines an adaptation layer that enables preservation of DDP-level protocol record boundaries layered over the TCP reliable connected byte stream. These RFCs are based on the RDMA Consortium's specifications for RDMA over TCP. The RDMA Consortium's specifications are influenced by earlier RDMA standards, including Virtual Interface Architecture (VIA) and InfiniBand (IB). Since 2007, the IETF has published three additional RFCs that maintain and extend iWARP: RFC 6580 IANA Registries for the Remote Direct Data Placement (RDDP) Protocols published in 2012 defines IANA registries for Remote Direct Data Placement (RDDP) error codes, operation codes, and function codes. RFC 6581 Enhanced Remote Direct Memory Access (RDMA) Connection Establishment published in 2011 fixes shortcomings with iWARP connection setup. RFC 7306 Remote Direct Memory Access (RDMA) Protocol Extensions published in 2014 extends RFC 5040 with atomic operations and RDMA Write with Immediate Data. == Protocol == The main component in the iWARP protocol is the Direct Data Placement Protocol (DDP), which permits the actual zero-copy transmission. DDP itself does not perform the transmission; the underlying protocol (TCP or SCTP) does. However, TCP does not respect message boundaries; it sends data as a sequence of bytes without regard to protocol data units (PDU). In this regard, DDP itself may be better suited for SCTP, and indeed the IETF proposed a standard RDMA over SCTP. To run DDP over TCP requires a tweak known as marker PDU aligned (MPA) framing to guarantee boundaries of messages. Furthermore, DDP is not intended to be accessed directly. Instead, a separate RDMA protocol (RDMAP) provides the services to read and write data. Therefore, the entire RDMA over TCP specification is really RDMAP over DDP over either MPA/TCP or SCTP. All of these protocols can be implemented in hardware. Unlike IB, iWARP only has reliable connected communication, as this is the only service that TCP and SCTP provide. The iWARP specification omits other features of IB, such as Send with Immediate Data operations. With RFC 7306, the IETF is working to reduce these omissions. == Implementation == Because a kernel implementation of the TCP stack can be seen as a bottleneck, the protocol is typically implemented in hardware RDMA network interface controllers (rNICs). As simple data losses are rare in tightly coupled network environments, the error-correction mechanisms of TCP may be performed by software while the more frequently performed communications are handled strictly by logic embedded on the rNIC. Similarly, connections are often established entirely by software and then handed off to the hardware. Furthermore, the handling of iWARP specific protocol details is typically isolated from the TCP implementation, allowing rNICs to be used for both as RDMA offload and TCP offload (in support of traditional sockets based TCP/IP applications). The portion of the hardware implementation used for implementing the TCP protocol is known as the TCP Offload Engine (TOE). TOE itself does not prevent copying on the reception side, and must be combined with RDMA hardware for zero-copy results. The RDMA / TCP specification is a set of different wire protocols intended to be implemented in hardware (though it seems feasible to emulate it in software for compatibility but without the performance benefits). == Interfaces == iWARP is a protocol, not an implementation, but defines protocol behavior in terms of the operations that are legal for the protocol, known as Verbs. As such, iWARP does not have any single standard programming interface. However, programming interfaces tend to very closely correspond to the Verbs. Several programmatic interfaces have been proposed, including OpenFabrics Verbs, Network Direct, uDAPL, kDAPL, IT-API, and RNICPI. Implementations of some of these interfaces are available for different platforms, including Windows and Linux. == Services available == Networking services implemented over iWARP include those offered in the OpenFabrics Enterprise Distribution (OFED) by the OpenFabrics Alliance for Linux operating systems, and by Microsoft Windows via Network Direct. NVMe over Fabrics (NVMEoF) iSCSI Extensions for RDMA (iSER) Server Message Block Direct (SMB Direct) Sockets Direct Protocol (SDP) SCSI RDMA Protocol (SRP) Network File System over RDMA (NFS over RDMA) GPUDirect

    Read more →
  • ISPConfig

    ISPConfig

    ISPConfig is an open source hosting control panel for Linux, licensed under BSD license and developed by the company ISPConfig UG. The ISPConfig project was started in autumn 2005 by Till Brehm from the German company projektfarm GmbH. == Overview == Using the dashboard, administrators have the ability to manage websites, email addresses, MySQL and MariaDB as well as PostgreSQL (since version 3.3) databases, FTP accounts, Shell accounts and DNS records through a web-based interface. The software has 4 login levels: administrator, reseller, client, and email-user, each with a different set of permissions. == Operating Systems == ISPConfig is only available on Linux, with CentOS, Debian, and Ubuntu being among the supported distributions. == Features == The following services and features are supported: Management of a single or multiple servers from one control panel. Web server management for Apache HTTP Server and Nginx. Mail server management (with virtual mail users) with spam and antivirus filter using Postfix (software) and Dovecot (software). DNS server management (BIND, Powerdns). Configuration mirroring and clusters. Administrator, reseller, client and mail-user login. Virtual server management for OpenVZ Servers. Website statistics using Webalizer and AWStats

    Read more →
  • Memory-hard function

    Memory-hard function

    In cryptography, a memory-hard function (MHF) is a function that costs a significant amount of memory to efficiently evaluate. It differs from a memory-bound function, which incurs cost by slowing down computation through memory latency. MHFs have found use in key stretching and proof of work as their increased memory requirements significantly reduce the computational efficiency advantage of custom hardware over general-purpose hardware compared to non-MHFs. == Introduction == MHFs are designed to consume large amounts of memory on a computer in order to reduce the effectiveness of parallel computing. In order to evaluate the function using less memory, a significant time penalty is incurred. As each MHF computation requires a large amount of memory, the number of function computations that can occur simultaneously is limited by the amount of available memory. This reduces the efficiency of specialised hardware, such as application-specific integrated circuits and graphics processing units, which utilise parallelisation, in computing a MHF for a large number of inputs, such as when brute-forcing password hashes or mining cryptocurrency. == Motivation and examples == Bitcoin's proof-of-work uses repeated evaluation of the SHA-256 function, but modern general-purpose processors, such as off-the-shelf CPUs, are inefficient when computing a fixed function many times over. Specialized hardware, such as application-specific integrated circuits (ASICs) designed for Bitcoin mining, can use 30,000 times less energy per hash than x86 CPUs whilst having much greater hash rates. This led to concerns about the centralization of mining for Bitcoin and other cryptocurrencies. Because of this inequality between miners using ASICs and miners using CPUs or off-the shelf hardware, designers of later proof-of-work systems utilised hash functions for which it was difficult to construct ASICs that could evaluate the hash function significantly faster than a CPU. As memory cost is platform-independent, MHFs have found use in cryptocurrency mining, such as for Litecoin, which uses scrypt as its hash function. They are also useful in password hashing because they significantly increase the cost of trying many possible passwords against a leaked database of hashed passwords without significantly increasing the computation time for legitimate users. == Measuring memory hardness == There are various ways to measure the memory hardness of a function. One commonly seen measure is cumulative memory complexity (CMC). In a parallel model, CMC is the sum of the memory required to compute a function over every time step of the computation. Other viable measures include integrating memory usage against time and measuring memory bandwidth consumption on a memory bus. Functions requiring high memory bandwidth are sometimes referred to as "bandwidth-hard functions". == Variants == MHFs can be categorized into two different groups based on their evaluation patterns: data-dependent memory-hard functions (dMHF) and data-independent memory-hard functions (iMHF). As opposed to iMHFs, the memory access pattern of a dMHF depends on the function input, such as the password provided to a key derivation function. Examples of dMHFs are scrypt and Argon2d, while examples of iMHFs are Argon2i and catena. Many of these MHFs have been designed to be used as password hashing functions because of their memory hardness. A notable problem with dMHFs is that they are prone to side-channel attacks such as cache timing. This has resulted in a preference for using iMHFs when hashing passwords. However, iMHFs have been mathematically proven to have weaker memory hardness properties than dMHFs.

    Read more →
  • Azuqua

    Azuqua

    Azuqua is an American cloud-based integration and automation company headquartered in Seattle, Washington. As such, they integrate SaaS applications and create automations that are designed to eliminate manual work. Azuqua's platform has the ability to set up workflows between multiple applications so disparate teams can stay in the loop. Azuqua's customers include companies such as Charles Schwab, General Electric, General Motors, HubSpot, and Airbnb. == History == Nikhil Hasija and Craig Unger founded Azuqua in 2011. In 2013, the team participated in Techstars Microsoft's Windows Azure Accelerator, a Seattle-based incubator that helps entrepreneurs gain traction through deep mentor engagement and rapid iteration cycles. Azuqua announced in 2014 that they have received their Series A funding from Ignition Partners which amounted to $5 million. 2017 included a 65% growth in new customers, a doubling of new SaaS connectors, and a 50% growth in overall employee headcount. Azuqua also received their Series B funding which totaled to $10.8 million. This funding was led by Insight Ventures Partners, with DFJ and Ignition Partners also joining the round In March 2018, Azuqua hired Todd Owens as CEO. Owens was previously CEO of Appuri, a customer data platform. Hasija has transitioned to the role of Chief Product Officer. Azuqua also hired on Dan Kogan who has taken on the role of Chief Marketing Officer. Kogan previously worked at Tableau, a BI and analytics company, as a Senior Director of Product Marketing. Okta acquired Azuqua in 2019. == Product Description/Features == Logic Library: Logic functions that can be used for data processing, branching logic, and business rules Drag and Drop Visual Designer: No-code visual designer Use of API's for each cloud service a business is using to allow the various apps to communicate and share data API Publishing: Integrations and automations can be made available as secure endpoints, webhooks, or open services Connector Builder: Build a connector to an application Connector Library: Pre-built connectors to SaaS applications Error Handling: Automations that execute when an error is detected

    Read more →
  • Data profiling

    Data profiling

    Data profiling is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data. The purpose of these statistics may be to: Find out whether existing data can be easily used for other purposes Improve the ability to search data by tagging it with keywords, descriptions, or assigning it to a category Assess data quality, including whether the data conforms to particular standards or patterns Assess the risk involved in integrating data in new applications, including the challenges of joins Discover metadata of the source database, including value patterns and distributions, key candidates, foreign-key candidates, and functional dependencies Assess whether known metadata accurately describes the actual values in the source database Understanding data challenges early in any data intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns. Have an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance for improving data quality. == Introduction == Data profiling refers to the analysis of information for use in a data warehouse in order to clarify the structure, content, relationships, and derivation rules of the data. Profiling helps to not only understand anomalies and assess data quality, but also to discover, register, and assess enterprise metadata. The result of the analysis is used to determine the suitability of the candidate source systems, usually giving the basis for an early go/no-go decision, and also to identify problems for later solution design. == How data profiling is conducted == Data profiling utilizes methods of descriptive statistics such as minimum, maximum, mean, mode, percentile, standard deviation, frequency, variation, aggregates such as count and sum, and additional metadata information obtained during data profiling such as data type, length, discrete values, uniqueness, occurrence of null values, typical string patterns, and abstract type recognition. The metadata can then be used to discover problems such as illegal values, misspellings, missing values, varying value representation, and duplicates. Different analyses are performed for different structural levels. E.g. single columns could be profiled individually to get an understanding of frequency distribution of different values, type, and use of each column. Embedded value dependencies can be exposed in a cross-columns analysis. Finally, overlapping value sets possibly representing foreign key relationships between entities can be explored in an inter-table analysis. Normally, purpose-built tools are used for data profiling to ease the process. The computational complexity increases when going from single column, to single table, to cross-table structural profiling. Therefore, performance is an evaluation criterion for profiling tools. == When is data profiling conducted? == According to Kimball, data profiling is performed several times and with varying intensity throughout the data warehouse developing process. A light profiling assessment should be undertaken immediately after candidate source systems have been identified and DW/BI business requirements have been satisfied. The purpose of this initial analysis is to clarify at an early stage if the correct data is available at the appropriate detail level and that anomalies can be handled subsequently. If this is not the case the project may be terminated. Additionally, more in-depth profiling is done prior to the dimensional modeling process in order assess what is required to convert data into a dimensional model. Detailed profiling extends into the ETL system design process in order to determine the appropriate data to extract and which filters to apply to the data set. Additionally, data profiling may be conducted in the data warehouse development process after data has been loaded into staging, the data marts, etc. Conducting data at these stages helps ensure that data cleaning and transformations have been done correctly and in compliance of requirements. == Benefits and examples == Data profiling can improve data quality, shorten the implementation cycle of major projects, and improve users' understanding of data. Discovering business knowledge embedded in data itself is one of the significant benefits derived from data profiling. It can improve data accuracy in corporate databases.

    Read more →
  • Deep Learning Super Sampling

    Deep Learning Super Sampling

    Deep Learning Super Sampling (DLSS) is a suite of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available in a number of video games. The goal of these technologies is to allow the majority of the graphics pipeline to run at a lower resolution for increased performance, and then infer a higher resolution image from this that approximates the same level of detail as if the image had been rendered at this higher resolution. This allows for higher graphical settings or frame rates for a given output resolution, depending on user preference. All generations of DLSS are available on all RTX-branded cards from Nvidia in supported titles. However, the Frame Generation feature is only supported on RTX 40 series GPUs or newer and Multi Frame Generation is only available on 50 series GPUs. == History == Nvidia advertised DLSS as a key feature of GeForce RTX 20 series GPUs when they launched in September 2018. At that time, the results were limited to a few video games, namely Battlefield V, or Metro Exodus, because the algorithm had to be trained specifically on each game on which it was applied and the results were usually not as good as simple resolution upscaling. In 2019, Control shipped with ray tracing and an image processing algorithm that approximated DLSS, which did not use the Tensor Cores. In April 2020, Nvidia advertised and shipped an improved version of DLSS named DLSS 2 with driver version 445.75. DLSS 2.0 was available for a few existing games including Control and Wolfenstein: Youngblood, and would later be added to many newly released games and game engines such as Unreal Engine and Unity. This time Nvidia said that it used the Tensor Cores again, and that the AI did not need to be trained specifically on each game. Despite sharing the DLSS branding, the two iterations of DLSS differ significantly and are not backwards-compatible. In January 2025, Nvidia stated that there are over 540 games and apps supporting DLSS, and that over 80% of Nvidia RTX users activate DLSS. In March 2025, there were more than 100 games that support DLSS 4, according to Nvidia. By May 2025, over 125 games supported DLSS 4. The first video game console to use DLSS, the Nintendo Switch 2, was released on June 5, 2025. Nvidia announced DLSS 4.5 at CES 2026. In January 2026, Nvidia stated that over 250 games and applications support Multi Frame Generation. On March 16, 2026, at GTC 2026, Nvidia CEO Jensen Huang presented DLSS 5, a real-time AI model based on neural rendering that realistically enhances lighting and material surfaces at up to 4K resolution while retaining the developer's intended art style. It is planned to release in fall of 2026. In a blog post on its website, Nvidia has announced that DLSS 5 will be available in such games as Assassin's Creed Shadows, Delta Force, Hogwarts Legacy, Naraka: Bladepoint, Phantom Blade Zero, Resident Evil Requiem, Starfield, The Elder Scrolls IV: Oblivion Remastered, and more. On May 31, 2026, Nvidia announced an updated version of Ray Reconstruction for DLSS 4.5 in a blog post, scheduled for release on all RTX GPUs in August of the same year. They said it is designed to better embed spatial awareness into scenes and analyze engine data on movements and lighting conditions, resulting in a sharper, more stable, and less noisy image. === Release timeline === == Technology == === DLSS 1 === The first iteration of DLSS is a predominantly spatial image upscaler with two stages, both relying on convolutional auto-encoder neural networks. The first step is an image enhancement network which uses the current frame and motion vectors to perform edge enhancement, and spatial anti-aliasing. The second stage is an image upscaling step which uses the single raw, low-resolution frame to upscale the image to the desired output resolution. Using just a single frame for upscaling means the neural network itself must generate a large amount of new information to produce the high-resolution output, which can result in slight hallucinations such as leaves that differ in style to the source content. The neural networks are trained on a per-game basis by generating a "perfect frame" using traditional supersampling to 64 samples per pixel, as well as the motion vectors for each frame. The data collected must be as comprehensive as possible, including as many levels, times of day, graphical settings, resolutions, etc. as possible. This data is also augmented using common augmentations such as rotations, colour changes, and random noise to help generalize the test data. Training is performed on Nvidia's Saturn V supercomputer. This first iteration received a mixed response, with many criticizing the often soft appearance and artifacts along with glitches in certain situations; likely a side effect of the limited data from only using a single frame input to the neural networks which could not be trained to perform optimally in all scenarios and edge-cases. Nvidia also demonstrated the ability for the auto-encoder networks to learn the ability to recreate depth-of-field and motion blur, although this functionality has never been included in a publicly released product. === DLSS 2 === DLSS 2 is a temporal anti-aliasing upsampling (TAAU) implementation, using data from previous frames extensively through sub-pixel jittering to resolve fine detail and reduce aliasing. The data DLSS 2 collects includes: the raw low-resolution input, motion vectors, depth buffers, and exposure / brightness information. It can also be used as a simpler TAA implementation where the image is rendered at 100% resolution, rather than being upsampled by DLSS, Nvidia brands this as DLAA (Deep Learning Anti-Aliasing). TAA(U) is used in many modern video games and game engines; however, all previous implementations have used some form of manually written heuristics to prevent temporal artifacts such as ghosting and flickering. One example of this is neighborhood clamping which forcefully prevents samples collected in previous frames from deviating too much compared to nearby pixels in newer frames. This helps to identify and fix many temporal artifacts, but deliberately removing fine details in this way is analogous to applying a blur filter, and thus the final image can appear blurry when using this method. DLSS 2 uses a convolutional auto-encoder neural network trained to identify and fix temporal artifacts, instead of manually programmed heuristics as mentioned above. Because of this, DLSS 2 can generally resolve detail better than other TAA and TAAU implementations, while also removing most temporal artifacts. This is why DLSS 2 can sometimes produce a sharper image than rendering at higher, or even native resolutions using traditional TAA. However, no temporal solution is perfect, and artifacts (ghosting in particular) are still visible in some scenarios when using DLSS 2. Because temporal artifacts occur in most art styles and environments in broadly the same way, the neural network that powers DLSS 2 does not need to be retrained when being used in different games. Despite this, Nvidia does frequently ship new minor revisions of DLSS 2 with new titles, so this could suggest some minor training optimizations may be performed as games are released, although Nvidia does not provide changelogs for these minor revisions to confirm this. The main advancements compared to DLSS 1 include: Significantly improved detail retention, a generalized neural network that does not need to be re-trained per-game, and ~2x less overhead (~1–2 ms vs ~2–4 ms). It should also be noted that forms of TAAU such as DLSS 2 are not upscalers in the same sense as techniques such as ESRGAN or DLSS 1, which attempt to create new information from a low-resolution source; instead, TAAU works to recover data from previous frames, rather than creating new data. In practice, this means low resolution textures in games will still appear low-resolution when using current TAAU techniques. This is why Nvidia recommends game developers use higher resolution textures than they would normally for a given rendering resolution by applying a mip-map bias when DLSS 2 is enabled. === DLSS 3 === Augments DLSS 2 with improved image quality and the introduction of a new motion interpolation feature, called Frame Generation. The DLSS Frame Generation algorithm takes two rendered frames from the rendering pipeline and generates a new frame that smoothly transitions between them. For every frame rendered, one additional frame is generated. DLSS 3.0 makes use of a new generation Optical Flow Accelerator (OFA) included in the Ada Lovelace architecture of GeForce RTX 40 series GPUs and with that is exclusive to them. The new OFA is said to be faster and more accurate than the one already available in previous Turing and Ampere RTX GPUs. === DLSS 3.5 === DLSS 3.5 adds Ray Reconstruction, replacing multiple denoising algorithms with a single AI model trained o

    Read more →
  • Backup

    Backup

    In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. == Storage == A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. === 3-2-1 Backup Rule === The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. === Backup methods === ==== Unstructured ==== An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. ==== Full only/System imaging ==== A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. ==== Incremental ==== An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. ==== Near-CDP ==== Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications—often marketed as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended ".bak" extension to the file name. ==== Reverse incremental ==== A Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links—as Apple Time Machine does, or using binary diffs. ==== Differential ==== A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. === Storage media === Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. ==== Magnetic tape ==== Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Tape media are generally rotated on a schedule so at least one set is off-site in case something should happe

    Read more →
  • Social employee

    Social employee

    A social employee is a worker operating within a social business model. Following an organization's social computing guidelines, social employees use social media tools both for internal workflow and collaboration purposes and for external engagement with customers, prospects and stakeholders through a combination of social media marketing, content marketing, social marketing, and social selling. Social employee programs are considered to be as much about culture and engagement as they are about business processes and best practices. In addition to increased leads and sales, social employee best practices are said to improve business outcomes important to social media marketing, such as increased connections and web traffic, improved brand identification and "chatter", and better customer advocacy. == Overview == The term "social employee" was first introduced to describe those exhibiting the emerging characteristics of workers operating under a social business model. The term is often used interchangeably with similar designations like "employee advocate" or "social employee advocate". Crucial to the perceived value of the social employee is the concept of the digital footprint. While organizations are able to generate large bases of followers through social media, research shows that brand marketing and engagement efforts through these networks are not as effective as those of individual employees. In fact, some research indicates that employee experts are more trusted than any other member of an organization. Because of this, social employee programs are designed to train, empower, and support employee engagement efforts in the hopes of authentically engaging larger communities, increasing the frequency of shares, reviews, and other forms of "earned media" and expanding the brand's presence on the web. == The personal or employee brand == A foundational concept of the social employee is the idea of the personal or employee brand. This concept first gained popular attention in a 1997 FastCompany article by business leader Tom Peters titled "The Brand Called You". In the article, Peters argued that the premium placed on branding impacted workers' lives to such an extent that creating and cultivating a distinct personal brand had become a professional necessity. According to Peters, doing so built trust, loyalty, visibility, influence, and employability. With increased adoption of social media tools by both businesses and consumers in the early 21st century, many business leaders became increasingly concerned with social engagement, both internally among employees and externally with customers and other stakeholders. While many in the business community acknowledged the potential social tools had for improved collaboration, productivity, and brand messaging, the concern that employees would misrepresent their brand, disclose proprietary information, or otherwise damage their company's reputation or ability to conduct business persisted. As a result, many began to advocate for employee branding as a solution to this problem. This helped give new meaning to the concept of brand ambassadorship, positioning everyday employees in public, and potentially high-profile, engagement roles. == Characteristics == === Engaged === Because social employee advocacy is dependent on the perceived authenticity of the employee, engagement is highly valued in social organizations. Further, data show the measurable impact of employee engagement on organizational productivity and profitability: Happy employees were found to be 12 percent more productive. In one study, engaged employees were found to be 38 percent more likely to produce at above-average rates. In another, organizations with engaged employees had a 19 percent higher than average shareholder return, while organizations with disengaged employees experienced shareholder return that was 44 percent below average. Engaged companies were found to outperform disengaged companies by up to 202 percent. Companies with strong focus on culture were found to have an average 13.9 percent turnover rate, while those with a low focus experience were found to have a 48.4 percent turnover rate. === Flexible job environment and work–life balance === The number of professionals working mobile or remote has risen considerably since 2010. While estimates vary, one study found that number of organizations with mobile or remote employees is expected to rise from 24 percent in 2012 to 89 percent by 2020. Other research has estimated that by 2020, 105.4 million professionals will work remotely in America, about 72.3 percent of the total workforce. This change has been linked to a rise in social technologies, including biometrics, wearables, near-field communications, and augmented reality. Social employees have also put a greater emphasis on work–life balance, with many believing that advances in technology can directly support efforts in this area. Purported benefits of this shift include a more flexible workforce, reduced business costs, and greater organizational leverage in attracting and retaining top talent. === Buys into the brand's story === In 2009, thought leader Simon Sinek presented a speech called "How Great Leaders Inspire Action" at a TEDxPugetSound event. Sinek's central argument in this speech was, "People don't buy what you do. They buy why you do it." This concept—that the story behind a business or product offering is a more compelling sales tool than the product itself—is frequently cited in social media marketing as a way to build authentic connections with stakeholders. However, others have argued that for employees to share a brand's story authentically, they must be engaged in that story themselves, and as a result, many companies have made storytelling part of their culture programs. === Collaborative === An implicit tenet in social business is that social technologies aren't a barrier to productivity, but rather a path to increased connectivity. The shift in enterprise software systems like IBM Connections to incorporate social communication models, such as mentions, wikis, and newsfeeds, reflects the changing communication dynamics within business. With an increase in diversity and sophistication in collaborative software platforms, social organizations have sought to find new creative ways to utilize these tools and secure employee buy-in around them. Crowdsourcing has also become popular in social businesses. Examples include AT&T's program The Innovation Pipeline (TIP), begun in 2009, which has generated over 28,000 ideas that have led to over 75 projects with funding exceeding $44 million. IBM has also put considerable resources into such processes, producing its social computing guidelines through employee crowdsourcing, as well as its Connections platform through the Technology Adoption Program (TAP), a more formalized crowdsourcing initiative. Another popular form of internal collaboration is the hack day, or hackathon. Organizations such as Netflix, Facebook, and IBM use hack days to pull employees out of their day-to-day work environments and encourage them to collaborate in nontraditional ways in an attempt to drive disruptive innovation. Social employees are often encouraged to seek external collaboration opportunities with customers and prospects. For example, Procter & Gamble introduced the Live Well Collaborative to connect with external stakeholders and develop products and services for the 50+ demographic. === Social listener === A social listener is someone who engages in social listening, or social media monitoring, for professional means. Social employees can use social media monitoring for a variety of reasons, including professional development, industry news and trends, and gauging market sentiment. Some have argued that social listening is one of the most important components of social business, as it enables organizations to collect rich market data, make more informed strategic decisions, and respond to customer needs more authentically. === Customer-centric === Advocates of customer-centricity in social business argue that social media has changed the dynamic from one-way brand messaging to shared interactions between brand and customer. Brand and customer engagement is seen as a means of creating more lasting connections with customers and prospects and empowering them to become brand promoters. Customer-centric interactions are seen to have distinct value to brands, as research shows that prospects are far more likely to trust brand-related messaging from a friend or family member than they are from a brand. As a means of building social employees, some social advocates have also called for a broader definition of customer to include the employees themselves. In the book The Pursuit of Social Business Excellence, authors Vala Afshar and Brad Martin made the following argument: A social business operates with the guiding principle that each employee's responsi

    Read more →
  • MDS matrix

    MDS matrix

    An MDS matrix (maximum distance separable) is a matrix representing a function with certain diffusion properties that have useful applications in cryptography. Technically, an m × n {\displaystyle m\times n} matrix A {\displaystyle A} over a finite field K {\displaystyle K} is an MDS matrix if it is the transformation matrix of a linear transformation f ( x ) = A x {\displaystyle f(x)=Ax} from K n {\displaystyle K^{n}} to K m {\displaystyle K^{m}} such that no two different ( m + n ) {\displaystyle (m+n)} -tuples of the form ( x , f ( x ) ) {\displaystyle (x,f(x))} coincide in n {\displaystyle n} or more components. Equivalently, the set of all ( m + n ) {\displaystyle (m+n)} -tuples ( x , f ( x ) ) {\displaystyle (x,f(x))} is an MDS code, i.e., a linear code that reaches the Singleton bound. Let A ~ = ( I n A ) {\displaystyle {\tilde {A}}={\begin{pmatrix}\mathrm {I} _{n}\\\hline \mathrm {A} \end{pmatrix}}} be the matrix obtained by joining the identity matrix I n {\displaystyle \mathrm {I} _{n}} to A {\displaystyle A} . Then a necessary and sufficient condition for a matrix A {\displaystyle A} to be MDS is that every possible n × n {\displaystyle n\times n} submatrix obtained by removing m {\displaystyle m} rows from A ~ {\displaystyle {\tilde {A}}} is non-singular. This is also equivalent to the following: all the sub-determinants of the matrix A {\displaystyle A} are non-zero. Then a binary matrix A {\displaystyle A} (namely over the field with two elements) is never MDS unless it has only one row or only one column with all components 1 {\displaystyle 1} . Reed–Solomon codes have the MDS property and are frequently used to obtain the MDS matrices used in cryptographic algorithms. Serge Vaudenay suggested using MDS matrices in cryptographic primitives to produce what he called multipermutations, not-necessarily linear functions with this same property. These functions have what he called perfect diffusion: changing t {\displaystyle t} of the inputs changes at least m − t + 1 {\displaystyle m-t+1} of the outputs. He showed how to exploit imperfect diffusion to cryptanalyze functions that are not multipermutations. MDS matrices are used for diffusion in such block ciphers as AES, SHARK, Square, Twofish, Anubis, KHAZAD, Manta, Hierocrypt, Kalyna, Camellia and HADESMiMC, and in the stream cipher MUGI and the cryptographic hash function Whirlpool, Poseidon.

    Read more →
  • Vision transformer

    Vision transformer

    A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches (rather than text into tokens), serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings. ViTs were designed as alternatives to convolutional neural networks (CNNs) in computer vision applications. They have different inductive biases, training stability, and data efficiency. Compared to CNNs, ViTs are less data efficient, but have higher capacity. Some of the largest modern computer vision models are ViTs, such as one with 22B parameters. Subsequent to its publication, many variants were proposed, with hybrid architectures with both features of ViTs and CNNs. ViTs have found application in image recognition, image segmentation, weather prediction, and autonomous driving. == History == Transformers were introduced in Attention Is All You Need (2017), and have found widespread use in natural language processing. A 2019 paper applied ideas from the Transformer to computer vision. Specifically, they started with a ResNet, a standard convolutional neural network used for computer vision, and replaced all convolutional kernels by the self-attention mechanism found in a Transformer. It resulted in superior performance. However, it is not a Vision Transformer. In 2020, an encoder-only Transformer was adapted for computer vision, yielding the ViT, which reached state of the art in image classification, overcoming the previous dominance of CNN. The masked autoencoder (2022) extended ViT to work with unsupervised training. The vision transformer and the masked autoencoder, in turn, stimulated new developments in convolutional neural networks. Subsequently, there was cross-fertilization between the previous CNN approach and the ViT approach. In 2021, some important variants of the Vision Transformers were proposed. These variants are mainly intended to be more efficient, more accurate or better suited to a specific domain. Two studies improved efficiency and robustness of ViT by adding a CNN as a preprocessor. The Swin Transformer achieved state-of-the-art results on some object detection datasets such as COCO, by using convolution-like sliding windows of attention mechanism, and the pyramid process in classical computer vision. == Overview == The basic architecture, used by the original 2020 paper, is as follows. In summary, it is a BERT-like encoder-only Transformer. The input image is of type R H × W × C {\displaystyle \mathbb {R} ^{H\times W\times C}} , where H , W , C {\displaystyle H,W,C} are height, width, channel (RGB). It is then split into square-shaped patches of type R P × P × C {\displaystyle \mathbb {R} ^{P\times P\times C}} . For each patch, the patch is pushed through a linear operator, to obtain a vector ("patch embedding"). The position of the patch is also transformed into a vector by "position encoding" (the paper tried no embedding, 1D embedding, 2D embedding, and relative embedding: 1D was adopted). The two vectors are added, then pushed through several Transformer encoders. The attention mechanism in a ViT repeatedly transforms representation vectors of image patches, incorporating more and more semantic relations between image patches in an image. This is analogous to how in natural language processing, as representation vectors flow through a transformer, they incorporate more and more semantic relations between words, from syntax to semantics. The above architecture turns an image into a sequence of vector representations. To use these for downstream applications, an additional head needs to be trained to interpret them. For example, to use it for classification, one can add a shallow MLP on top of it that outputs a probability distribution over classes. The original paper uses a linear-GeLU-linear-softmax network. == Variants == === Original ViT === The original ViT was an encoder-only Transformer supervise-trained to predict the image label from the patches of the image. As in the case of BERT, it uses a special token in the input side, and the corresponding output vector is used as the only input of the final output MLP head. The special token is an architectural hack to allow the model to compress all information relevant for predicting the image label into one vector. Transformers found their initial applications in natural language processing tasks, as demonstrated by language models such as BERT and GPT-3. By contrast the typical image processing system uses a convolutional neural network (CNN). Well-known projects include Xception, ResNet, EfficientNet, DenseNet, and Inception. Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is quadratic in the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among pixels in various small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, with the position embedding is fed to the transformer. === Architectural improvements === ==== Pooling ==== After the ViT processes an image, it produces some embedding vectors. These must be converted to a single class probability prediction by some kind of network. In the original ViT and Masked Autoencoder, they used a dummy [CLS] token, in emulation of the BERT language model. The output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multiheaded attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. The output from MAP is M u l t i h e a d e d A t t e n t i o n ( Q , V , V ) {\displaystyle \mathrm {MultiheadedAttention} (Q,V,V)} , where q {\displaystyle q} is a trainable query vector, and V {\displaystyle V} is the matrix with rows being x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} . This was first proposed in the Set Transformer architecture. Later papers demonstrated that GAP and MAP both perform better than BERT-like pooling. A variant of MAP was proposed as class attention, which applies MAP, then feedforward, then MAP again. Re-attention was proposed to allow training deep ViT. It changes the multiheaded attention module. === Masked Autoencoder === The Masked Autoencoder took inspiration from denoising autoencoders and context encoders. It has two ViTs put end-to-end. The first one ("encoder") takes in image patches with positional encoding, and outputs vectors representing each patch. The second one (called "decoder", even though it is still an encoder-only Transformer) takes in vectors with positional encoding and outputs image patches again. ==== Training ==== During training, input images (224px x 224 px in the original implementation) are split along a designated number of lines on each axis, producing image patches. A certain percentage of patches are selected to be masked out by mask tokens, while all others are retained in the image. The network is tasked with reconstructing the image from the remaining unmasked patches. Mask tokens in the original implementation are learnable vector quantities. A linear projection with positional embeddings is then applied to the vector of unmasked patches. Experiments varying mask ratio on networks trained on the ImageNet-1K dataset found 75% mask ratios achieved high performance on both finetuning and linear-probing of the encoder's latent space. The MAE processes only unmasked patches during training, increasing the efficiency of data processing in the encoder and lowering the memory usage of the transformer. A less computationally-intensive ViT is used for the decoder in the original implementation of the MAE. Masked patches are added back to the output of the encoder block as mask tokens and both are fed into the decoder. A reconstruction loss is computed for the masked patches to assess network performance. ==== Prediction ==== In prediction, the decoder architecture is discarded entirely. The input image is split into patches by the same algorithm as in training, but no patches are masked out. A linear projection wi

    Read more →
  • Protecting Kids From Social Media Act

    Protecting Kids From Social Media Act

    Protecting Kids on Social Media Act or HB 1891 is an American law that was introduced by William Lamberth of Sumner County, Tennessee and was signed into law by Tennessee's governor on May 2, 2024. The bill requires social media websites such as X, YouTube, TikTok, Facebook and others to verify the age of users and if those users are under 18, they must have parental consent. == Progress == The law passed the Tennessee State Legislature with little opposition: the bill had only two no votes in the House from Aftyn Behn and Vincent B. Dixie, and it had zero no votes in the Senate. == Bill summary == Every social media company must verify the age of new users after the law takes effect, and if the user had created an account before the law took effect, they must verify the age of the person attempting to access the account within 14 days. If the new user or the user who originally owned an account is under 18 years of age, they must get parental consent and the third party or social media company must not retain the data from the age verification process or obtaining parental consent. Parents who are account holders of those under 18 can view the privacy settings, set daily time restrictions, and implement breaks during which the minor cannot access the account. The law is enforced by the Attorney General of Tennessee and went into effect on January 1, 2025. == Lawsuit == On October 3, 2024, the trade association NetChoice filed a lawsuit against Tennessee Attorney General Jonathan Skrmetti in the Middle District Court of Tennessee, claiming that the law violates the First Amendment. The Judge for the case is William L. Campbell Jr. An initial case management conference was originally scheduled for December 4, 2024, however it was delayed because of the Supreme Court case United States v. Skrmetti, recommending that the conference be delayed after January 20, 2025. On February 14, 2025, Judge Eli Richardson denied NetChoice's motion for a temporary restraining order because it would disrupt the status quo of the case.

    Read more →
  • Data marketplace

    Data marketplace

    Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in

    Read more →