AI Code Janitor

AI Code Janitor — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Nobody (username)

    Nobody (username)

    In many Unix variants, "nobody" is the conventional name of a user identifier which owns no files, is in no privileged groups, and has no abilities except those which every other user has. It is normally not enabled as a user account, i.e. has no home directory or login credentials assigned. Some systems also define an equivalent group "nogroup". == Uses == The pseudo-user "nobody" and group "nogroup" are used, for example, in the NFSv4 implementation of Linux by idmapd, if a user or group name in an incoming packet does not match any known username on the system. It was once common to run daemons as nobody, especially on servers, in order to limit the damage that could be done by a malicious user who gained control of them. However, the usefulness of this technique is reduced if more than one daemon is run like this, because then gaining control of one daemon would provide control of them all. The reason is that processes owned by the same user have the ability to send signals to each other and use debugging facilities to read or even modify each other's memory. Modern practice, as recommended by the Linux Standard Base, is to create a separate user account for each daemon.

    Read more →
  • Data exhaust

    Data exhaust

    Data exhaust (also exhaust data) is the trail of data generated as a by-product of users' online activity, behaviour, and transactions, rather than data they deliberately create or submit. It forms part of a broader category of unconventional data that also includes geospatial, network, and time-series data, and may be useful for predictive analytics. Data exhaust can take the form of cookies, temporary files, log files, clickstream records and stored preferences. Actions such as visiting a web page, following a link, or dwelling on an element may all generate exhaust data that is recorded without the user's active awareness. Unlike primary content — which the user intentionally creates — exhaust data is a passive side effect of interaction. A bank, for example, might treat the amounts and parties involved in a transaction as primary data, while secondary data could include whether the transaction was carried out at a cash machine rather than a branch. == Uses == Data exhaust collected by companies is often information that is not immediately useful in isolation, but can be aggregated and analysed to improve products, personalise content, identify trends, and support quality control. Companies may also store exhaust data for future analysis or sell it to third parties. Shoshana Zuboff has described this practice as a core mechanism of what she terms surveillance capitalism, in which behavioural data generated by users is converted into predictive products. Kosciejew notes that large quantities of often raw data are collected in this way, much of which is never analysed. == Medical exhaust data == Many medical devices — including pacemakers, dialysis machines and surgical cameras — generate exhaust data as a by-product of their operation. The majority of this data is never captured or analysed, and is typically discarded once a procedure ends or a device completes its routine monitoring cycle. The potential use of data generated by implanted devices such as pacemakers raises additional legal and ethical questions around ownership and consent. Using electronic health records for research also creates challenges because of the volume of data involved, creating a need for automated algorithms to process it. == Privacy and regulation == The collection and distribution of data exhaust is not in itself illegal in most jurisdictions, but its use raises questions of privacy and informed consent. Steps commonly taken to address these concerns include data anonymisation, offering users an opt-out from the sale of their data, and publishing explicit privacy policies that disclose what data is collected and how it is used.

    Read more →
  • Social bot

    Social bot

    A social bot, refers to fully or partially automated social media accounts designed to perform most regular users’ actions, such as liking, posting content, and chatting with other users. Although their levels of autonomy vary, and often include a human-in-the-loop, social bots can use artificial intelligence to perform social media actions and can use large language models to mimic human dialogue. Social bots can operate alone or in groups that coordinate messaging as part of a network of coordinated inauthentic behavior. Social bots are often used to perform ad fraud by artificially boosting viewership and engagement metrics and to spread disinformation on social media. == Uses == Social bots are used for a large number of purposes on a variety of social media platforms, including Twitter, Instagram, Facebook, and YouTube. One common use of social bots is to inflate a social media user's apparent popularity, usually by artificially manipulating their engagement metrics with large volumes of fake likes, reposts, or replies. Social bots can similarly be used to artificially inflate a user's follower count with fake followers, creating a false perception of a larger and more influential online following than is the case. The use of social bots to create the impression of a large social media influence allows individuals, brands, and organizations to attract a higher number of human followers and boost their online presence. Fake engagement can be bought and sold in the black market of social media engagement. Corporations typically use automated customer service agents on social media to affordably manage high levels of support requests. Social bots are used to send automated responses to users’ questions, sometimes prompting the user to private message the support account with additional information. The increased use of automated support bots and virtual assistants has led to some companies laying off customer-service staff. Social bots are also often used to influence public opinion. Autonomous bot accounts can flood social media with large numbers of posts expressing support for certain products, companies, or political campaigns, creating the impression of organic grassroots support. This can create a false perception of the number of people who support a certain position, which may also have effects on the direction of stock prices or on elections. Messages with similar content can also influence fads or trends. Many social bots are also used to amplify phishing attacks. These malicious bots are used to trick a social media user into giving up their passwords or other personal data. This is usually accomplished by posting links claiming to direct users to news articles that would in actuality direct to malicious websites containing malware. Scammers often use URL shortening services such as TinyURL and bit.ly to disguise a link's domain address, increasing the likelihood of a user clicking the malicious link. The presence of fake social media followers and high levels of engagement help convince the victim that the scammer is in fact a trusted user. Social bots can be a tool for computational propaganda. Bots can also be used for algorithmic curation, algorithmic radicalization, and/or influence-for-hire, a term that refers to the selling of an account on social media platforms. == History == Bots have coexisted with computer technology since the earliest days of computing. Social bots have their roots in the 1950s with Alan Turing, whose work focused on machine intelligence with the development of the Turing Test. The following decades saw further progress made towards the goal of creating programs capable of mimicking human behavior, notably with Joseph Weizenbaum’s creation of ELIZA. Considered to be one of the first Chatbots, ELIZA could simulate natural conversations with human users through pattern matching. Its most famous script was DOCTOR, a simulation of a Rogerian psychotherapist that was programmed to chat with patients and respond to questions. With the growth of social media platforms in the early 2000s, these bots could be used to interact with much larger user groups in an inconspicuous manner. Early instances of autonomous agents on social media could be found on sites like MySpace, with social bots being used by marketing firms to inflate activity on a user’s page in an effort to make them appear more popular. Social bots have been observed on a large variety of social media websites, with Twitter being one of the most widely observed examples. The creation of Twitter bots is generally against the site’s terms of service when used to post spam or to automatically like and follow other users, but some degree of automation using Twitter’s API may be permitted if used for “entertainment, informational, or novelty purposes.” Other platforms such as Reddit and Discord also allow for the use of social bots as long as they are not used to violate policies regarding harmful content and abusive behavior. Social media platforms have developed their own automated tools to filter out messages that come from bots, although they cannot detect all bot messages. == Legal regulation == Due to the difficulty of recognizing social bots and separating them from "eligible" automation via social media APIs, it is unclear how legal regulation can be enforced. Social bots are expected to play a role in shaping public opinion by autonomously acting as influencers. Some social bots have been used to rapidly spread misinformation, manipulate stock markets, influence opinion on companies and brands, promote political campaigns, and engage in malicious phishing campaigns. In the United States, some states have started to implement legislation in an attempt to regulate the use of social bots. In 2019, California passed the Bolstering Online Transparency Act (the B.O.T. Act) to make it unlawful to use automated software to appear indistinguishable from humans for the purpose of influencing a social media user's purchasing and voting decisions. Other states such as Utah and Colorado have passed similar bills to restrict the use of social bots. The Artificial Intelligence Act (AI Act) in the European Union is the first comprehensive law governing the use of Artificial Intelligence. The law requires transparency in AI to prevent users from being tricked into believing they are communicating with another human. AI-generated content on social media must be clearly marked as such, preventing social bots from using AI in a manner that mimics human behavior. == Detection == The first generation of bots could sometimes be distinguished from real users by their often superhuman capacities to post messages. Later developments have succeeded in imprinting more "human" activity and behavioral patterns in the agent. With enough bots, it might be even possible to achieve artificial social proof. To unambiguously detect social bots as what they are, a variety of criteria must be applied together using pattern detection techniques, some of which are: cartoon figures as user pictures sometimes also random real user pictures are captured (identity fraud) reposting rate temporal patterns sentiment expression followers-to-friends ratio length of user names variability in (re)posted messages engagement rate (like/followers rate) analysis of the time series of social media posts Social bots are always becoming increasingly difficult to detect and understand. The bots' human-like behavior, ever-changing behavior of the bots, and the sheer volume of bots covering every platform may have been a factor in the challenges of removing them. Social media sites, like Twitter, are among the most affected, with CNBC reporting up to 48 million of the 319 million users (roughly 15%) were bots in 2017. Botometer (formerly BotOrNot) is a public Web service that checks the activity of a Twitter account and gives it a score based on how likely the account is to be a bot. The system leverages over a thousand features. An active method for detecting early spam bots was to set up honeypot accounts that post nonsensical content, which may get reposted (retweeted) by the bots. However, bots evolve quickly, and detection methods have to be updated constantly, because otherwise they may get useless after a few years. One method is the use of Benford's Law for predicting the frequency distribution of significant leading digits to detect malicious bots online. This study was first introduced at the University of Pretoria in 2020. Another method is artificial-intelligence-driven detection. Some of the sub-categories of this type of detection would be active learning loop flow, feature engineering, unsupervised learning, supervised learning, and correlation discovery. Some operations of bots work together in a synchronized way. For example, ISIS used Twitter to amplify its Islamic content by numerous orchestrated accounts which further pushed an item to the Hot List news, thus further a

    Read more →
  • Transmission security

    Transmission security

    Transmission security (TRANSEC) is the component of communications security (COMSEC) that results from the application of measures designed to protect transmissions from interception and exploitation by means other than cryptanalysis. Goals of transmission security include: Low probability of interception (LPI) Low probability of detection (LPD) Antijam — resistance to jamming (EPM or ECCM) This involves securing communication links from being compromised by techniques like jamming, eavesdropping, and signal interception. TRANSEC includes the use of frequency hopping, spread spectrum and the physical protection of communication links to obscure the patterns of transmission. It is particularly vital in military and government communication systems, where the security of transmitted data is critical to prevent adversaries from gathering intelligence or disrupting operations. TRANSEC is often implemented alongside COMSEC (Communications Security) to form a comprehensive approach to communication security. Methods used to achieve transmission security include frequency hopping and spread spectrum where the required pseudorandom sequence generation is controlled by a cryptographic algorithm and key. Such keys are known as transmission security keys (TSK). Modern U.S. and NATO TRANSEC-equipped radios include SINCGARS and HAVE QUICK.

    Read more →
  • Affinity (software)

    Affinity (software)

    Affinity is a graphics editor developed by Serif, a subsidiary of Canva. It is simultaneously a vector graphics editor, a raster graphics editor and a desktop publishing application. It was first released in 2025 as a successor to Serif's Affinity Designer, Affinity Photo and Affinity Publisher, uniting the three editors into one application. While the previous versions competed individually against Adobe's Illustrator, Photoshop, and InDesign, Affinity 3.0 integrates their functionality into a single application. It uses a freemium model monetized by AI features exclusive to Canva Pro subscribers. == Functionality == Affinity is divided into a number of workspaces ("studios"), which are equivalent to the previous suite of Affinity applications: "vector" for vector graphics (Designer), "pixel" for raster editing (Photo), and "layout" for desktop publishing (Publisher). Additionally, it introduces the ability to create custom workspaces. The application supports real-time previews and non-destructive editing, which are based on GPU acceleration. Supported file formats include Adobe Photoshop, InDesign and Illustrator files, PDF, SVG, and TIFF, as well as a custom .af file format. === Vector editing === === Raster editing === Affinity includes photo editing tools including adjustments, masks, blend modes, batch processing, and retouching facilities. Additionally, the application can develop RAW files, similar to Adobe Lightroom. === Desktop publishing === Publishing features include master pages, text styles, and advanced typography. === AI features === The application supports Canva's existing AI features, such as background removal and generative fill. This requires a Canva subscription. == Development == === Background and acquisition (2014–2024) === Serif launched the original Affinity suite starting with Affinity Designer in 2014, followed by Photo (2015) and Publisher (2019). The software gained popularity for its one-time purchase model, contrasting with Adobe's subscription-based Creative Cloud. In November 2022, Serif released Version 2 of the suite, introducing a "Universal License" that covered all three apps across all platforms. In March 2024, Canva acquired Serif for approximately A$580 million (£300 million). Following user backlash regarding a potential shift to subscriptions, Canva and Serif issued a joint "Pledge" committing to four key principles: fair pricing, no mandatory subscriptions, perpetual licenses for existing products, and continued development of Affinity as a standalone suite. === Unified release (2025) === In September 2025, Serif pulled all existing versions of Affinity Designer, Affinity Photo and Affinity Publisher from sale ahead an upcoming announcement on 30 October; also ahead of the announcement, the iPadOS versions of the Affinity suite became free on App Store. During a "Creative Freedom" keynote on 30 October 2025, Canva released a new version now simply branded as "Affinity" (also known as "Affinity by Canva"), and referred to internally as version 3.0. Version 3 drops the separate applications and integrates their functionality into a singular application, and adds the ability to export directly to the Canva platform. It also adds a Canva AI studio, including background removal, "Expand & Edit", and generative fill. As of version 3, Affinity has switched to a freemium model; it is now available at no charge to users, although access to Canva AI features are locked behind the existing Canva Pro subscription service. Serif stated that the perpetually-licensed version 2 will remain available to existing owners, although it will no longer be actively maintained. The new version is currently available for macOS and Windows only, with an iPadOS version to be released soon. == Reception == The change in business model by Canva in 2025 was met with mixed reception, including concerns about its incorporation of AI features. Some users were concerned that their projects would be used for machine learning purposes, or that future versions would suffer from a lack of maintenance or become adware. Additionally, some felt it turned Affinity into fundamentally subscription-based software, given the prevalence of these features in professional contexts. Affinity publicly stated on social media that it would remain "free forever", users' projects would not be used to train AI models, and that "Canva has built a sustainable business model that allows this kind of generosity. And when more professionals use Affinity, Canva can sell more seats into businesses."

    Read more →
  • Frame (networking)

    Frame (networking)

    A frame is a digital data transmission unit in computer networking and telecommunications. In packet switched systems, a frame is a simple container for a single network packet. In other telecommunications systems, a frame is a repeating structure supporting time-division multiplexing. A frame typically includes frame synchronization features consisting of a sequence of bits or symbols that indicate to the receiver the beginning and end of the payload data within the stream of symbols or bits it receives. If a receiver is connected to the system during frame transmission, it ignores the data until it detects a new frame synchronization sequence. == Packet switching == In the OSI model of computer networking, a frame is the protocol data unit at the data link layer. Frames are the result of the final layer of encapsulation before the data is transmitted over the physical layer. A frame is "the unit of transmission in a link layer protocol, and consists of a link layer header followed by a packet." Each frame is separated from the next by an interframe gap. A frame is a series of bits generally composed of frame synchronization bits, the packet payload, and a frame check sequence. Examples are Ethernet frames, Wi-Fi frames, 4G frames, Point-to-Point Protocol (PPP) frames, Fibre Channel frames, and V.42 modem frames. Often, frames of several different sizes are nested inside each other. For example, when using Point-to-Point Protocol (PPP) over asynchronous serial communication, the eight bits of each individual byte are framed by start and stop bits, the payload data bytes in a network packet are framed by the header and footer, and several packets can be framed with frame boundary octets. == Time-division multiplex == In telecommunications, specifically in time-division multiplex (TDM) and time-division multiple access (TDMA) variants, a frame is a cyclically repeated data block that consists of a fixed number of time slots, one for each logical TDM channel or TDMA transmitter. In this context, a frame is typically an entity at the physical layer. TDM application examples are SONET/SDH and the ISDN circuit-switched B-channel, while TDMA examples are Circuit Switched Data used in early cellular voice services. The frame is also an entity for time-division duplex, where the mobile terminal may transmit during some time slots and receive during others.

    Read more →
  • Personal web page

    Personal web page

    Personal web pages are World Wide Web pages created by an individual to contain content of a personal nature rather than content pertaining to a company, organization or institution. Personal web pages are primarily used for informative or entertainment purposes but can also be used for personal career marketing (by containing a list of the individual's skills, experience and a CV), social networking with other people with shared interests, or as a space for personal expression. These terms do not usually refer to just a single "page" or HTML file, but to a website—a collection of webpages and related files under a common URL or Web address. In strictly technical terms, a site's actual home page (index page) often only contains sparse content with some catchy introductory material and serves mostly as a pointer or table of contents to the more content-rich pages inside, such as résumés, family, hobbies, family genealogy, a web log/diary ("blog"), opinions, online journals and diaries or other writing, examples of written work, digital audio sound clips, digital video clips, digital photos, or information about a user's other interests. Many personal pages only include information of interest to friends and family of the author. However, some webpages set up by hobbyists or enthusiasts of certain subject areas can be valuable topical web directories. == History == In the 1990s, most Internet service providers (ISPs) provided a free small personal, user-created webpage along with free Usenet News service. These were all considered part of full Internet service. Also several free web hosting services such as GeoCities provided free web space for personal web pages. These free web hosting services would typically include web-based site management and a few pre-configured scripts to easily integrate an input form or guestbook script into the user's site. Early personal web pages were often called "home pages" and were intended to be set as a default page in a web browser's preferences, usually by their owner. These pages would often contain links, to-do lists, and other information their author found useful. In the days when search engines were in their infancy, these pages (and the links they contained) could be an important resource in navigating the web. Since the early 2000s, the rise of blogging and the development of user friendly web page designing software made it easier for amateur users who did not have computer programming or website designer training to create personal web pages. Some website design websites provided free ready-made blogging scripts, where all the user had to do was input their content into a template. At the same time, a personal web presence became easier with the increased popularity of social networking services, some with blogging platforms such as LiveJournal and Blogger. These websites provided an attractive and easy-to-use content management system for regular users. Most of the early personal websites were Web 1.0 style, in which a static display of text and images or photos was displayed to individuals who came to the page. About the only interaction that was possible on these early websites was signing the virtual "guestbook". With the collapse of the dot-com bubble in the late 1990s, the ISP industry consolidated, and the focus of web hosting services shifted away from the surviving ISP companies to independent Internet hosting services and to ones with other affiliations. For example, many university departments provided personal pages for professors and television broadcasters provided them for their on-air personalities. These free webpages served as a perquisite ("perk") for staff, while at the same time boosting the Web visibility of the parent organization. Web hosting companies either charge a monthly fee, or provide service that is "free" (advertising based) for personal web pages. These are priced or limited according to the total size of all files in bytes on the host's hard drive, or by bandwidth, (traffic), or by some combination of both. For those customers who continue to use their ISP for these services, national ISPs commonly continue to provide both disk space and help including ready-made drop-in scripts. With the rise of Web 2.0-style websites, both professional websites and user-created, amateur websites tended to contain interactive features, such as "clickable" links to online newspaper articles or favourite websites, the option to comment on content displayed on the website, the option to "tag" images, videos or links on the site, the option of "clicking" on an image to enlarge it or find out more information, the option of user participation for website guests to evaluate or review the pages, or even the option to create new user-generated content for others to see. A key difference between Web 1.0 personal webpages and Web 2.0 personal pages was while the former tended to be created by hackers, computer programmers and computer hobbyists, the latter were created by a much wider variety of users, including individuals whose main interests lay in hobbies or topics outside of computers (e.g., indie music fans, political activists, and social entrepreneurs). == Motivations == In a study done by Zinkhan, participants had four main reasons to create personal web pages. First, people use personal web pages as a portrayal of self, in a sense marketing themselves, since creators have the freedom to portray their own identities. Second, personal web pages are a way to interact with people who have similar interests as the creator, possible employers, or colleagues. Third, personal web pages can gain social acceptance with groups that the creator is interested in depending on the information that the creator reveals about themselves. Fourth, personal web pages can give creators a sense of connection to the world since these web pages are public and a way to introduce oneself to other people around the globe. People may maintain personal web pages to serve as a showcase for their skills in professional life, creative skills or self promotion of their business, charity or band. The use of personal web pages to display an individual's professional life has become more common in the 21st century. Mary Madden, an expert researcher on privacy and technology, did a study that found a tenth of American jobs require Personal web pages that advertise an individual online. Personal web pages have become a source of initial impression of possible employees used by employers. It can also be used to express opinions on issues ranging from news and politics to movies. Others may use their personal web page as a communication method. For example, an aspiring artist might give out business cards with their personal web page, and invite people to visit their page and see their artwork, "like" their page or sign their guestbook. A personal web page gives the owner generally more control on presence in search results and how they wish to be viewed online. It also allows more freedom in types and quantity of content than a social network profile offers, and can link various social media profiles with each other. It can be used to correct the record on something, or clear up potential confusion between you and someone with the same name. In the 2010s, some amateur writers, bands and filmmakers release digital versions of their stories, songs and short films online, with the aim of gaining an audience and becoming more well-known. While the huge number of aspiring artists posting their work online makes it unlikely for individuals and groups to become popular via the Internet, there are a small number of YouTube stars who were unknown until their online performances garnered them a huge audience. == Sites of academics == Academic professionals (especially at the college and university level), including professors and researchers, are often given online space for creating and storing personal web documents, including personal web pages, CVs and a list of their books, academic papers and conference presentations, on the websites of their employers. This goes back to the early decade of the World Wide Web and its original purpose of providing a quick and easy way for academics to share research papers and data. Researchers may have a personal website to share more information about themselves, about their academic activities and for sharing (unpublished) results of their research. This has been noted as part of the success of open-access repositories such as arXiv.

    Read more →
  • Stegomalware

    Stegomalware

    Stegomalware is a form of malicious software that leverages steganography techniques to conceal its code, configuration data, or command-and-control (C&C) communications within seemingly benign digital media such as images, audio files, videos, documents, or network traffic. It typically embeds encrypted or obfuscated payloads into digital media and only extracts and executes them at runtime, which makes traditional signature-based and sandbox-based detection significantly more difficult. Stegomalware has been observed in attacks ranging from advanced persistent threats (APTs) to financially motivated cybercrime, and is now the subject of dedicated academic surveys, research projects, and international law-enforcement initiatives. The key distinction between stegomalware and traditional obfuscated malware lies in the encoding location. After obfuscation, malicious code remains present within the executable and can theoretically be discovered through static analysis. In contrast, stegomalware hides the payload entirely within a cover medium (image, audio, etc.), remaining invisible until the malware dynamically extracts and executes it at runtime. == History == The term stegomalware was formally introduced by researchers Águila, Laskov, and others in the context of mobile malware and presented at the Inscrypt (Information Security and Cryptology) conference in 2014. This marked the first academic formalization of the concept, though earlier work had already identified that botnets and mobile malware could use steganography and covert channels for command-and-control communication over probabilistically unobservable channels. Since its introduction, stegomalware has evolved from a theoretical concern to a documented threat. In 2011, the APT operation known as "Operation Shady RAT" became one of the first documented cases of stegomalware in the wild, using digital images to hide Internet Protocol addresses and command-and-control server addresses. The same year, the Duqu malware (targeting industrial manufacturers) embedded victim data into JPEG image files before exfiltration, making the data transfer virtually undetectable to network-level security tools. From 2014 onwards, stegomalware became more prevalent in organized cybercrime and advanced persistent threat campaigns. Notable examples include Zeus/Zbot, which masked configuration data in images; Gatak/Stegoloader, which hid shellcode in PNG files; TeslaCrypt, which embedded C&C commands in JPEGs; and Cerber, which concealed ransomware payloads within images. By the 2010s, stegomalware had become established as a preferred evasion technique for espionage, financial theft, and ransomware distribution campaigns. Recent surveys (2020–2025) document that stegomalware has increasingly been exploited by adversaries targeting banks, enterprises, government agencies, educational institutions, and internet users via malvertising campaigns. The technique is now considered a sophisticated method of attack worthy of dedicated international law-enforcement attention. == Technical Characteristics and Definitions == Stegomalware operates through a three-component architecture: Stegotext (R): An innocent-looking digital asset (image, audio file, etc.) into which the malicious payload is embedded. Secret key (sk): A key used by the embedding and extraction algorithms, typically hardcoded into the malware. Payload (p): The actual malicious code, configuration data, or C&C commands hidden within the stegotext. The malware extracts the payload at runtime using the secret key and either executes it directly or uses it to download additional stages of the attack. Stegomalware can be classified into several types based on deployment method: Type 0 (Autonomous): Both the stegotext and extraction algorithm are embedded within the malware application itself. The malicious payload is extracted and executed locally without external communication. Type I (Update): The stegotext and secret key are downloaded from a remote server at runtime; only the extraction algorithm is included in the malware. This variant is more flexible, allowing attackers to push updated payloads. Type II (External Algorithm): Neither the stegotext nor the extraction algorithm are distributed with the malware; both are fetched from an attacker-controlled infrastructure, providing maximum flexibility and evasion. == Steganography techniques == === Spatial domain methods === Stegomalware predominantly uses steganographic methods designed for images, as images are the most common cover medium in the wild. The most basic spatial domain technique is Least Significant Bit (LSB) substitution, which replaces the least significant bits of pixel color values with payload bits. While simple and easy to implement, LSB is also relatively easy to detect through statistical analysis. More sophisticated spatial domain techniques include: HUGO (High Undetectable steGO) (2010): Minimizes detectable distortion by distributing the payload across multiple pixels, achieving embedding capacity with reduced statistical footprint. WOW (Wavelet Obtained Weights) (2012): Embeds data preferentially in textured regions of images where modifications are less perceptually noticeable. UNIWARD (Universal Wavelet Relative Distortion) (2014): Uses a universal distortion function applicable to multiple image formats, balancing payload capacity with undetectability. HILL (2014): Applies high-pass and low-pass filters to identify robust embedding regions. MiPOD (Minimizing the Power of Optimal Detector) (2016): Designed to minimize the power of theoretical optimal steganalysis detectors. === Transform domain methods === Transform domain techniques convert images into the frequency domain (e.g., using DCT or DWT) before embedding, allowing for more robust hiding in JPEG and other compressed formats: Embedding in DCT coefficients (used in JPEG compression) Embedding in DWT coefficients (used in lossless formats) Spread spectrum techniques, which distribute the payload across many frequency components Transform domain methods are generally more resistant to noise, compression, and image transformations than spatial methods. === Generative adversarial network (GAN) methods === Recent advances in machine learning have introduced GAN-based steganography, where a generative model produces stego images that minimize detectable artifacts: SGAN (Steganographic GAN) (2017): First GAN applied to steganography, using a generator, discriminator, and steganalysis network. ASDL-GAN (2017): Performs automatic steganographic distortion learning at the pixel level. SteganoGAN (2019): Improves upon earlier GAN models, achieving higher embedding capacity and robustness. HiGAN (Hiding Images GAN) (2020): Enables hiding one image within another while maintaining visual plausibility. GAN-based approaches are more resilient to standard steganalysis attacks but remain an emerging threat requiring further research. == Notable malware campaigns == Stegomalware has been documented in numerous high-profile cyber attacks and campaigns. Notable examples include: Operation Shady RAT (2011): Used digital images to hide command-and-control server addresses in targeted espionage. Duqu (2011): Embedded victim data into JPEG files to exfiltrate industrial control system information. Zeus/Zbot (2014): Masked banking configuration data inside JPEG files exploited via malvertising. Gatak/Stegoloader (2015): Hid shellcode in PNG files for software licensing attacks and bot command execution. TeslaCrypt (2015): Embedded C&C commands and ransomware keys in JPEG images. Cerber (2016): Concealed executable ransomware code in JPEG files distributed via phishing. DNSChanger (2016): Embedded malicious code in PNG files for DNS hijacking campaigns. Sundown Exploit Kit (2017): Distributed exploit code in PNG files via malvertising. AdGholas (2017): Used JPEG steganography to distribute ransomware via malvertising. Synccrypt (2017): Hidden ransomware components in JPEG-steganographic encrypted archives. ZeroT/PlugX (2017): Hid Remote Access Trojan payloads in BMP files for espionage. Loki Bot (2018): Concealed malware installers in JPEG and video files. Waterbug (APT28) (2019): Injected malicious DLLs into WAV audio files. Shlayer (macOS adware) (2019): Hid malicious URLs in JPEG files via malvertising. === Attack vectors === The most common attack vectors for stegomalware include: Phishing emails with malicious attachments or links Malvertising campaigns using malicious banner advertisements Exploit kits through compromised or malicious websites Legitimate application vulnerabilities (e.g., watering-hole attacks) Fake software distribution (cracked software, keygen tools) === Exploitation stages === Stegomalware typically serves one or more roles in attack lifecycles: Payload delivery: Stego images contain full executable code or shellcode. C&C communication: Hidden data contains server addresses or command instructio

    Read more →
  • Parasolid

    Parasolid

    Parasolid is a geometric modeling kernel originally developed by Shape Data Limited, now owned and developed by Siemens Digital Industries Software. It can be licensed by other companies for use in their 3D computer graphics software products. Parasolid's abilities include model creation and editing utilities such as Boolean modeling operators, feature modeling support, advanced surfacing, thickening and hollowing, blending and filleting, and sheet modeling. It also incorporates modeling with mesh surfaces and lattices. Parasolid also includes tools for direct model editing, including tapering, offsetting, geometry replacement and removing feature details with automated regeneration of surrounding data. Parasolid also provides wide-ranging graphical and rendering support, including hidden-line, wireframe and drafting, tessellation, and model data inquiries. To use Parasolid effectively, software developers need knowledge of CAD in general, computational geometry, and topology. Parasolid is available for Windows (32-bit, 64-bit and AArch64), Linux (64-bit and AArch64), macOS (Apple silicon and Intel), iOS, and Android. == Parasolid XT format == Parasolid parts are normally saved in XT format, which usually has the file extension .X_T. The format is documented and open. There is also a binary version of the format, usually with an .X_B extension, which is somewhat more compact. Both .X_T and .X_B are used for parts files. == Applications == It is used in many computer-aided design (CAD), computer-aided manufacturing (CAM), computer-aided engineering (CAE), product visualization, and CAD data exchange packages. Notable uses include:

    Read more →
  • Data governance

    Data governance

    Data governance is a term used on both a macro and a micro level. The former is a political concept and forms part of international relations and Internet governance; the latter is a data management concept and forms part of corporate/organizational data governance. Data governance involves delegating authority over data and exercising that authority through decision-making processes. It plays a role in enhancing the value of data assets. == Macro level == Data governance at the macro level involves regulating cross-border data flows among countries, which is more precisely termed international data governance. This field was first formed in the early 2000s, and consists of "norms, principles and rules governing various types of data." There have been several international groups established by research organizations that aim to grant access to their data. These groups that enable an exchange of data are, as a result, exposed to domestic and international legal interpretations that ultimately decide how data is used. However, as of 2023, there are no international laws or agreements specifically focused on data protection. == Data governance (Data Management) == Data governance is the set of principles, policies, and processes that guide the effective and responsible use of data within an organization. It creates a framework for decision making, accountability, and oversight across the data lifecycle, from creation and storage to sharing and disposal. Data governance is closely linked with data management, which provides the practical methods to carry out governance objectives. These methods include data quality assurance, metadata management, master data management, security controls, and compliance monitoring. Together, governance and management aim to maximize the value of data as a strategic asset, reduce risks from misuse or inaccuracy, and ensure compliance with regulatory, ethical, and business requirements. The importance of this discipline has grown with the rise of big data, cloud computing, and artificial intelligence, where consistent standards and stewardship are essential for privacy protection, interoperability, and informed decision making. == Data governance drivers == While data governance initiatives can be driven by a desire to improve data quality, they are often driven by C-level leaders responding to external regulations. In a recent report conducted by the CIO WaterCooler community, 54% stated the key driver was efficiencies in processes; 39% - regulatory requirements; and only 7% customer service. Examples of these regulations include Sarbanes–Oxley Act, Basel I, Basel II, HIPAA, GDPR, cGMP, and a number of data privacy regulations. To achieve compliance with these regulations, business processes and controls require formal management processes to govern the data subject to these regulations. Successful programs identify drivers that are meaningful to both supervisory and executive leadership. Common themes among the external regulations center on the need to manage risk. The risks can be financial misstatement, inadvertent release of sensitive data, or poor data quality for key decisions. Methods to manage these risks vary from industry to industry. Examples of commonly referenced best practices and guidelines include COBIT, ISO/IEC 38500, and others. The proliferation of regulations and standards creates challenges for data governance professionals, particularly when multiple regulations overlap the data being managed. Organizations often launch data governance initiatives to address these challenges. == Data governance initiatives (Dimensions) == Data governance initiatives improve the quality of data by assigning a team responsible for data's accuracy, completeness, consistency, timeliness, validity, and uniqueness. This team usually consists of executive leadership, project management, line-of-business managers, and data stewards. The team usually employs a methodology for tracking and improving enterprise data, such as Six Sigma, and tools for data mapping, profiling, cleansing, and monitoring data. Data governance initiatives may be aimed at achieving a number of objectives including offering better visibility to internal and external customers (such as supply chain management), compliance with regulatory law, improving operations after rapid company growth or corporate mergers, or to aid the efficiency of enterprise knowledge workers by reducing confusion and error and increasing their scope of knowledge. Many data governance initiatives are also inspired by past attempts to fix information quality at the departmental level, which can lead to incongruent and redundant data quality processes. Most large companies have many applications and databases that can not easily share information. Therefore, knowledge workers within large organizations may not have access to the data they need to best do their jobs. When they do have access to the data, the data quality may be poor. By setting up a data governance practice or corporate data authority (individual or area responsible for determining how to proceed, in the best interest of the business, when a data issue arises), these problems can be mitigated. == Implementation == Implementation of a data governance initiative may vary in scope as well as origin. Sometimes, an executive mandate will arise to initiate an enterprise-wide effort. Sometimes the mandate will be to create a pilot project or projects, limited in scope and objectives, aimed at either resolving existing issues or demonstrating value. Sometimes, an initiative originates from lower down in the organization's hierarchy and will be deployed in a limited scope to demonstrate value to potential sponsors higher up in the organization. The initial scope of an implementation can vary greatly as well, from review of a one-off IT system to a cross-organization initiative. == Data governance tools == Leaders of successful data governance programs declared at the Data Governance Conference in Orlando, FL, in December 2006, that data governance is about 80 to 95 percent communication. That stated, it is a given that many of the objectives of a data governance program must be accomplished with appropriate tools. Many vendors are now positioning their products as data governance tools. Due to the different focus areas of various data governance initiatives, a given tool may or may not be appropriate. Additionally, many tools that are not marketed as governance tools address governance needs and demands.

    Read more →
  • Visual cryptography

    Visual cryptography

    Visual cryptography is a cryptographic technique which allows visual information (pictures, text, etc.) to be encrypted in such a way that the decrypted information appears as a visual image. One of the best-known techniques has been credited to Moni Naor and Adi Shamir, who developed it in 1994. They demonstrated a visual secret sharing scheme, where a binary image was broken up into n shares so that only someone with all n shares could decrypt the image, while any n − 1 shares revealed no information about the original image. Each share was printed on a separate transparency, and decryption was performed by overlaying the shares. When all n shares were overlaid, the original image would appear. There are several generalizations of the basic scheme including k-out-of-n visual cryptography, and using opaque sheets but illuminating them by multiple sets of identical illumination patterns under the recording of only one single-pixel detector, which exposed the image. Using a similar idea, transparencies can be used to implement a one-time pad encryption, where one transparency is a shared random pad, and another transparency acts as the ciphertext. Normally, there is an expansion of space requirement in visual cryptography. But if one of the two shares is structured recursively, the efficiency of visual cryptography can be increased to 100%. Some antecedents of visual cryptography are in patents from the 1960s. Other antecedents are in the work on perception and secure communication. Visual cryptography can be used to protect biometric templates in which decryption does not require any complex computations. == Example == In this example, the binary image has been split into two component images. Each component image has a pair of pixels for every pixel in the original image. These pixel pairs are shaded black or white according to the following rule: if the original image pixel was black, the pixel pairs in the component images must be complementary; randomly shade one ■□, and the other □■. When these complementary pairs are overlapped, they will appear dark gray. On the other hand, if the original image pixel was white, the pixel pairs in the component images must match: both ■□ or both □■. When these matching pairs are overlapped, they will appear light gray. So, when the two component images are superimposed, the original image appears. However, without the other component, a component image reveals no information about the original image; it is indistinguishable from a random pattern of ■□ / □■ pairs. Moreover, if you have one component image, you can use the shading rules above to produce a counterfeit component image that combines with it to produce any image at all. == (2, n) visual cryptography sharing case == Sharing a secret with an arbitrary number of people, n, such that at least 2 of them are required to decode the secret is one form of the visual secret sharing scheme presented by Moni Naor and Adi Shamir in 1994. In this scheme we have a secret image which is encoded into n shares printed on transparencies. The shares appear random and contain no decipherable information about the underlying secret image, however if any 2 of the shares are stacked on top of one another the secret image becomes decipherable by the human eye. Every pixel from the secret image is encoded into multiple subpixels in each share image using a matrix to determine the color of the pixels. In the (2, n) case, a white pixel in the secret image is encoded using a matrix from the following set, where each row gives the subpixel pattern for one of the components: {all permutations of the columns of} : C 0 = [ 1 0 . . . 0 1 0 . . . 0 . . . 1 0 . . . 0 ] . {\displaystyle \mathbf {C_{0}=} {\begin{bmatrix}1&0&...&0\\1&0&...&0\\...\\1&0&...&0\end{bmatrix}}.} While a black pixel in the secret image is encoded using a matrix from the following set: {all permutations of the columns of} : C 1 = [ 1 0 . . . 0 0 1 . . . 0 . . . 0 0 . . . 1 ] . {\displaystyle \mathbf {C_{1}=} {\begin{bmatrix}1&0&...&0\\0&1&...&0\\...\\0&0&...&1\end{bmatrix}}.} For instance in the (2,2) sharing case (the secret is split into 2 shares and both shares are required to decode the secret) we use complementary matrices to share a black pixel and identical matrices to share a white pixel. Stacking the shares we have all the subpixels associated with the black pixel now black while 50% of the subpixels associated with the white pixel remain white. == Cheating the (2, n) visual secret sharing scheme == Horng et al. proposed a method that allows n − 1 colluding parties to cheat an honest party in visual cryptography. They take advantage of knowing the underlying distribution of the pixels in the shares to create new shares that combine with existing shares to form a new secret message of the cheaters choosing. We know that 2 shares are enough to decode the secret image using the human visual system. But examining two shares also gives some information about the 3rd share. For instance, colluding participants may examine their shares to determine when they both have black pixels and use that information to determine that another participant will also have a black pixel in that location. Knowing where black pixels exist in another party's share allows them to create a new share that will combine with the predicted share to form a new secret message. In this way a set of colluding parties that have enough shares to access the secret code can cheat other honest parties. == Visual steganography == 2×2 subpixels can also encode a binary image in each component image. For example, each white pixel of each component image could be represented by two black subpixels, while each black pixel represented by three black subpixels. When overlaid, each white pixel of the secret image is represented by three black subpixels, while each black pixel is represented by all four subpixels black. Each corresponding pixel in the component images is randomly rotated to avoid orientation leaking information about the secret image. == In popular culture == In "Do Not Forsake Me Oh My Darling", a 1967 episode of TV series The Prisoner, the protagonist uses a visual cryptography overlay of multiple transparencies to reveal a secret message – the location of a scientist friend who had gone into hiding.

    Read more →
  • TikTokification

    TikTokification

    TikTokification (also written TikTok-ification) is a term used to describe the widespread adoption of TikTok's short-form, vertical video format and its algorithmic content-delivery model across the broader social media landscape. The phenomenon encompasses the strategic and cultural changes made by competing platforms such as Instagram, YouTube, Facebook, Snapchat, and LinkedIn in response to TikTok's global dominance. Beyond platform design, the term is also used more broadly to describe shifts in media consumption habits, advertising strategies, and, more critically, the potential cognitive and psychological effects associated with constant short-form video consumption. == Background == === Origins of short-form video === The short-form video format predates TikTok. Vine, launched in 2013, popularised six-second looping videos before shutting down in 2017. TikTok itself, known as Douyin in the Chinese market, was created by the Chinese technology company ByteDance in September 2016. Following its international expansion and its 2018 merger with Musical.ly, TikTok grew rapidly. By 2020, the application had surpassed two billion total downloads worldwide, with over 800 million monthly active users. A key driver of TikTok's success was its recommendation algorithm. The platform's "For You Page" (FYP) serves content to users based on behaviour rather than follower count, making it possible for unknown creators to achieve widespread reach organically. Analysts noted that TikTok serves "fast, visually engaging, and authentic videos that feel more like entertainment than advertising," fundamentally reshaping consumer expectations of digital content. TikTok has been described as "the center of the internet for young people," where users go for entertainment, news, trends, and shopping. As of the mid-2020s, TikTok had approximately 1.12 billion monthly active users. == Platform responses == TikTok's success compelled nearly every major social media platform to restructure its product around short-form video. In 2020, Instagram launched Reels and YouTube launched Shorts, both directly in response to TikTok's growth. Platforms like Meta's Instagram Reels and Google's YouTube Shorts subsequently expanded aggressively, launching new features, creator tools, and even considering separate standalone applications to compete. LinkedIn, traditionally a professional networking site, began experimenting with TikTok-style short-form vertical video feeds. Facebook launched a singular unified video feed combining Reels, long videos, and live videos, similar in structure to TikTok's feed. Snapchat redesigned its application to combine Stories and Spotlight into a unified entertainment feed. YouTube extended its Shorts format to allow videos up to three minutes in length, up from the previous limit of sixty seconds, as of October 2024. Despite these adaptations, experts noted that none of TikTok's rivals had matched its algorithmic precision as of mid-2025. == Societal and cultural impact == === Media and journalism === News organisations have also been affected by TikTokification. Short-form video grew rapidly as a format for news content, driven in large part by TikTok's popularity. According to Pew Research Center, 17% of adults in the United States reported regularly getting news from TikTok in 2024, with 63% of teenagers saying they used the platform as a news source. In response, major publishers began creating bespoke short-form content for TikTok's audience, with organisations such as the BBC building dedicated internal TikTok teams. === Advertising and commerce === TikTokification has had significant effects on the advertising industry. US social video advertising spending was projected to surpass linear television advertising spending for the first time in 2025. Global social commerce sales were projected to reach approximately $900 billion in 2025, with platforms like Douyin and TikTok driving a large share of that growth. TikTok itself generated an estimated $23.6 billion in advertising revenue in 2024. Short-form video has been described as bridging the gap between brand awareness and direct conversion. Surveys have found that consumers trust user-generated content 8.7 times more than influencer content and 6.6 times more than branded content, prompting brands to favour creator-led video formats. === Attention spans and cognitive effects === A growing body of research has examined the cognitive consequences of heavy short-form video consumption, a set of effects sometimes referred to as "TikTok Brain." A large systematic review and meta-analysis published in Psychological Bulletin, analysing data from 98,299 participants across 71 studies, found that the more short-form video content a person watches, the poorer their cognitive performance in attention and inhibitory control. The review also found that greater engagement with short-form video was associated with higher levels of anxiety, depression, and stress, as well as sleep disturbances. The platform's inherent demand for engaging content has resulted in the proliferation of sludge content, a genre of split screen video with the main video on the top and an unrelated attention-grabbing video on the bottom, typically repetitive gameplay (notably of the endless runner mobile game Subway Surfers) or oddly satisfying videos, designed to maximize viewer retention in cases where the main video may appear uninteresting and would normally cause the viewer to skip it. Sludge content is often described as overstimulating, reflecting and contributing to declining attention spans, though the scholarly evidence supporting such claims is not conclusive. Dr. Yann Poncin, associate professor at the Child Study Center at Yale University, noted that "infinite scrolling and short-form video are designed to capture your attention in short bursts," contrasting this with earlier entertainment formats that guided audiences through longer narratives. Research suggests that children and teenagers may be particularly vulnerable, with early exposure to rapid frame changes potentially conditioning the brain's neural pathways to require constant stimulation, making it more challenging to engage with slower-paced activities. A separate study published in Nature Communications by researchers at the Technical University of Denmark documented a notable decrease in collective attention span over time, attributing it in part to the increasing volume and pace of content production and consumption online. Researchers caution, however, that the majority of relevant studies are cross-sectional, meaning they capture data at a single point in time and cannot establish causality. It remains possible that individuals with pre-existing conditions such as anxiety or attention deficits may be more likely to engage heavily with these platforms as a coping mechanism. === Academic and sociological analysis === Scholars have framed TikTokification within the context of the attention economy. A 2024 academic analysis described TikTok as representing "a new paradigm of social media communication" shaped by youth culture, mobile technology, and the economics of attention, in which spectators become active contributors to a shared content pipeline. The same analysis noted that TikTok "reflects a new mode of communication influenced by avant-garde cinema, the use of mobile technology, and the social habits of particular social groups." US social media users were projected to spend 61.1% of their time on social networks watching videos in 2025, up from 33.3% in 2019, before TikTok became widely popular, underscoring the scale of the behavioural shift. == Monetisation challenges == Despite high engagement levels, monetising short-form video has remained difficult for platforms and creators alike. Unlike long-form YouTube content, short clips offer limited space for advertisers to insert advertisements. YouTube Shorts pays approximately four cents per 1,000 views, considerably less than its long-form counterpart. From 2025 onward, platforms began introducing creator funds, advertisements, and AI-driven content recommendations as part of broader efforts to make short-form video economically sustainable for creators.

    Read more →
  • Attention (machine learning)

    Attention (machine learning)

    In machine learning, attention is a method that determines the importance of each component in a sequence relative to the other components in that sequence. In natural language processing, importance is represented by "soft" weights assigned to each word in a sentence. More generally, attention encodes vectors called token embeddings across a fixed-width sequence that can range from tens to millions of tokens in size. Unlike "hard" weights, which are computed during the backwards training pass, "soft" weights exist only in the forward pass and therefore change with every step of the input. Earlier designs implemented the attention mechanism in a serial recurrent neural network (RNN) language translation system, but a more recent design, namely the transformer, removed the slower sequential RNN and relied more heavily on the faster parallel attention scheme. Inspired by ideas about attention in humans, the attention mechanism was developed to address the weaknesses of using information from the hidden layers of recurrent neural networks. Recurrent neural networks favor information contained in words at the end of a sentence and thus deemed more recent, thereby tending to attenuate the significance and associated predictive weight assigned to information earlier in the sentence. Attention allows a token equal access to any part of a sentence directly, rather than only through the previous state. == History == Additional surveys of the attention mechanism in deep learning are provided by Niu et al. and Soydaner. The major breakthrough came with self-attention, where each element in the input sequence attends to all others, enabling the model to capture global dependencies. This idea was central to the Transformer architecture, which replaced recurrence with attention mechanisms. As a result, Transformers became the foundation for models like BERT, T5 and generative pre-trained transformers (GPT). == Overview == The modern era of machine attention was revitalized by grafting an attention mechanism (Fig 1. orange) to an Encoder-Decoder. Figure 2 shows the internal step-by-step operation of the attention block (A) in Fig 1. === Interpreting attention weights === In translating between languages, alignment is the process of matching words from the source sentence to words of the translated sentence. Networks that perform verbatim translation without regard to word order would show the highest scores along the (dominant) diagonal of the matrix. The off-diagonal dominance shows that the attention mechanism is more nuanced. Consider an example of translating I love you to French. On the first pass through the decoder, 94% of the attention weight is on the first English word I, so the network offers the word je. On the second pass of the decoder, 88% of the attention weight is on the third English word you, so it offers t'. On the last pass, 95% of the attention weight is on the second English word love, so it offers aime. In the I love you example, the second word love is aligned with the third word aime. Stacking soft row vectors together for je, t', and aime yields an alignment matrix: Sometimes, alignment can be multiple-to-multiple. For example, the English phrase look it up corresponds to cherchez-le. Thus, "soft" attention weights work better than "hard" attention weights (setting one attention weight to 1, and the others to 0), as we would like the model to make a context vector consisting of a weighted sum of the hidden vectors, rather than "the best one", as there may not be a best hidden vector. == Variants == Many variants of attention implement soft weights, such as fast weight programmers, or fast weight controllers (1992). A "slow" neural network outputs the "fast" weights of another neural network through outer products. The slow network learns by gradient descent. It was later renamed as "linearized self-attention". Bahdanau-style attention, also referred to as additive attention, Luong-style attention, which is known as multiplicative attention, Early attention mechanisms similar to modern self-attention were proposed using recurrent neural networks. However, the highly parallelizable self-attention was introduced in 2017 and successfully used in the Transformer model, positional attention and factorized positional attention. For convolutional neural networks, attention mechanisms can be distinguished by the dimension on which they operate, namely: spatial attention, channel attention, or combinations. These variants recombine the encoder-side inputs to redistribute those effects to each target output. Often, a correlation-style matrix of dot products provides the re-weighting coefficients. In the figures below, W is the matrix of context attention weights, similar to the formula in Overview section above. == Optimizations == === Flash attention === The size of the attention matrix is proportional to the square of the number of input tokens. Therefore, when the input is long, calculating the attention matrix requires a lot of GPU memory. Flash attention is an implementation that reduces the memory needs and increases efficiency without sacrificing accuracy. It achieves this by partitioning the attention computation into smaller blocks that fit into the GPU's faster on-chip memory, reducing the need to store large intermediate matrices and thus lowering memory usage while increasing computational efficiency. === FlexAttention === FlexAttention is an attention kernel developed by Meta that allows users to modify attention scores prior to softmax and dynamically chooses the optimal attention algorithm. == Applications == Attention is widely used in natural language processing, computer vision, and speech recognition. In NLP, it improves context understanding in tasks like question answering and summarization. In vision, visual attention helps models focus on relevant image regions, enhancing object detection and image captioning. === Attention maps as explanations for vision transformers === From the original paper on vision transformers (ViT), visualizing attention scores as a heat map (called saliency maps or attention maps) has become an important and routine way to inspect the decision making process of ViT models. One can compute the attention maps with respect to any attention head at any layer, while the deeper layers tend to show more semantically meaningful visualization. Attention rollout is a recursive algorithm to combine attention scores across all layers, by computing the dot product of successive attention maps. Because vision transformers are typically trained in a self-supervised manner, attention maps are generally not class-sensitive. When a classification head is attached to the ViT backbone, class-discriminative attention maps (CDAM) combines attention maps and gradients with respect to the class [CLS] token. Some class-sensitive interpretability methods originally developed for convolutional neural networks can be also applied to ViT, such as GradCAM, which back-propagates the gradients to the outputs of the final attention layer. Using attention as basis of explanation for the transformers in language and vision is not without debate. While some pioneering papers analyzed and framed attention scores as explanations, higher attention scores do not always correlate with greater impact on model performances. == Mathematical representation == === Standard scaled dot-product attention === For matrices: Q ∈ R m × d k , K ∈ R n × d k {\displaystyle Q\in \mathbb {R} ^{m\times d_{k}},K\in \mathbb {R} ^{n\times d_{k}}} and V ∈ R n × d v {\displaystyle V\in \mathbb {R} ^{n\times d_{v}}} , the scaled dot-product, or QKV attention, is defined as: Attention ( Q , K , V ) = softmax ( Q K T d k ) V ∈ R m × d v {\displaystyle {\text{Attention}}(Q,K,V)={\text{softmax}}\left({\frac {QK^{T}}{\sqrt {d_{k}}}}\right)V\in \mathbb {R} ^{m\times d_{v}}} where T {\displaystyle {}^{T}} denotes transpose and the softmax function is applied independently to every row of its argument. The matrix Q {\displaystyle Q} contains m {\displaystyle m} queries, while matrices K , V {\displaystyle K,V} jointly contain an unordered set of n {\displaystyle n} key-value pairs. Value vectors in matrix V {\displaystyle V} are weighted using the weights resulting from the softmax operation, so that the rows of the m {\displaystyle m} -by- d v {\displaystyle d_{v}} output matrix are confined to the convex hull of the points in R d v {\displaystyle \mathbb {R} ^{d_{v}}} given by the rows of V {\displaystyle V} . To understand the permutation invariance and permutation equivariance properties of QKV attention, let A ∈ R m × m {\displaystyle A\in \mathbb {R} ^{m\times m}} and B ∈ R n × n {\displaystyle B\in \mathbb {R} ^{n\times n}} be permutation matrices; and D ∈ R m × n {\displaystyle D\in \mathbb {R} ^{m\times n}} an arbitrary matrix. The softmax function is permutation equivariant in the sense that: softmax ( A D B ) = A softmax ( D ) B {\displays

    Read more →
  • Harvest now, decrypt later

    Harvest now, decrypt later

    Harvest now, decrypt later (HNDL) is a surveillance strategy that relies on the acquisition and long-term storage of currently unreadable encrypted data awaiting possible breakthroughs in decryption technology that would render it readable in the future—a hypothetical date referred to as Y2Q (a reference to Y2K), or Q-Day. The most common concern is the prospect of developments in quantum computing which would allow current strong encryption algorithms to be broken at some time in the future, making it possible to decrypt any stored material that had been encrypted using those algorithms. However, the improvement in decryption technology need not be due to a quantum-cryptographic advance; any other form of attack capable of enabling decryption would be sufficient. The existence of this strategy has led to concerns about the need to urgently deploy post-quantum cryptography; even though no practical quantum attacks yet exist, some data stored now may still remain sensitive even decades into the future. As of 2022, the U.S. federal government has proposed a roadmap for organizations to start migrating toward quantum-cryptography-resistant algorithms to mitigate these threats. This new version of Commercial National Security Algorithm Suite uses publicly-available algorithms and is allowed for government use up to the TOP SECRET level. == Terminology and scope == The term “harvest now, decrypt later” encompasses various surveillance or espionage operations in which ciphertext or encrypted communications are collected today with the view that they may one day be decrypted, given sufficient advances in computing power or cryptanalysis. The abbreviation HNDL is sometimes used in technical and policy documents. The “Y2Q” (or “Q-Day”) label draws an analogy to the Y2K date-change issue, emphasising a potential future point at which current cryptography may collapse. The strategy is particularly relevant for data with long confidentiality lifetimes, such as diplomatic communications, personal health records, critical infrastructure logs, or intellectual property. == Mitigation strategies == The primary defense against HNDL attacks is the transition to post-quantum cryptography (PQC), which utilizes algorithms believed to be secure against quantum computer attacks. However, because PQC protects the data payload digitally, rather than the transmission itself, the encrypted data can still be harvested and stored. A complementary approach involves physical layer security (also known as optical layer encryption or photonic shielding). Unlike algorithmic encryption, this method modifies the optical waveform itself—often by burying the signal within optical noise or using spectral phase encoding—to render the transmission unrecordable by standard receivers. By preventing the attacker from capturing a valid signal in the first place, this approach aims to eliminate the "harvest" phase of the threat. Commercial implementations of harvest-proof optical encryption have been developed by firms such as CyberRidge to secure long-haul fiber networks. Field trials have demonstrated 100 Gbps throughput over legacy DWDM networks using this method.

    Read more →
  • Information leakage

    Information leakage

    Information leakage happens whenever a system that is designed to be closed to an eavesdropper reveals some information to unauthorized parties nonetheless. In other words: Information leakage occurs when secret information correlates with, or can be correlated with, observable information. For example, when designing an encrypted instant messaging network, a network engineer without the capacity to crack encryption codes could see when messages are transmitted, even if he could not read them. == Risk vectors == A modern example of information leakage is the leakage of secret information via data compression, by using variations in data compression ratio to reveal correlations between known (or deliberately injected) plaintext and secret data combined in a single compressed stream. Another example is the key leakage that can occur when using some public-key systems when cryptographic nonce values used in signing operations are insufficiently random. Bad randomness cannot protect proper functioning of a cryptographic system, even in a benign circumstance, it can easily produce crackable keys that cause key leakage. Information leakage can sometimes be deliberate: for example, an algorithmic converter may be shipped that intentionally leaks small amounts of information, in order to provide its creator with the ability to intercept the users' messages, while still allowing the user to maintain an illusion that the system is secure. This sort of deliberate leakage is sometimes known as a subliminal channel. Generally, only very advanced systems employ defenses against information leakage. Following are the commonly implemented countermeasures : Use steganography to hide the fact that a message is transmitted at all. Use chaffing to make it unclear to whom messages are transmitted (but this does not hide from others the fact that messages are transmitted). For busy re-transmitting proxies, such as a Mixmaster node: randomly delay and shuffle the order of outbound packets - this will assist in disguising a given message's path, especially if there are multiple, popular forwarding nodes, such as are employed with Mixmaster mail forwarding. When a data value is no longer going to be used, erase it from the memory.

    Read more →