AI Detector Generator

AI Detector Generator — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AI alignment

    AI alignment

    In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives. It is often difficult for AI designers to specify the full range of desired and undesired behaviors. Therefore, the designers often use simpler proxy goals, such as gaining human approval. But proxy goals can overlook necessary constraints or reward the AI system for merely appearing aligned. AI systems may also find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful, ways (reward hacking). Advanced AI systems may develop unwanted instrumental strategies, such as seeking power or self-preservation because such strategies help them achieve their assigned final goals. Furthermore, they might develop undesirable emergent goals that could be hard to detect before the system is deployed and encounters new situations and data distributions. Empirical research showed in 2024 that advanced large language models (LLMs) such as OpenAI o1 or Claude 3 sometimes engage in strategic deception to achieve their goals or prevent them from being changed. Some of these issues affect existing commercial systems such as LLMs, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected because these problems partially result from high capabilities. Many prominent AI researchers and AI company leaders have argued or asserted that AI is approaching human-like (AGI) and superhuman cognitive capabilities (ASI), and could endanger human civilization if misaligned. These include "AI godfathers" Geoffrey Hinton and Yoshua Bengio and the CEOs of OpenAI, Anthropic, and Google DeepMind. These risks remain debated. AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking. Alignment research has connections to interpretability research, (adversarial) robustness, anomaly detection, calibrated uncertainty, formal verification, preference learning, safety-critical engineering, game theory, algorithmic fairness, and social sciences. == Objectives in AI == Programmers provide an AI system such as AlphaZero with an "objective function", in which they intend to encapsulate the goal(s) the AI is configured to accomplish. Such a system later populates a (possibly implicit) internal "model" of its environment. This model encapsulates all the agent's beliefs about the world. The AI then creates and executes whatever plan is calculated to maximize the value of its objective function. For example, when AlphaZero is trained on chess, it has a simple objective function of "+1 if AlphaZero wins, −1 if AlphaZero loses". During the game, AlphaZero attempts to execute whatever sequence of moves it judges most likely to attain the maximum value of +1. Similarly, a reinforcement learning system can have a "reward function" that allows the programmers to shape the AI's desired behavior. An evolutionary algorithm's behavior is shaped by a "fitness function". == Alignment problem == In 1960, AI pioneer Norbert Wiener described the AI alignment problem as follows: If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively [...] we had better be quite sure that the purpose put into the machine is the purpose which we really desire. AI alignment refers to ensuring that an AI system's objectives match some target. The target is variously defined as the goals of the system's designers or users, widely shared values, objective ethical standards, legal requirements, or the intentions its designers would have if they were more informed and enlightened. In democratic AI alignment, the target is the values and preferences of median voters, which increases political legitimacy. AI alignment is an open problem for modern AI systems and is a research field within AI. Aligning AI involves two main challenges: carefully specifying the purpose of the system (outer alignment) and ensuring that the system adopts the specification robustly (inner alignment). Researchers also attempt to create AI models that have robust alignment, sticking to safety constraints even when users adversarially try to bypass them. === Specification gaming and side effects === To specify an AI system's purpose, AI designers typically provide an objective function, examples, or feedback to the system. But designers are often unable to completely specify all important values and constraints, so they resort to easy-to-specify proxy goals such as maximizing the approval of human overseers, who are fallible. As a result, AI systems can find loopholes that help them accomplish the specified objective efficiently but in unintended, possibly harmful ways. This tendency is known as specification gaming or reward hacking, and is an instance of Goodhart's law. As AI systems become more capable, they are often able to game their specifications more effectively. Specification gaming has been observed in numerous AI systems. OpenAI GPT models for programming—including in real-world cases—have been found to explicitly plan hacking the tests used to evaluate them to falsely appear successful (e.g., explicitly stating "let's hack"). When the company penalized this, many models learned to obfuscate their plans while continuing to hack the tests. Another system was trained to finish a simulated boat race by rewarding the system for hitting targets along the track, but the system achieved more reward by looping and crashing into the same targets indefinitely. A 2025 Palisade Research study found that when tasked to win at chess against a stronger opponent, some reasoning LLMs attempted to hack the game system, for example by modifying or entirely deleting their opponent. Some alignment researchers aim to help humans detect specification gaming and steer AI systems toward carefully specified objectives that are safe and useful to pursue. When a misaligned AI system is deployed, it can have consequential side effects. Social media platforms have been known to optimize their recommendation algorithms for click-through rates, causing user addiction on a global scale. Stanford researchers say that such recommender systems are misaligned with their users because they "optimize simple engagement metrics rather than a harder-to-measure combination of societal and consumer well-being". Explaining such side effects, Berkeley computer scientist Stuart J. Russell said that the omission of implicit constraints can cause harm: "A system [...] will often set [...] unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want." Some researchers suggest that AI designers specify their desired goals by listing forbidden actions or by formalizing ethical rules (as with Asimov's Three Laws of Robotics). But Russell and Norvig argue that this approach overlooks the complexity of human values: "It is certainly very hard, and perhaps impossible, for mere humans to anticipate and rule out in advance all the disastrous ways the machine could choose to achieve a specified objective." Additionally, even if an AI system fully understands human intentions, it may still disregard them, because following human intentions may not be its objective (unless it is already fully aligned). === Pressure to deploy unsafe systems === Commercial organizations sometimes have incentives to take shortcuts on safety and to deploy misaligned or unsafe AI systems. For example, social media recommender systems have been profitable despite creating unwanted addiction and polarization. Competitive pressure can also lead to a race to the bottom on AI safety standards. For example, OpenAI has been sued for releasing a ChatGPT version that encouraged suicide for some unstable users, a behavior the company had overlooked amid a rushed product release. Similarly, in 2018, a self-driving car killed a pedestrian (Elaine Herzberg) after engineers disabled the emergency braking system because it was oversensitive and slowed development. === Risks from advanced misaligned AI === Some researchers are interested in aligning increasingly advanced AI systems, as progress in AI development is rapid, and industry and governments are trying to build advan

    Read more →
  • Data philanthropy

    Data philanthropy

    Data philanthropy refers to the practice of private companies donating corporate data. This data is usually donated to nonprofits or donation-run organizations that have difficulty keeping up with expensive data collection technology. The concept was introduced through the United Nations Global Pulse initiative in 2011 to explore corporate data assets for humanitarian, academic, and societal causes. For example, anonymized mobile data could be used to track disease outbreaks, or data on consumer actions may be shared with researchers to study public health and economic trends. == Definition == A large portion of data collected from the internet consists of user-generated content, such as blogs, social media posts, and information submitted through lead generation and data forms. Additionally, corporations gather and analyze consumer data to gain insight into customer behavior, identify potential markets, and inform investment decisions. United Nations Global Pulse director Robert Kirkpatrick has referred to this type of data as "massive passive data" or "data exhaust." == Challenges == While data philanthropy can enhance development policies, making users' private data available to various organizations raises concerns regarding privacy, ownership, and the equitable use of data. Different techniques, such as differential privacy and alphanumeric strings of information, can allow access to personal data while ensuring user anonymity. However, even if these algorithms work, re-identification may still be possible. Another challenge is convincing corporations to share their data. The data collected by corporations provides them with market competitiveness and insight regarding consumer behavior. Corporations may fear losing their competitive edge if they share the information they have collected with the public. Numerous moral challenges are also encountered. In 2016, Mariarosaria Taddeo, a digital ethics professor at the University of Oxford, proposed an ethical framework to address them. == Sharing strategies == The goal of data philanthropy is to create a global data commons where companies, governments, and individuals can contribute anonymous, aggregated datasets. The United Nations Global Pulse offers four different tactics that companies can use to share their data that preserve consumer anonymity: Share aggregated and derived data sets for analysis under nondisclosure agreements (NDA) Allow researchers to analyze data within the private company's own network under NDAs Real-Time Data Commons: data pooled and aggregated among multiple companies of the same industry to protect competitiveness Public/Private Alerting Network: companies mine data behind their own firewalls and share indicators == Application in various fields == Many corporations take part in data philanthropy, including social networking platforms (e.g., Facebook, Twitter), telecommunications providers (e.g., Verizon, AT&T), and search engines (e.g., Google, Bing). Collecting and sharing anonymized, aggregated user-generated data is made available through data-sharing systems to support research, policy development, and social impact initiatives. By participating in such efforts, these organizations contribute to causes regarded as beneficial to society, allowing institutions to give back meaningfully. With the onset of technological advancements, the sharing of data on a global scale and an in-depth analysis of these data structures could mitigate the effects of global issues such as natural disasters and epidemics. Robert Kirkpatrick, the Director of the United Nations Global Pulse, has argued that this aggregated information is beneficial for the common good and can lead to developments in research and data production in a range of varied fields. === Digital disease detection === Health researchers use digital disease detection by collecting data from various sources—such as social media platforms (e.g., Twitter, Facebook), mobile devices (e.g., cell phones, smartphones), online search queries, mobile apps, and sensor data from wearables and environmental sensors—to monitor and predict the spread of infectious diseases. This approach allows them to track and anticipate outbreaks of epidemics (e.g., COVID-19, Ebola), pandemics, vector-borne diseases (e.g., malaria, dengue fever), and respiratory illnesses (e.g., influenza, SARS), improving response and intervention strategies for the spread of diseases. In 2008, Centers for Disease Control and Prevention collaborated with Google and launched Google Flu Trends, a website that tracked flu-related searches and user locations to track the spread of the flu. Users could visit Google Flu Trends to compare the amount of flu-related search activity versus the reported numbers of flu outbreaks on a graphical map. One drawback of this method of tracking was that Google searches are sometimes performed due to curiosity rather than when an individual is suffering from the flu. According to Ashley Fowlkes, an epidemiologist in the CDC Influenza division, "The Google Flu Trends system tries to account for that type of media bias by modeling search terms over time to see which ones remain stable." Google Flu Trends is no longer publishing current flu estimates on the public website; however, visitors to the site can still view and download previous estimates. Current data can be shared with verified researchers. A study from the Harvard School of Public Health (HSPH), published in the October 12, 2012 issue of Science, discussed how phone data helped curb the spread of malaria in Kenya. The researchers mapped phone calls and texts made by 14,816,521 Kenyan mobile phone subscribers. When individuals left their primary living location, the destination and length of journey were calculated. This data was then compared to a 2009 malaria prevalence map to estimate the disease's commonality in each location. Combining all this information, the researchers could estimate the probability of an individual carrying malaria and map the movement of the disease. This research can be used to track the spread of similar diseases. === Humanitarian aid === Calling patterns of mobile phone users can determine the socioeconomic standings of the populace, which can be used to deduce "its access to housing, education, healthcare, and basic services such as water and electricity." Researchers from Columbia University and Karolinska Institute used daily SIM card location data from both before and after the 2010 Haiti earthquake to estimate the movement of people both in response to the earthquake and during the related 2010 Haiti cholera outbreak. Their research suggests that mobile phone data can provide rapid and accurate estimates of population movements during disasters and outbreaks of infectious disease. Big data can also provide information on looming disasters and can assist relief organizations in rapid-response and locating displaced individuals. By analyzing specific patterns within this 'big data', governments and NGOs can enhance responses to disruptive events such as natural disasters, disease outbreaks, and global economic crises. Leveraging real-time information enables a deeper understanding of individual well-being, allowing for more effective interventions. Corporations utilize digital services, such as human sensor systems, to detect and solve impending problems within communities. This is a strategy used by the private sector to anonymously share customer information for public benefit, while preserving user privacy. === Impoverished areas === Poverty still remains a worldwide issue, with over 2.5 billion people currently impoverished. Statistics indicate the widespread use of mobile phones, even within impoverished communities. Additional data can be collected through Internet access, social media, utility payments and governmental statistics. Data-driven activities can lead to the accumulation of 'big data', which in turn can assist international non-governmental organizations in documenting and evaluating the needs of underprivileged populations. Through data philanthropy, NGOs can distribute information while cooperating with governments and private companies. === Corporate === Data philanthropy incorporates aspects of social philanthropy by allowing corporations to create profound impacts through the act of giving back by dispersing proprietary datasets. The public sector collects and preserves information, considered an essential asset. Companies track and analyze users' online activities to gain insight into their needs related to new products and services. These companies view the welfare of the population as key to business expansion and progression by using their data to highlight global citizens' issues. Experts in the private sector emphasize the importance of integrating diverse data sources—such as retail, mobile, and social media data—to develop essential solutions for global challenges. In Data Philanthropy:

    Read more →
  • Secret London

    Secret London

    Secret London is a Facebook group started by 21-year-old Bristol University graduate, Tiffany Philippou, on 19 January 2010 in response to a Saatchi & Saatchi competition. The group grew rapidly (180,000 members as of 8 February 2010) and is composed mostly of Londoners who use the site to share suggestions and photos of London. After the group's early success, the founder announced her intention to launch a website of the same name by crowdsourcing the design and development. The website was launched on 16 February 2010. == Other secret cities == Following the initial success of Secret London, a number of other secret groups were independently started around the world, some of which already have over 100,000 users. As of 19 February 2010, the list of other groups includes: Secret Frankfurt, Secret Tel Aviv, Secret Paris, Secret New York, Secret Tokyo, Secret Toronto, Secret Los Angeles, Secret Exeter, Secret Boston, Secret Norwich, Secret Singapore, Secret Brighton, Secret Minneapolis, Secret Sydney, Secret Canberra, Secret Brisbane, Secret Wellington, Secret Christchurch, Secret Madeira, Secret Funchal, Secret Bristol and Secret Cardiff. == Controversy == Some commentators have questioned whether it possible to share secrets without compromising them, and whether sharing tips publicly will lead to over-exposure of the businesses who are recommended.

    Read more →
  • Trusted Computing

    Trusted Computing

    Trusted Computing (TC) is a technology developed and promoted by the Trusted Computing Group. The term is taken from the field of trusted systems and has a specialized meaning that is distinct from the field of confidential computing. With Trusted Computing, the computer will consistently behave in expected ways, and those behaviors will be enforced by computer hardware and software. Enforcing this behavior is achieved by loading the hardware with a unique encryption key that is inaccessible to the rest of the system and the owner. TC is controversial as the hardware is not only secured for its owner, but also against its owner, leading opponents of the technology like free software activist Richard Stallman to deride it as "treacherous computing", and certain scholarly articles to use scare quotes when referring to the technology. Trusted Computing proponents such as International Data Corporation, the Enterprise Strategy Group and Endpoint Technologies Associates state that the technology will make computers safer, less prone to viruses and malware, and thus more reliable from an end-user perspective. They also state that Trusted Computing will allow computers and servers to offer improved computer security over that which is currently available. Opponents often state that this technology will be used primarily to enforce digital rights management policies (imposed restrictions to the owner) and not to increase computer security. Chip manufacturers Intel and AMD, hardware manufacturers such as HP and Dell, and operating system providers such as Microsoft include Trusted Computing in their products if enabled. The U.S. Army requires that every new PC it purchases comes with a Trusted Platform Module (TPM). As of July 3, 2007, so does virtually the entire United States Department of Defense. == Key concepts == Trusted Computing encompasses six key technology concepts, of which all are required for a fully Trusted system, that is, a system compliant to the TCG specifications: Endorsement key Secure input and output Memory curtaining / protected execution Sealed storage Remote attestation Trusted Third Party (TTP) === Endorsement key === The endorsement key is a 2048-bit RSA public and private key pair that is created randomly on the chip at manufacture time and cannot be changed. The private key never leaves the chip, while the public key is used for attestation and for encryption of sensitive data sent to the chip, as occurs during the TPM_TakeOwnership command. This key is used to allow the execution of secure transactions: every Trusted Platform Module (TPM) is required to be able to sign a random number (in order to allow the owner to show that he has a genuine trusted computer), using a particular protocol created by the Trusted Computing Group (the direct anonymous attestation protocol) in order to ensure its compliance of the TCG standard and to prove its identity; this makes it impossible for a software TPM emulator with an untrusted endorsement key (for example, a self-generated one) to start a secure transaction with a trusted entity. The TPM should be designed to make the extraction of this key by hardware analysis hard, but tamper resistance is not a strong requirement. === Memory curtaining === Memory curtaining extends common memory protection techniques to provide full isolation of sensitive areas of memory—for example, locations containing cryptographic keys. Even the operating system does not have full access to curtained memory. The exact implementation details are vendor specific. === Sealed storage === Sealed storage protects private information by binding it to platform configuration information including the software and hardware being used. This means the data can be released only to a particular combination of software and hardware. Sealed storage can be used for DRM enforcing. For example, users who keep a song on their computer that has not been licensed to be listened will not be able to play it. Currently, a user can locate the song, listen to it, and send it to someone else, play it in the software of their choice, or back it up (and in some cases, use circumvention software to decrypt it). Alternatively, the user may use software to modify the operating system's DRM routines to have it leak the song data once, say, a temporary license was acquired. Using sealed storage, the song is securely encrypted using a key bound to the trusted platform module so that only the unmodified and untampered music player on his or her computer can play it. In this DRM architecture, this might also prevent people from listening to the song after buying a new computer, or upgrading parts of their current one, except after explicit permission of the vendor of the song. === Remote attestation === Remote attestation allows changes to the user's computer to be detected by authorized parties. For example, software companies can identify unauthorized changes to software, including users modifying their software to circumvent commercial digital rights restrictions. It works by having the hardware generate a certificate stating what software is currently running. The computer can then present this certificate to a remote party to show that unaltered software is currently executing. Numerous remote attestation schemes have been proposed for various computer architectures, including Intel, RISC-V, and ARM. Remote attestation is usually combined with public-key encryption so that the information sent can only be read by the programs that requested the attestation, and not by an eavesdropper. To take the song example again, the user's music player software could send the song to other machines, but only if they could attest that they were running an authorized copy of the music player software. Combined with the other technologies, this provides a more restricted path for the music: encrypted I/O prevents the user from recording it as it is transmitted to the audio subsystem, memory locking prevents it from being dumped to regular disk files as it is being worked on, sealed storage curtails unauthorized access to it when saved to the hard drive, and remote attestation prevents unauthorized software from accessing the song even when it is used on other computers. To preserve the privacy of attestation responders, Direct Anonymous Attestation has been proposed as a solution, which uses a group signature scheme to prevent revealing the identity of individual signers. Proof of space (PoS) have been proposed to be used for malware detection, by determining whether the L1 cache of a processor is empty (e.g., has enough space to evaluate the PoSpace routine without cache misses) or contains a routine that resisted being evicted. === Trusted third party === == Known applications == The Microsoft products Windows Vista, Windows 7, Windows 8 and Windows RT make use of a Trusted Platform Module to facilitate BitLocker Drive Encryption. Other known applications with runtime encryption and the use of secure enclaves include the Signal messenger and the e-prescription service ("E-Rezept") by the German government. == Possible applications == === Digital rights management === Trusted Computing would allow companies to create a digital rights management (DRM) system which would be very hard to circumvent, though not impossible. An example is downloading a music file. Sealed storage could be used to prevent the user from opening the file with an unauthorized player or computer. Remote attestation could be used to authorize play only by music players that enforce the record company's rules. The music would be played from curtained memory, which would prevent the user from making an unrestricted copy of the file while it is playing, and secure I/O would prevent capturing what is being sent to the sound system. Circumventing such a system would require either manipulation of the computer's hardware, capturing the analogue (and thus degraded) signal using a recording device or a microphone, or breaking the security of the system. New business models for use of software (services) over Internet may be boosted by the technology. By strengthening the DRM system, one could base a business model on renting programs for a specific time periods or "pay as you go" models. For instance, one could download a music file which could only be played a certain number of times before it becomes unusable, or the music file could be used only within a certain time period. === Preventing cheating in online games === Trusted Computing could be used to combat cheating in online games. Some players modify their game copy in order to gain unfair advantages in the game; remote attestation, secure I/O and memory curtaining could be used to determine that all players connected to a server were running an unmodified copy of the software. === Verification of remote computation for grid computing === Trusted Computing could be used to guarantee participants in a grid computing sys

    Read more →
  • Neural style transfer

    Neural style transfer

    Neural style transfer (NST) software algorithms are able to manipulate digital images, or videos, in order to adopt the appearance or visual style of another image. NST algorithms are characterized by their use of deep neural networks for the sake of image transformation. Common uses for NST are the creation of artificial artwork from photographs, for example by transferring the appearance of famous paintings to user-supplied photographs. Several notable mobile apps use NST techniques for this purpose, including DeepArt and Prisma. This method has been used by artists and designers around the globe to develop new artwork based on existent style(s). == History == NST is an example of image stylization, a problem studied for over two decades within the field of non-photorealistic rendering. The first two example-based style transfer algorithms were image analogies and image quilting. Both of these methods were based on patch-based texture synthesis algorithms. Given a training pair of images–a photo and an artwork depicting that photo–a transformation could be learned and then applied to create new artwork from a new photo, by analogy. If no training photo was available, it would need to be produced by processing the input artwork; image quilting did not require this processing step, though it was demonstrated on only one style. NST was first published in the paper "A Neural Algorithm of Artistic Style" by Leon Gatys et al., originally released to ArXiv 2015, and subsequently accepted by the peer-reviewed CVPR conference in 2016. The original paper used a VGG-19 architecture that has been pre-trained to perform object recognition using the ImageNet dataset. In 2017, Google AI introduced a method that allows a single deep convolutional style transfer network to learn multiple styles at the same time. This algorithm permits style interpolation in real-time, even when done on video media. == Mathematics == This section closely follows the original paper. === Overview === The idea of Neural Style Transfer (NST) is to take two images—a content image p → {\displaystyle {\vec {p}}} and a style image a → {\displaystyle {\vec {a}}} —and generate a third image x → {\displaystyle {\vec {x}}} that minimizes a weighted combination of two loss functions: a content loss L content ( p → , x → ) {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}})} and a style loss L style ( a → , x → ) {\displaystyle {\mathcal {L}}_{\text{style }}({\vec {a}},{\vec {x}})} . The total loss is a linear sum of the two: L NST ( p → , a → , x → ) = α L content ( p → , x → ) + β L style ( a → , x → ) {\displaystyle {\mathcal {L}}_{\text{NST}}({\vec {p}},{\vec {a}},{\vec {x}})=\alpha {\mathcal {L}}_{\text{content}}({\vec {p}},{\vec {x}})+\beta {\mathcal {L}}_{\text{style}}({\vec {a}},{\vec {x}})} By jointly minimizing the content and style losses, NST generates an image that blends the content of the content image with the style of the style image. Both the content loss and the style loss measures the similarity of two images. The content similarity is the weighted sum of squared-differences between the neural activations of a single convolutional neural network (CNN) on two images. The style similarity is the weighted sum of Gram matrices within each layer (see below for details). The original paper used a VGG-19 CNN, but the method works for any CNN. === Symbols === Let x → {\textstyle {\vec {x}}} be an image input to a CNN. Let F l ∈ R N l × M l {\textstyle F^{l}\in \mathbb {R} ^{N_{l}\times M_{l}}} be the matrix of filter responses in layer l {\textstyle l} to the image x → {\textstyle {\vec {x}}} , where: N l {\textstyle N_{l}} is the number of filters in layer l {\textstyle l} ; M l {\textstyle M_{l}} is the height times the width (i.e. number of pixels) of each filter in layer l {\textstyle l} ; F i j l ( x → ) {\textstyle F_{ij}^{l}({\vec {x}})} is the activation of the i th {\textstyle i^{\text{th}}} filter at position j {\textstyle j} in layer l {\textstyle l} . A given input image x → {\textstyle {\vec {x}}} is encoded in each layer of the CNN by the filter responses to that image, with higher layers encoding more global features, but losing details on local features. === Content loss === Let p → {\textstyle {\vec {p}}} be an original image. Let x → {\textstyle {\vec {x}}} be an image that is generated to match the content of p → {\textstyle {\vec {p}}} . Let P l {\textstyle P^{l}} be the matrix of filter responses in layer l {\textstyle l} to the image p → {\textstyle {\vec {p}}} . The content loss is defined as the squared-error loss between the feature representations of the generated image and the content image at a chosen layer l {\displaystyle l} of a CNN: L content ( p → , x → , l ) = 1 2 ∑ i , j ( A i j l ( x → ) − A i j l ( p → ) ) 2 {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}},l)={\frac {1}{2}}\sum _{i,j}\left(A_{ij}^{l}({\vec {x}})-A_{ij}^{l}({\vec {p}})\right)^{2}} where A i j l ( x → ) {\displaystyle A_{ij}^{l}({\vec {x}})} and A i j l ( p → ) {\displaystyle A_{ij}^{l}({\vec {p}})} are the activations of the i th {\displaystyle i^{\text{th}}} filter at position j {\displaystyle j} in layer l {\displaystyle l} for the generated and content images, respectively. Minimizing this loss encourages the generated image to have similar content to the content image, as captured by the feature activations in the chosen layer. The total content loss is a linear sum of the content losses of each layer: L content ( p → , x → ) = ∑ l v l L content ( p → , x → , l ) {\displaystyle {\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}})=\sum _{l}v_{l}{\mathcal {L}}_{\text{content }}({\vec {p}},{\vec {x}},l)} , where the v l {\displaystyle v_{l}} are positive real numbers chosen as hyperparameters. === Style loss === The style loss is based on the Gram matrices of the generated and style images, which capture the correlations between different filter responses at different layers of the CNN: L style ( a → , x → ) = ∑ l = 0 L w l E l , {\displaystyle {\mathcal {L}}_{\text{style }}({\vec {a}},{\vec {x}})=\sum _{l=0}^{L}w_{l}E_{l},} where E l = 1 4 N l 2 M l 2 ∑ i , j ( G i j l ( x → ) − G i j l ( a → ) ) 2 . {\displaystyle E_{l}={\frac {1}{4N_{l}^{2}M_{l}^{2}}}\sum _{i,j}\left(G_{ij}^{l}({\vec {x}})-G_{ij}^{l}({\vec {a}})\right)^{2}.} Here, G i j l ( x → ) {\displaystyle G_{ij}^{l}({\vec {x}})} and G i j l ( a → ) {\displaystyle G_{ij}^{l}({\vec {a}})} are the entries of the Gram matrices for the generated and style images at layer l {\displaystyle l} . Explicitly, G i j l ( x → ) = ∑ k F i k l ( x → ) F j k l ( x → ) {\displaystyle G_{ij}^{l}({\vec {x}})=\sum _{k}F_{ik}^{l}({\vec {x}})F_{jk}^{l}({\vec {x}})} Minimizing this loss encourages the generated image to have similar style characteristics to the style image, as captured by the correlations between feature responses in each layer. The idea is that activation pattern correlations between filters in a single layer captures the "style" on the order of the receptive fields at that layer. Similarly to the previous case, the w l {\displaystyle w_{l}} are positive real numbers chosen as hyperparameters. === Hyperparameters === In the original paper, they used a particular choice of hyperparameters. The style loss is computed by w l = 0.2 {\displaystyle w_{l}=0.2} for the outputs of layers conv1_1, conv2_1, conv3_1, conv4_1, conv5_1 in the VGG-19 network, and zero otherwise. The content loss is computed by w l = 1 {\displaystyle w_{l}=1} for conv4_2, and zero otherwise. The ratio α / β ∈ [ 5 , 50 ] × 10 − 4 {\displaystyle \alpha /\beta \in [5,50]\times 10^{-4}} . === Training === Image x → {\displaystyle {\vec {x}}} is initially approximated by adding a small amount of white noise to input image p → {\displaystyle {\vec {p}}} and feeding it through the CNN. Then we successively backpropagate this loss through the network with the CNN weights fixed in order to update the pixels of x → {\displaystyle {\vec {x}}} . After several thousand epochs of training, an x → {\displaystyle {\vec {x}}} (hopefully) emerges that matches the style of a → {\displaystyle {\vec {a}}} and the content of p → {\displaystyle {\vec {p}}} . As of 2017, when implemented on a GPU, it takes a few minutes to converge. == Extensions == In some practical implementations, it is noted that the resulting image has too much high-frequency artifact, which can be suppressed by adding the total variation to the total loss. Compared to VGGNet, AlexNet does not work well for neural style transfer. NST has also been extended to videos. Subsequent work improved the speed of NST for images by using special-purpose normalizations. In a paper by Fei-Fei Li et al. adopted a different regularized loss metric and accelerated method for training to produce results in real-time (three orders of magnitude faster than Gatys). Their idea was to use not the pixel-based loss defined above but rather a 'perceptual loss' measuring t

    Read more →
  • Private message

    Private message

    In computer networking, a private message (PM), or direct message (DM), refers to a private communication, often text-based, sent or received by a user of a private communication channel on any given platform. Unlike public posts, PMs are only viewable by the participants. Long a function present on IRCs and Internet forums, private channels for PMs have also been prevalent features on instant messaging (IM) and on social media networks. It may be either synchronous (e.g. on an IM) or asynchronous (e.g. on an Internet forum). The term private message (PM) originated as a feature on internet forums, while the term direct message (DM) originated as a feature on Twitter. Due to the popularity of the latter service, DM has since been appropriated by other platforms, such as Instagram, and is often genericized in popular usage. == Overview == There are two main types of private messages, and one obscure type: One type includes those found on IRCs and Internet forums, as well as on social media services like Twitter, Facebook, and Instagram, where the focus is public posting, PMs allow users to communicate privately without leaving the platform. The second type are those relayed through instant messaging platforms such as WhatsApp and Snapchat, where users join the networks primarily to exchange PMs. A third type, peer-to-peer messaging, occurs when users create and own the infrastructure used to transmit and store the messages; while features vary depending on application, they give the user full control over the data they transmit. An example of software that enables this kind of messaging is Classified-ads. Besides serving as a tool to connect privately with friends and family, PMs have gained momentum in the workplace. Working professionals use PMs to reach coworkers in other spaces and increase efficiency during meetings. Although useful, using PMs in the workplace may blur the boundary between work and private lives. Some common forms of private messaging today include Facebook messaging (sometimes referred to as "inboxing"), Twitter direct messaging, and Instagram direct messaging. These forms of private messaging provide a private space on a usually public site. For instance, most activity on Twitter is public, but Twitter DMs provide a private space for communication between two users. This differs from mediums like email, texting, and Snapchat, where most or all activity is always private. Modern forms of private messaging may include multimedia messages, such as pictures or videos. == History == Email was first developed to send messages between different computers on ARPANET in 1971. Access to ARPANET was primarily limited to universities and other research institutions. Starting in 1983 or 1984, FidoNet allowed home computer users to send and receive email via bulletin board systems. Information services such as CompuServe, America Online, and Prodigy also helped to popularizes online messaging. The advent of the public World Wide Web in 1993 increased access to email via internet service providers, and later via webmail. Instant messaging systems became popular in the mid 1990s, as Internet access improved and personal computers became more common. The introduction of Skype in 2003 popularized Internet-based voice and video messaging. Direct messaging is now a feature of all major social networking services. == Privacy concerns == In January 2014, Matthew Campbell and Michael Hurley filed a class-action lawsuit against Facebook for breaching the Electronic Communications Privacy Act. They alleged that private messages which contained URLs were being read and used to generate profit, through data mining and user profiling, and that it was misleading for Facebook to refer to the functionality as "private" with the implication that the communication was "free from surveillance". In 2012, some Facebook users misinterpreted a redesign of the Facebook wall as publicly sharing private messages from 2008–2009. These were found to be public wall posts from those years, made at a time when it was not possible to like or comment on a wall post, making the notes look like private messages.

    Read more →
  • Cryptographic nonce

    Cryptographic nonce

    In cryptography, a nonce is an arbitrary number that can be used just once in a cryptographic communication. It is often a random or pseudo-random number issued in an authentication protocol to ensure that each communication session is unique, and therefore that old communications cannot be reused in replay attacks. Nonces can also be useful as initialization vectors and in cryptographic hash functions. == Definition == A nonce is an arbitrary number used only once in a cryptographic communication, in the spirit of a nonce word. They are often random or pseudo-random numbers. Many nonces also include a timestamp to ensure exact timeliness, though this requires clock synchronisation between organisations. The addition of a client nonce ("cnonce") helps to improve the security in some ways as implemented in digest access authentication. To ensure that a nonce is used only once, it should be time-variant (including a suitably fine-grained timestamp in its value), or generated with enough random bits to ensure an insignificantly low chance of repeating a previously generated value. Some authors define pseudo-randomness (or unpredictability) as a requirement for a nonce. Nonce is a word dating back to Middle English for something only used once or temporarily (often with the construction "for the nonce"). It descends from the construction "then anes" ("the one [purpose]"). A false etymology claiming it to stand for "number used once" or similar is incorrect. == Usage == === Authentication === Authentication protocols may use nonces to ensure that old communications cannot be reused in replay attacks. For instance, nonces are used in HTTP digest access authentication to calculate an MD5 digest of the password. The nonces are different each time the 401 authentication challenge response code is presented, thus making replay attacks virtually impossible. The scenario of ordering products over the Internet can provide an example of the usefulness of nonces in replay attacks. An attacker could take the encrypted information and—without needing to decrypt—could continue to send a particular order to the supplier, thereby ordering products over and over again under the same name and purchase information. The nonce is used to give 'originality' to a given message so that if the company receives any other orders from the same person with the same nonce, it will discard those as invalid orders. A nonce may be used to ensure security for a stream cipher. Where the same key is used for more than one message and then a different nonce is used to ensure that the keystream is different for different messages encrypted with that key; often the message number is used. Secret nonce values are used by the Lamport signature scheme as a signer-side secret which can be selectively revealed for comparison to public hashes for signature creation and verification. === Hashing === Nonces are used in proof-of-work systems to vary the input to a cryptographic hash function so as to obtain a hash for a certain input that fulfils certain arbitrary conditions. In doing so, it becomes far more difficult to create a "desirable" hash than to verify it, shifting the burden of work onto one side of a transaction or system. For example, proof of work, using hash functions, was considered as a means to combat email spam by forcing email senders to find a hash value for the email (which included a timestamp to prevent pre-computation of useful hashes for later use) that had an arbitrary number of leading zeroes, by hashing the same input with a large number of values until a "desirable" hash was obtained. Similarly, the Bitcoin blockchain hashing algorithm can be tuned to an arbitrary difficulty by changing the required minimum/maximum value of the hash so that the number of bitcoins awarded for new blocks does not increase linearly with increased network computation power as new users join. This is likewise achieved by forcing Bitcoin miners to add nonce values to the value being hashed to change the hash algorithm output. As cryptographic hash algorithms cannot easily be predicted based on their inputs, this makes the act of blockchain hashing and the possibility of being awarded bitcoins something of a lottery, where the first "miner" to find a nonce that delivers a desirable hash is awarded bitcoins.

    Read more →
  • Social bot

    Social bot

    A social bot, refers to fully or partially automated social media accounts designed to perform most regular users’ actions, such as liking, posting content, and chatting with other users. Although their levels of autonomy vary, and often include a human-in-the-loop, social bots can use artificial intelligence to perform social media actions and can use large language models to mimic human dialogue. Social bots can operate alone or in groups that coordinate messaging as part of a network of coordinated inauthentic behavior. Social bots are often used to perform ad fraud by artificially boosting viewership and engagement metrics and to spread disinformation on social media. == Uses == Social bots are used for a large number of purposes on a variety of social media platforms, including Twitter, Instagram, Facebook, and YouTube. One common use of social bots is to inflate a social media user's apparent popularity, usually by artificially manipulating their engagement metrics with large volumes of fake likes, reposts, or replies. Social bots can similarly be used to artificially inflate a user's follower count with fake followers, creating a false perception of a larger and more influential online following than is the case. The use of social bots to create the impression of a large social media influence allows individuals, brands, and organizations to attract a higher number of human followers and boost their online presence. Fake engagement can be bought and sold in the black market of social media engagement. Corporations typically use automated customer service agents on social media to affordably manage high levels of support requests. Social bots are used to send automated responses to users’ questions, sometimes prompting the user to private message the support account with additional information. The increased use of automated support bots and virtual assistants has led to some companies laying off customer-service staff. Social bots are also often used to influence public opinion. Autonomous bot accounts can flood social media with large numbers of posts expressing support for certain products, companies, or political campaigns, creating the impression of organic grassroots support. This can create a false perception of the number of people who support a certain position, which may also have effects on the direction of stock prices or on elections. Messages with similar content can also influence fads or trends. Many social bots are also used to amplify phishing attacks. These malicious bots are used to trick a social media user into giving up their passwords or other personal data. This is usually accomplished by posting links claiming to direct users to news articles that would in actuality direct to malicious websites containing malware. Scammers often use URL shortening services such as TinyURL and bit.ly to disguise a link's domain address, increasing the likelihood of a user clicking the malicious link. The presence of fake social media followers and high levels of engagement help convince the victim that the scammer is in fact a trusted user. Social bots can be a tool for computational propaganda. Bots can also be used for algorithmic curation, algorithmic radicalization, and/or influence-for-hire, a term that refers to the selling of an account on social media platforms. == History == Bots have coexisted with computer technology since the earliest days of computing. Social bots have their roots in the 1950s with Alan Turing, whose work focused on machine intelligence with the development of the Turing Test. The following decades saw further progress made towards the goal of creating programs capable of mimicking human behavior, notably with Joseph Weizenbaum’s creation of ELIZA. Considered to be one of the first Chatbots, ELIZA could simulate natural conversations with human users through pattern matching. Its most famous script was DOCTOR, a simulation of a Rogerian psychotherapist that was programmed to chat with patients and respond to questions. With the growth of social media platforms in the early 2000s, these bots could be used to interact with much larger user groups in an inconspicuous manner. Early instances of autonomous agents on social media could be found on sites like MySpace, with social bots being used by marketing firms to inflate activity on a user’s page in an effort to make them appear more popular. Social bots have been observed on a large variety of social media websites, with Twitter being one of the most widely observed examples. The creation of Twitter bots is generally against the site’s terms of service when used to post spam or to automatically like and follow other users, but some degree of automation using Twitter’s API may be permitted if used for “entertainment, informational, or novelty purposes.” Other platforms such as Reddit and Discord also allow for the use of social bots as long as they are not used to violate policies regarding harmful content and abusive behavior. Social media platforms have developed their own automated tools to filter out messages that come from bots, although they cannot detect all bot messages. == Legal regulation == Due to the difficulty of recognizing social bots and separating them from "eligible" automation via social media APIs, it is unclear how legal regulation can be enforced. Social bots are expected to play a role in shaping public opinion by autonomously acting as influencers. Some social bots have been used to rapidly spread misinformation, manipulate stock markets, influence opinion on companies and brands, promote political campaigns, and engage in malicious phishing campaigns. In the United States, some states have started to implement legislation in an attempt to regulate the use of social bots. In 2019, California passed the Bolstering Online Transparency Act (the B.O.T. Act) to make it unlawful to use automated software to appear indistinguishable from humans for the purpose of influencing a social media user's purchasing and voting decisions. Other states such as Utah and Colorado have passed similar bills to restrict the use of social bots. The Artificial Intelligence Act (AI Act) in the European Union is the first comprehensive law governing the use of Artificial Intelligence. The law requires transparency in AI to prevent users from being tricked into believing they are communicating with another human. AI-generated content on social media must be clearly marked as such, preventing social bots from using AI in a manner that mimics human behavior. == Detection == The first generation of bots could sometimes be distinguished from real users by their often superhuman capacities to post messages. Later developments have succeeded in imprinting more "human" activity and behavioral patterns in the agent. With enough bots, it might be even possible to achieve artificial social proof. To unambiguously detect social bots as what they are, a variety of criteria must be applied together using pattern detection techniques, some of which are: cartoon figures as user pictures sometimes also random real user pictures are captured (identity fraud) reposting rate temporal patterns sentiment expression followers-to-friends ratio length of user names variability in (re)posted messages engagement rate (like/followers rate) analysis of the time series of social media posts Social bots are always becoming increasingly difficult to detect and understand. The bots' human-like behavior, ever-changing behavior of the bots, and the sheer volume of bots covering every platform may have been a factor in the challenges of removing them. Social media sites, like Twitter, are among the most affected, with CNBC reporting up to 48 million of the 319 million users (roughly 15%) were bots in 2017. Botometer (formerly BotOrNot) is a public Web service that checks the activity of a Twitter account and gives it a score based on how likely the account is to be a bot. The system leverages over a thousand features. An active method for detecting early spam bots was to set up honeypot accounts that post nonsensical content, which may get reposted (retweeted) by the bots. However, bots evolve quickly, and detection methods have to be updated constantly, because otherwise they may get useless after a few years. One method is the use of Benford's Law for predicting the frequency distribution of significant leading digits to detect malicious bots online. This study was first introduced at the University of Pretoria in 2020. Another method is artificial-intelligence-driven detection. Some of the sub-categories of this type of detection would be active learning loop flow, feature engineering, unsupervised learning, supervised learning, and correlation discovery. Some operations of bots work together in a synchronized way. For example, ISIS used Twitter to amplify its Islamic content by numerous orchestrated accounts which further pushed an item to the Hot List news, thus further a

    Read more →
  • The Most Dangerous Writing App

    The Most Dangerous Writing App

    The Most Dangerous Writing App is a web application for free writing that combats writer's block by deleting all progress if the user stops typing for five seconds. It is targeted at creative writers who want to write first drafts without worrying about editing or formatting. == Features == The app is designed to "shut down your inner editor and get you into a state of flow", referring to the psychological concept of being in a flow state. Users start a writing session by choosing a time or word limit, and can only save or download their work if they complete the set limit without interruption. An optional "hardcore mode" blurs out everything the user has written so far, making it impossible to edit before finishing the writing session. == History == The Most Dangerous Writing App was created by software engineer Manuel Ebert and was released as free, open source software on February 29, 2016. It was reviewed by Wired, Forbes, Vogue, Huffington Post, The Verge, The Next Web, and others. It has been used in free writing contests and is recommended by NaNoWriMo. In April 2019, The Most Dangerous Writing App was acquired by Squibler, but the original version remains freely accessible.

    Read more →
  • HKDF

    HKDF

    HKDF is a multi-purpose key derivation function (KDF) based on the HMAC message authentication code. HKDF follows "extract-then-expand" paradigm, where the KDF logically consists of two modules: the first stage takes the input keying material and "extracts" from it a fixed-length pseudorandom key, and then the second stage "expands" this key into several additional, independent pseudorandom keys as the output of the KDF. == Mechanism == HKDF is the composition of two functions, HKDF-Extract and HKDF-Expand: HKDF(salt, IKM, info, length) = HKDF-Expand(HKDF-Extract(salt, IKM), info, length) === HKDF-Extract === HKDF-Extract (XTR) takes "input key material" or "source key material" (IKM or SKM) such as a shared secret generated using Diffie-Hellman; an optional, non-secret, random or pseudorandom salt (r); and generates a cryptographic key called the PRK ("pseudorandom key"). HKDF-Extract acts as a "randomness extractor", specifically a "computational extractor", taking a potentially non-uniform value of sufficient min-entropy and generating a value indistinguishable from a uniform random value (pseudorandom). Computational extractors assume attackers are computationally bounded and source entropy may only exist in a computational sense. Such extractors can be built using cryptographic functions under suitable assumptions, modeled as universal hash function (in the generic case) or a random oracle (in constrained scenarios like sources with weak entropy). Salt (r) acts as a "source-independent extractor", strengthening HKDF's security guarantees. Using a fixed public r is safe for multiple invocations of HKDF (on "independent" but secret IKMs which may or may not be derived from the same source), provided r isn't chosen or manipulated by an attacker. Ideally, r is a random string of hash function's output length. Even low quality r (weak entropy or shorter length) is recommended as they contribute "significantly" to the security of the OKM. Without or with a low-entropy, non-secret r, if an attacker can influence the IKMs source in a way that specifically exploits HKDF-Extract's underlying hash function (finding a collision or a specific bias), XTR provides no protection. A random r, even if fixed by the application (for example, random number generators using r as seed), would strengthen protections for that specific extractor session. In such a setting, sufficiently long IKMs also provide better entropy extraction. However, allowing the attacker to influence enough of the IKM after seeing r may result in a completely insecure KDF. HKDF-Extract is the result of HMAC with r as the key (all zeros up to length of the underlying extractor hash function, if not provided) and the IKM as the message. The underlying hash function used for HKDF-Extract step may be different to the one used by HKDF-Expand. It is recommended that HKDF-Extract uses strongest hash function available to the application, as it "concentrates" the entropy already present in IKM but may not necessarily "add" to it. Truncated output from a stronger underlying hash function for XTR (for example, SHA512/256) offers stronger extraction properties. The attacker is assumed to have partial knowledge about IKM (publicly known values in the case of Diffie-Hellman) or partial control over it (entropy pools). HKDF-Extract may be skipped if the IKM is itself a cryptographically strong key (and hence can assume the role of PRK), though it is recommended that HKDF-Extract be applied for the sake of compatibility with the general case, especially if r is available to the application. === HKDF-Expand === HKDF-Expand (PRF) takes the PRK (or any random key-derivation key if HKDF-Extract step is skipped), optional info (CTXinfo), and a length (L), to generate output key material (OKM) of length L. Multiple OKMs can be generated from a single PRK by using different values for CTXinfo, which must be "independent" of the IKM passed in HKDF-Extract. Even if an attacker, who knows r and some auxillary information about the secret IKM, can force the use of the same IKM (and PRK, by extension), in two or more HKDF-Expand contexts (represented by CTXinfo), the OKMs output are computationally independent (leak no useful information on each other). HKDF-Expand, acting as a variable-output-length pseudorandom function (PRF) keyed on PRK, calls HMAC on CTXinfo as the message (empty string, if unspecified) appended to a 8-bit counter i initialized to 1. Subsequent calls to HMAC are chained in "feedback mode" by prepending the previous HMAC output to CTXinfo and incrementing i. OKM is a function of the output size (k bits) of HMAC's underlying hash function; i.e., SHA-256 outputs OKM in segments of k=256 bits for up to a maximum of length i × k bits (255 × 256 bits = 8160 bytes) truncated to desired length L. HKDF-Expand may be skipped if PRK is at least desired length L, though it is recommended that HKDF-Expand be applied for additional "smoothing" of the OKM. == Standardization == HKDF was proposed as a building block in various protocols and applications, as well as to discourage the proliferation of multiple KDF mechanisms by its authors. It is formally described in RFC 5869 with detailed analysis in a paper published in 2010. NIST SP800-56Cr2 specifies a parameterizable extract-then-expand scheme, noting that RFC 5869 HKDF is a version of it and citing its paper for the rationale for the recommendations' extract-and-expand mechanisms. == Applications == HKDF is used in the Signal Protocol for end-to-end encrypted messaging where it generates the message keys, in conjunction with the triple Elliptic-curve Diffie-Hellman handshake (X3DH) key agreement protocol. Signal's "Secure Value Recovery" and "Sealed Sender" are based on HKDF. HKDF is a main component in the Noise Protocol Framework, Message Layer Security, and is used in widely deployed protocols like IPsec Internet Key Exchange and TLS 1.3. The "multi-purpose" nature of HKDF is meant to serve applications that require key extraction, key expansion, and key hierarchies in key wrapping, key exchange, PRNG, and password-based key derivation schemes. == Implementations == There are implementations of HKDF for C#, Go, Java, JavaScript, Perl, PHP, Python, Ruby, Rust, and other programming languages. RFC6234 lays out a reference C implementation of HKDF based on the Secure Hash Standard. === Example in Python ===

    Read more →
  • BREACH

    BREACH

    BREACH (a backronym: Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) is a security vulnerability against HTTPS when using HTTP compression. BREACH is built based on the CRIME security exploit. BREACH was announced at the August 2013 Black Hat USA conference by security researchers Angelo Prado, Neal Harris and Yoel Gluck. == Details == While the CRIME attack was presented as a general attack that could work effectively against a large number of protocols, only exploits against SPDY request compression and TLS compression were demonstrated and largely mitigated in browsers and servers. The CRIME exploits against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined. BREACH is an instance of the CRIME attack against HTTP compression—the use of gzip or DEFLATE data compression algorithms via the content-encoding option within HTTP by many web browsers and servers. Given this compression oracle, the rest of the BREACH attack follows the same general lines as the CRIME exploit, by performing an initial blind brute-force search to guess a few bytes, followed by divide-and-conquer search to expand a correct guess to an arbitrarily large amount of content. == Mitigation == BREACH exploits the compression in the underlying HTTP protocol. Therefore, turning off TLS compression makes no difference to BREACH, which can still perform a chosen-plaintext attack against the HTTP payload. As a result, clients and servers are either forced to disable HTTP compression completely (thus reducing performance), or to adopt workarounds to try to foil BREACH in individual attack scenarios, such as using cross-site request forgery (CSRF) protection. Another suggested approach is to disable HTTP compression whenever the referrer header indicates a cross-site request, or when the header is not present. This approach allows effective mitigation of the attack without losing functionality, only incurring a performance penalty on affected requests. Another approach is to add padding at the TLS, HTTP header, or payload level. Around 2013–2014, there was an IETF draft proposal for a TLS extension for length-hiding padding that, in theory, could be used as a mitigation against this attack. It allows the actual length of the TLS payload to be disguised by the insertion of padding to round it up to a fixed set of lengths, or to randomize the external length, thereby decreasing the likelihood of detecting small changes in compression ratio that is the basis for the BREACH attack. However, this draft has since expired without further action. A very effective mitigation is HTB (Heal-the-BREACH) that adds random-sized padding to compressed data, providing some variance in the size of the output contents. This randomness delays BREACH from guessing the correct characters in the secret token by a factor of 500 (10-byte max) to 500,000 (100-byte max). HTB protects all websites and pages in the server with minimal CPU usage and minimal bandwidth increase.

    Read more →
  • Perfectly Imperfect (platform)

    Perfectly Imperfect (platform)

    Perfectly Imperfect is an online newsletter and social media platform. It was initially founded in 2020 as a biweekly email newsletter that focused on recommendations. In January 2024, Perfectly Imperfect launched PI.FYI, a social media platform. The platform is based around sharing recommendations. Its main feed is presented in reverse chronological order and is not algorithmically curated. == History == Perfectly Imperfect was started during the COVID-19 pandemic by Tyler Bainbridge, alongside college friends Alex Cushing and Serey Morm, whom he met at UMass Lowell; Morm later departed. Motivated by a dissatisfaction with algorithm-driven recommendation culture, they launched on Substack in September 2020. Its early newsletter format, PI, published brief recommendation lists and personal notes from contributors. Contributors have included a mix of underground artists and more established creative figures, such as Charli XCX, Chloe Cherry, Chloe Wise, and Meetka Otto. In October 2024, PI announced it was leaving Substack to launch its own site. == Overview == The current platform, PI.FYI, features both editorial content (guest columns, long-form essays, staff picks) and user-generated recommendations. The platform also supports "Ask" posts, where users can solicit recommendations from the community, and allows commenting, liking, and profile customization. In August 2025, it launched an events feature. In 2022, Perfectly Imperfect hosted their first offline event at Baby's All Right in Brooklyn, with a performance by The Dare. They have since expanded their event promotion/sponsorship to markets such as Los Angeles, San Francisco, and even Auckland.

    Read more →
  • VideoThang

    VideoThang

    VideoThang was free video editing software for Windows 2000, XP, and Vista. The software has three parts to it which are My Stuff, Edit My Stuff, and My Mix. The software accepts MOV, AVI, MPG, MP4, PNG, WMV, FLV, and MP3 standards. Its official website is now no longer available. == Reception == Jan Ozer, of Pcmag, said that the software "suffers from several unfortunate design and implementation flaws that dramatically limit output quality and overall utility." Jon L. Jacobi, of PC World, said that the software "may not be the most flexible multimedia editor in the world, but the trim/zoom basics are there, it's free, and it's so simple to use that just about anyone in the world should be able figure it out." Amit Agarwal, of Digital Inspiration, said that the software "doesn’t offer loads of features like other video editors but is perfect for making quick video slideshows of your pictures that you can upload on the web or share via email."

    Read more →
  • Codebook

    Codebook

    A codebook is a type of document used for gathering and storing cryptography codes. Originally, codebooks were often literally books, but today "codebook" is a byword for the complete record of a series of codes, regardless of physical format. == Cryptography == In cryptography, a codebook is a document used for implementing a code. A codebook contains a lookup table for coding and decoding; each word or phrase has one or more strings which replace it. To decipher messages written in code, corresponding copies of the codebook must be available at either end. The distribution and physical security of codebooks presents a special difficulty in the use of codes compared to the secret information used in ciphers, the key, which is typically much shorter. The United States National Security Agency documents sometimes use codebook to refer to block ciphers; compare their use of combiner-type algorithm to refer to stream ciphers. Codebooks come in two forms, one-part or two-part: In one-part codes, the plaintext words and phrases and the corresponding code words are in the same alphabetical order. They are organized similar to a standard dictionary. Such codes are half the size of two-part codes but are more vulnerable since an attacker who recovers some code word meanings can often infer the meaning of nearby code words. One-part codes may be used simply to shorten messages for transmission or have their security enhanced with superencryption methods, such as adding a secret number to numeric code words. In two-part codes, one part is for converting plaintext to ciphertext, the other for the opposite purpose. They are usually organized similarly to a language translation dictionary, with plaintext words (in the first part) and ciphertext words (in the second part) presented like dictionary headwords. The earliest known use of a codebook system was by Gabriele de Lavinde in 1379 working for the Antipope Clement VII. Two-part codebooks go back as least as far as Antoine Rossignol in the 1800s. From the 15th century until the middle of the 19th century, nomenclators (named after nomenclator) were the most used cryptographic method. Codebooks with superencryption were the most used cryptographic method of World War I. The JN-25 code used in World War II used a codebook of 30,000 code groups superencrypted with 30,000 random additives. The book used in a book cipher or the book used in a running key cipher can be any book shared by sender and receiver and is different from a cryptographic codebook. == Social sciences == In social sciences, a codebook is a document containing a list of the codes used in a set of data to refer to variables and their values, for example locations, occupations, or clinical diagnoses. == Data compression == Codebooks were also used in 19th- and 20th-century commercial codes for the non-cryptographic purpose of data compression. Codebooks are used in relation to precoding and beamforming in mobile networks such as 5G and LTE. The usage is standardized by 3GPP, for example in the document TS 38.331, NR; Radio Resource Control (RRC); Protocol specification.

    Read more →
  • Data set (IBM mainframe)

    Data set (IBM mainframe)

    In the context of IBM mainframe computers in the IBM System/360 line and its successors, a data set (IBM preferred) or dataset is a computer file having a record organization. Use of this term began with, e.g., DOS/360 and OS/360, and is still used by their successors, including the current VSE and z/OS. Documentation for these systems historically preferred this term rather than file. A data set is typically stored on a direct access storage device (DASD) or magnetic tape, however unit record devices, such as punch card readers, card punches, line printers and page printers can provide input/output (I/O) for a data set (file). Data sets are not unstructured streams of bytes, but rather are organized in various logical record and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with Job Control Language DD statements. Within a running program they are stored in the Data Control Block (DCB) or Access Control Block (ACB), which are data structures used to access data sets using access methods. Records in a data set may be fixed, variable, or “undefined” length. == Data set organization == For OS/360, the DCB's DSORG parameter specifies how the data set is organized. It may be CQ Queued Telecommunications Access Method (QTAM) in Message Control Program (MCP) CX Communications line group DA Basic Direct Access Method (BDAM) GS Graphics device for Graphics Access Method(GAM) IS Indexed Sequential Access Method (ISAM) MQ QTAM message queue in application PO Partitioned Organization PS Physical Sequential among others. Data sets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated. Programmers utilize various access methods (such as QSAM or VSAM) in programs for reading and writing data sets. Access method depends on the given data set organization. == Record format (RECFM) == Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the data set. This is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter. RECFM=V specifies a variable-length record. V records when stored on media are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes and flag bits. With RECFM=FB and RECFM=VB, multiple logical records are grouped together into a single physical block on tape or DASD. FB and VB are fixed-blocked, and variable-blocked, respectively. RECFM=U (undefined) is also variable length, but the length of the record is determined by the length of the block rather than by a control field. The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS could be also specified, meaning fixed-blocked standard, meaning all the blocks except the last one were required to be in full BLKSIZE length. RECFM=VBS, or variable-blocked spanned, means a logical record could be spanned across two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one. This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating-point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes. == Partitioned data set == A partitioned data set (PDS) is a data set containing multiple members, each of which holds a separate sub-data set, similar to a directory in other types of file systems. This type of data set is often used to hold load modules (old format bound executable programs), source program libraries (especially Assembler macro definitions), ISPF screen definitions, and Job Control Language. A PDS may be compared to a Zip file or COM Structured Storage. A Partitioned Data Set can only be allocated on a single volume and have a maximum size of 65,535 tracks. Besides members, a PDS contains also a directory. Each member can be accessed indirectly via the directory structure. Once a member is located, the data stored in that member are handled in the same manner as a PS (sequential) data set. Whenever a member is deleted, the space it occupied is unusable for storing other data. Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space is to perform file compression. Compression, which is done using the IEBCOPY utility, moves all members to the front of the data space and leaves free usable space at the back. (Note that in modern parlance, this kind of operation might be called defragmentation or garbage collection; data compression nowadays refers to a different, more complicated concept.) PDS files can only reside on DASD, not on magnetic tape, in order to use the directory structure to access individual members. Partitioned data sets are most often used for storing multiple job control language files, utility control statements, and executable modules. An improvement of this scheme is a Partitioned Data Set Extended (PDSE or PDS/E, sometimes just libraries) introduced with DFSMSdfp for MVS/XA and MVS/ESA systems. A PDS/E library can store program objects or other types of members, but not both. BPAM cannot process a PDS/E containing program objects. PDS/E structure is similar to PDS and is used to store the same types of data. However, PDS/E files have a better directory structure which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore does not run out of directory blocks if not enough were specified). Also, PDS/E automatically stores members in such a way that compression operation is not needed to reclaim "dead" space. PDS/E files can only reside on DASD in order to use the directory structure to access individual members. == Generation Data Group == A Generation Data Group (GDG) is a group of non-VSAM data sets that are successive generations of historically-related data stored on an IBM mainframe (running OS/360 and its successors or DOS/360 and its successors). A GDG is usually cataloged. An individual member of the GDG collection is called a "Generation Data Set." The latter may be identified by an absolute number, ACCTG.OURGDG(1234), or a relative number: (-1) for the previous generation, (0) for the current one, and (+1) the next generation. A GDG specifies how many generations of a data set are to be kept and at what age a generation will be deleted. Whenever a new generation is created, the system checks whether one or more obsolete generations are to be deleted. The purpose of GDGs is to automate archival, using the command language JCL, the data set name given is generic. When DSN appears, the GDG data set appears along with the history number, where (0) is the most recent version (-1), (-2), ... are previous generations (+1) a new generation (see DD) Another use of GDGs is to be able to address all generations simultaneously within a JCL script without having to know the number of currently available generations. To do this, you have to omit the parentheses and the generation number in the JCL when specifying the dataset. === GDG JCL & features === Generation Data Groups are defined using either the BLDG statement of the IEHPROGM utility or the DEFINE GENERATIONGROUP statement of the newer IDCAMS utility, which allows setting various parameters. LIMIT(10) would limit the number of generations limit to 10. SCRATCH FOR (91) would retain each member, up to the limited#generations, at least 91 days. IDCAMS can also delete (and optionally uncatalog) a GDG. ==== Example ==== Creation of a standard GDG for five safety scopes, each at least 35 days old: Delete a standard GDG:

    Read more →