AI Detector Eraser

AI Detector Eraser — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Capture the flag (cybersecurity)

    Capture the flag (cybersecurity)

    In computer security, Capture the Flag (CTF) is an exercise in which participants attempt to find text strings, called "flags", which are secretly hidden in purposefully vulnerable programs or websites. They can be used for both competitive or educational purposes. In two main variations of CTFs, participants either steal flags from other participants (attack/defense-style CTFs) or from organizers (jeopardy-style challenges). A mixed competition combines these two styles. Competitions can include hiding flags in hardware devices, they can be both online or in-person, and can be advanced or entry-level. The game is inspired by the traditional outdoor sport with the same name. CTFs are used as a tool for developing and refining cybersecurity skills, making them popular in both professional and academic settings. == Overview == Capture the Flag (CTF) is a cybersecurity competition that is used to test and develop computer security skills. It was first developed in 1996 at DEF CON, the largest cybersecurity conference in the United States which is hosted annually in Las Vegas, Nevada. The conference hosts a weekend of cybersecurity competitions, including their flagship CTF. Two popular CTF formats are jeopardy and attack-defense. Both formats test participant’s knowledge in cybersecurity, but differ in objective. In the Jeopardy format, participating teams must complete as many challenges of varying point values from a various categories such as cryptography, web exploitation, and reverse engineering. In the attack-defense format, competing teams must defend their vulnerable computer systems while attacking their opponent's systems. The exercise involves a diverse array of tasks, including exploitation and cracking passwords, but there is little evidence showing how these tasks translate into cybersecurity knowledge held by security experts. Recent research has shown that the Capture the Flag tasks mainly covered technical knowledge but lacked social topics like social engineering and awareness on cybersecurity. == Educational applications == CTFs have been shown to be an effective way to improve cybersecurity education through gamification. There are many examples of CTFs designed to teach cybersecurity skills to a wide variety of audiences, including PicoCTF, organized by the Carnegie Mellon CyLab, which is oriented towards high school students, and Arizona State University supported pwn.college. Beyond educational CTF events and resources, CTFs has been shown to be a highly effective way to instill cybersecurity concepts in the classroom. CTFs have been included in undergraduate computer science classes such as Introduction to Information Security at the National University of Singapore. CTFs are also popular in military academies. They are often included as part of the curriculum for cybersecurity courses, with the NSA organized Cyber Exercise culminating in a CTF competition between the US service academies and military colleges. == Competitions == Many CTF organizers register their competition with the CTFtime platform. This allows the tracking of the position of teams over time and across competitions. These include "Plaid Parliament of Pwning", "More Smoked Leet Chicken", "Dragon Sector", "dcua", "Eat, Sleep, Pwn, Repeat", "perfect blue", "organizers" and "Blue Water". Overall the "Plaid Parliament of Pwning" and "Dragon Sector" have both placed first worldwide the most with three times each. === Community competitions === Every year there are dozens of CTFs organized in a variety of formats. Many CTFs are associated with cybersecurity conferences such as DEF CON, various editions of SANS Institute's NetWars, HITCON, and BSides. The DEF CON CTF, an attack-defence CTF, is notable for being one of the oldest CTF competitions to exist, and has been variously referred to as the "World Series", "Superbowl", and "Olympics", of hacking by media outlets. The NYU Tandon hosted Cybersecurity Awareness Worldwide (CSAW) CTF is one of the largest open-entry competitions for students learning cybersecurity from around the world. In 2021, it hosted over 1200 teams during the qualification round. In addition to conference organized CTFs, many CTF clubs and teams organize CTF competitions. Many CTF clubs and teams are associated with universities, such as the CMU associated Plaid Parliament of Pwning, which hosts PlaidCTF, and the ASU associated Shellphish. Some community CTFs are online and open to all participants. The SANS Institute Holiday Hack Challenge and TryHackMe Advent of Cyber. === Government-supported competitions === Governmentally supported CTF competitions include the DARPA Cyber Grand Challenge and ENISA European Cybersecurity Challenge. In 2023, the US Space Force-sponsored Hack-a-Sat CTF competition included, for the first time, a live orbital satellite for participants to exploit. === Corporate-supported competitions === Corporations and other organizations sometimes use CTFs as a training or evaluation exercise, with benefits similar to those in educational settings. In addition to internal CTF exercises, some corporations such as Google and Tencent host publicly accessible CTF competitions. == In popular culture == In Mr. Robot, a qualification round for the DEF CON CTF competition is depicted in the season 3 opener "eps3.0_power-saver-mode.h". The logo for DEF CON can be seen in the background. In The Undeclared War, a CTF is depicted in the opening scene of the series as a recruitment exercise used by GCHQ. Go Go Squid!, a Chinese television series, is based around training for and competing in highly stylized CTF competitions .

    Read more →
  • List of cryptosystems

    List of cryptosystems

    A cryptosystem is a set of cryptographic algorithms that map ciphertexts and plaintexts to each other. == Private-key cryptosystems == Private-key cryptosystems use the same key for encryption and decryption. Caesar cipher Substitution cipher Enigma machine Data Encryption Standard Twofish Serpent Camellia Salsa20 ChaCha20 Blowfish CAST5 Kuznyechik RC4 3DES Skipjack Safer IDEA Advanced Encryption Standard, also known as AES and Rijndael. == Public-key cryptosystems == Public-key cryptosystems use a public key for encryption and a private key for decryption. Diffie–Hellman key exchange RSA encryption Rabin cryptosystem Schnorr signature ElGamal encryption Elliptic-curve cryptography Lattice-based cryptography McEliece cryptosystem Multivariate cryptography Isogeny-based cryptography

    Read more →
  • Hybrid cryptosystem

    Hybrid cryptosystem

    In cryptography, a hybrid cryptosystem is one which combines the convenience of a public-key cryptosystem with the efficiency of a symmetric-key cryptosystem. Public-key cryptosystems are convenient in that they do not require the sender and receiver to share a common secret in order to communicate securely. However, they often rely on complicated mathematical computations and are thus generally much more inefficient than comparable symmetric-key cryptosystems. In many applications, the high cost of encrypting long messages in a public-key cryptosystem can be prohibitive. This is addressed by hybrid systems by using a combination of both. A hybrid cryptosystem can be constructed using any two separate cryptosystems: a key encapsulation mechanism, which is a public-key cryptosystem a data encapsulation scheme, which is a symmetric-key cryptosystem The hybrid cryptosystem is itself a public-key system, whose public and private keys are the same as in the key encapsulation scheme. Note that for very long messages the bulk of the work in encryption/decryption is done by the more efficient symmetric-key scheme, while the inefficient public-key scheme is used only to encrypt/decrypt a short key value. == Implementations and standards == All practical implementations of public key cryptography today employ a hybrid system. Examples include the TLS protocol and the SSH protocol, that use a public-key mechanism for key exchange (such as Diffie-Hellman) and a symmetric-key mechanism for data encapsulation (such as AES). The OpenPGP file format and the PKCS#7 file format are other examples. Hybrid Public Key Encryption (HPKE, published as RFC 9180) is a modern standard for generic hybrid encryption. HPKE is used within multiple IETF protocols, including Messaging Layer Security (MLS), Oblivious DNS over HTTPS, Oblivious HTTP, Privacy Preserving Measurement, and TLS Encrypted Client Hello. Envelope encryption is an example of a usage of hybrid cryptosystems in cloud computing. In a cloud context, hybrid cryptosystems also enable centralized key management. == Example == To encrypt a message addressed to Alice in a hybrid cryptosystem, Bob does the following: Obtains Alice's public key. Generates a fresh symmetric key for the data encapsulation scheme. Encrypts the message under the data encapsulation scheme, using the symmetric key just generated. Encrypts the symmetric key under the key encapsulation scheme, using Alice's public key. Sends both of these ciphertexts to Alice. To decrypt this hybrid ciphertext, Alice does the following: Uses her private key to decrypt the symmetric key contained in the key encapsulation segment. Uses this symmetric key to decrypt the message contained in the data encapsulation segment. == Security == If both the key encapsulation and data encapsulation schemes in a hybrid cryptosystem are secure against adaptive chosen ciphertext attacks, then the hybrid scheme inherits that property as well. However, it is possible to construct a hybrid scheme secure against adaptive chosen ciphertext attacks even if the key encapsulation has a slightly weakened security definition (though the security of the data encapsulation must be slightly stronger). == Envelope encryption == Envelope encryption is term used for encrypting with a hybrid cryptosystem used by all major cloud service providers, often as part of a centralized key management system in cloud computing. Envelope encryption gives names to the keys used in hybrid encryption: Data Encryption Keys (abbreviated DEK, and used to encrypt data) and Key Encryption Keys (abbreviated KEK, and used to encrypt the DEKs). In a cloud environment, encryption with envelope encryption involves generating a DEK locally, encrypting one's data using the DEK, and then issuing a request to wrap (encrypt) the DEK with a KEK stored in a potentially more secure service. Then, this wrapped DEK and encrypted message constitute a ciphertext for the scheme. To decrypt a ciphertext, the wrapped DEK is unwrapped (decrypted) via a call to a service, and then the unwrapped DEK is used to decrypt the encrypted message. In addition to the normal advantages of a hybrid cryptosystem, using asymmetric encryption for the KEK in a cloud context provides easier key management and separation of roles, but can be slower. In cloud systems, such as Google Cloud Platform and Amazon Web Services, a key management system (KMS) can be available as a service. In some cases, the key management system will store keys in hardware security modules, which are hardware systems that protect keys with hardware features like intrusion resistance. This means that KEKs can also be more secure because they are stored on secure specialized hardware. Envelope encryption makes centralized key management easier because a centralized key management system only needs to store KEKs, which occupy less space, and requests to the KMS only involve sending wrapped and unwrapped DEKs, which use less bandwidth than transmitting entire messages. Since one KEK can be used to encrypt many DEKs, this also allows for less storage space to be used in the KMS. This also allows for centralized auditing and access control at one point of access.

    Read more →
  • Majal (organization)

    Majal (organization)

    Majal is a regional not-for-profit organization focused on "amplifying voices of dissent" throughout the Middle East and North Africa via digital media. Founded in Bahrain, the organization "creates platforms and web applications that promote freedom of expression and social justice." Majal, which relies on open source platforms, like WordPress and Ruby on Rails, was launched in 2006 by Esra'a Al Shafei as a simple group-blogging idea. However, it has changed course to focus on the development of unique applications and tools. == Objectives and means == Majal's content, in addition to its projects and applications, is free open source content to ensure right to access information for everyone. The organization uses a broad spectrum of social media tools, ranging from written blogs, podcasts, vlogs, comics, video animation and pictures to live broadcasting through radio. == Projects and applications == Majal runs various active projects that include Alliance for Kurdish Rights, The Muslim Network for Baháʼí Rights, a discussion tool for Arab LGBT youth and various Mobile apps. == Funding == Majal is funded through private donations and grants from non-governmental organizations, as well as any potential revenues earned through freelance development. Its primary funders are the Shuttleworth Foundation and the Omidyar Network. In 2008, Majal won the Berkman Award from the Berkman Klein Center for Internet & Society at Harvard University in the Human Rights/Global Advocacy category. This $10,000 award was Majal’s first source of funding. This award is presented to “people or institutions that have made a significant contribution to the Internet and its impact on society over the past decade.” In 2009, the March 18 Movement, a project of Majal, received the Think Social Award, which demonstrates how social media can be used to solve the world’s problems. Esra'a Al-Shafei was named a 2009 Echoing Green Fellow for Civil and Human Rights, a seed funding award for young entrepreneurs engaged in social change. Financially, the fellowship consists of a $60,000 stipend paid over two years. Most recently, MEY has received a grant from the Arab Fund for Arts and Culture for its Mideast Tunes website. == Awards == Winner of Human Rights Tulip 2014 Human Rights Tulip - Human rights - Government.nl Ashoka Changemakers Citizen Media competition in 2011 for their CrowdVoice project. Monaco Media Prize 2011 for Majal founder and director Esra'a Al Shafei in 2011. The BOBS Special Topic Human Rights award in 2011 for the Majal website Migrant Rights. ThinkSocial Award in 2009, as powerful model for how social media can be used to address global problems. Echoing Green, 2009 Fellowship. TEDGlobal 2009 Fellowship. Berkman Award for Internet Innovation from Berkman Klein Center for Internet & Society at Harvard Law School in 2008 for the outstanding contributions to the internet and its impact on society. The Global Journal selected Majal as one of the Top 100 NGOs in 2013. 2013-2014 Shuttleworth Foundation Fellowship. == Leadership == Majal team is led primarily by women. The organization was founded by Esra'a Al Shafei, a blogger from Bahrain in 2006. Ahmed Zidan of Egypt has served for over three years as the Editor-in-Chief of Majal Arabic, and is the co-founder of Ahwaa, and is also a podcaster. Other team members include Mona Kareem, Rima Kalush, Abir Ghattas, Namita Malhotra, and Vani Saraswathi. == 2011 Middle East and North Africa protests == Blogs and video played a role in the documentation of protests throughout the Middle East and North Africa during 2010-2011, also known as the Arab Spring. During this period, MEY's project, CrowdVoice (launched in 2010) helped curate and archive the large amounts of videos, images, and eye-witness reports being aggregated and crowdsourced from across the region. As a result, it had been censored temporarily in Yemen and is still censored in Bahrain. == Media coverage == Majal claims to have received various coverage from news agencies, TV satellite channels, radio stations, newspapers, magazines. For instance, Sky News, CNN, New York Times, BBC, The Guardian, NPR, Time, MTV political blog "Act", VH1, Daily Telegraph, Die Zeit, Frankfurter Rundschau FR-online, Toronto Star, TechCrunch, Rolling Stone Middle East, Abu Dhabi TV, Gulf News, Al-Hasnaa' magazine, ReadWriteWeb, Mashable, The Next Web, Radio Sawt Beirut International, Radio Farda among many others.

    Read more →
  • Alerts.in.ua

    Alerts.in.ua

    alerts.in.ua is an online service that visualizes information about air alerts and other threats on the map of Ukraine. == History == The idea of the site appeared in the first weeks of the 2022 Russian invasion of Ukraine, during the development of other projects related to alerting the population about alarms. So, on March 2, 2022, the "Lviv Siren" bot was created, which reported on air alarms in Lviv on Twitter. Later, the idea arose to monitor alarms all over Ukraine and display them on a map. However, the lack of a single official source reporting alarms made this task much more difficult. On March 15, 2022, the Ajax Systems company announced the creation of the official Telegram channel "Air Alarm". This channel receives signals from the "Air Alarm" application and instantly publishes messages about the start and end of alarms in different regions of Ukraine. This immediately solved the problem with the source of information and gave impetus to the further implementation of the project. On March 22, 2022, the first version of the "Air Alarm Map" website was published, located on the war.ukrzen.in.ua domain. The map quickly gained popularity in social networks. It, like several other similar projects, began to be widely distributed by the mass media: Suspilne, Novyi Kanal, UNIAN, DW, Fakty ICTV, Vikna TV, Ukrainian Radio, STB, Espresso, dev.ua, itc.ua and state bodies: Center for Countering Disinformation at the National Security and Defense Council of Ukraine, Verkhovna Rada of Ukraine, Khmelnytska OVA, etc. On April 8, 2022, the site moved to the alerts.in.ua domain, where it is still available today. On August 25, 2022, the service began monitoring local official channels in addition to the main "Air Alarm". On September 11, 2022, the English version of the site was published. On March 22, 2023, its own Android application was published. The project is actively developing and has its own community. == Description == The main part of the site is a map of Ukraine, on which the regions where an air alert or other threats have been declared are highlighted in real time. As of October 16, 2022, 5 types of threats are supported: Air alarm. The threat of artillery fire. The threat of street fighting. Chemical threat. Nuclear threat. Additionally, based on media reports, information is published about other dangerous events, such as explosions, demining, etc. On the site, you can view the history of announced alarms with links to sources. Alarm statistics for different time periods are also available. For developers, there is an API that allows you to develop your own services based on information about declared alarms. The site is available in Ukrainian, English, Polish and Japanese. == Use == The map is used by: To monitor the situation in the country and the region. To illustrate the alarms announced in the mass media: TSN, Ukrainian truth, Channel 24, Suspilne, RBC Ukraine, Gromadske, Glavkom. As a map of alarms in mobile applications, there is Alarm and AirAlert. As an API for its services, including alternative alarm maps, Telegram, Viber channels, Discord bots, IoT projects, etc. == Statistics == 89.5% of users use the map from a mobile phone, 10% from a PC and 1% from a tablet. Top 6 countries by visit: Ukraine, United States, Poland, Germany, Great Britain and Japan . == Alternative projects == eMap was created by the developer Vadym Klymenko. AlarmMap is an online from the Ukrainian office of Agroprep. The official map of air alarms was developed by Ajax Systems together with the developer Artem Lemeshev, Stfalcon with the support of the Ministry of Statistics.

    Read more →
  • Dynamic knowledge repository

    Dynamic knowledge repository

    The dynamic knowledge repository (DKR) is a concept developed by Douglas C. Engelbart as a primary strategic focus for allowing humans to address complex problems. He has proposed that a DKR will enable us to develop a collective IQ greater than any individual's IQ. References and discussion of Engelbart's DKR concept are available at the Doug Engelbart Institute. == Definition == A knowledge repository is a computerized system that systematically captures, organizes and categorizes an organization's knowledge. The repository can be searched and data can be quickly retrieved. The effective knowledge repositories include factual, conceptual, procedural and meta-cognitive techniques. The key features of knowledge repositories include communication forums. A knowledge repository can take many forms to "contain" the knowledge it holds. A customer database is a knowledge repository of customer information and insights – or electronic explicit knowledge. A Library is a knowledge repository of books – physical explicit knowledge. A community of experts is a knowledge repository of tacit knowledge or experience. The nature of the repository only changes to contain/manage the type of knowledge it holds. A repository (as opposed to an archive) is designed to get knowledge out. It should therefore have some rules of structure, classification, taxonomy, record management, etc., to facilitate user engagement.

    Read more →
  • Knapsack problem

    Knapsack problem

    The knapsack problem is the following problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where the decision-makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively. The knapsack problem has been studied for more than a century, with early works dating back to 1897. The subset sum problem is a special case of the decision and 0-1 problems where for each kind of item, the weight equals the value: w i = v i {\displaystyle w_{i}=v_{i}} . In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem. The subset sum problem is one of Karp's 21 NP-complete problems. == Applications == Knapsack problems appear in real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, selection of investments and portfolios, selection of assets for asset-backed securitization, and generating keys for the Merkle–Hellman and other knapsack cryptosystems. One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. For small examples, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values, it is more difficult to provide choices. Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score. A 1999 study of the Stony Brook University Algorithm Repository showed that, out of 75 algorithmic problems related to the field of combinatorial algorithms and algorithm engineering, the knapsack problem was the 19th most popular and the third most needed after suffix trees and the bin packing problem. == Definition == The most common problem being solved is the 0-1 knapsack problem, which restricts the number x i {\displaystyle x_{i}} of copies of each kind of item to zero or one. Given a set of n {\displaystyle n} items numbered from 1 up to n {\displaystyle n} , each with a weight w i {\displaystyle w_{i}} and a value v i {\displaystyle v_{i}} , along with a maximum weight capacity W {\displaystyle W} , maximize ∑ i = 1 n v i x i {\displaystyle \sum _{i=1}^{n}v_{i}x_{i}} subject to ∑ i = 1 n w i x i ≤ W {\displaystyle \sum _{i=1}^{n}w_{i}x_{i}\leq W} and x i ∈ { 0 , 1 } {\displaystyle x_{i}\in \{0,1\}} . Here x i {\displaystyle x_{i}} represents the number of instances of item i {\displaystyle i} to include in the knapsack. Informally, the problem is to maximize the sum of the values of the items in the knapsack so that the sum of the weights is less than or equal to the knapsack's capacity. The bounded knapsack problem (BKP) removes the restriction that there is only one of each item, but restricts the number x i {\displaystyle x_{i}} of copies of each kind of item to a maximum non-negative integer value c {\displaystyle c} : maximize ∑ i = 1 n v i x i {\displaystyle \sum _{i=1}^{n}v_{i}x_{i}} subject to ∑ i = 1 n w i x i ≤ W {\displaystyle \sum _{i=1}^{n}w_{i}x_{i}\leq W} and x i ∈ { 0 , 1 , 2 , … , c } . {\displaystyle x_{i}\in \{0,1,2,\dots ,c\}.} The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item and can be formulated as above except that the only restriction on x i {\displaystyle x_{i}} is that it is a non-negative integer. maximize ∑ i = 1 n v i x i {\displaystyle \sum _{i=1}^{n}v_{i}x_{i}} subject to ∑ i = 1 n w i x i ≤ W {\displaystyle \sum _{i=1}^{n}w_{i}x_{i}\leq W} and x i ∈ N . {\displaystyle x_{i}\in \mathbb {N} .} One example of the unbounded knapsack problem is given using the figure shown at the beginning of this article and the text "if any number of each book is available" in the caption of that figure. == Computational complexity == The knapsack problem is interesting from the perspective of computer science for many reasons: The decision problem form of the knapsack problem (Can a value of at least V be achieved without exceeding the weight W?) is NP-complete, thus there is no known algorithm that is both correct and fast (polynomial-time) in all cases. There is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger V). This problem is co-NP-complete. There is a pseudo-polynomial time algorithm using dynamic programming. There is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine, described below. Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly. There is a link between the "decision" and "optimization" problems in that if there exists a polynomial algorithm that solves the "decision" problem, then one can find the maximum value for the optimization problem in polynomial time by applying this algorithm iteratively while increasing the value of k. On the other hand, if an algorithm finds the optimal value of the optimization problem in polynomial time, then the decision problem can be solved in polynomial time by comparing the value of the solution output by this algorithm with the value of k. Thus, both versions of the problem are of similar difficulty. One theme in research literature is to identify what the "hard" instances of the knapsack problem look like, or viewed another way, to identify what properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests. The goal in finding these "hard" instances is for their use in public-key cryptography systems, such as the Merkle–Hellman knapsack cryptosystem. More generally, better understanding of the structure of the space of instances of an optimization problem helps to advance the study of the particular problem and can improve algorithm selection. Furthermore, notable is the fact that the hardness of the knapsack problem depends on the form of the input. If the weights and profits are given as integers, it is weakly NP-complete, while it is strongly NP-complete if the weights and profits are given as rational numbers. However, in the case of rational weights and profits it still admits a fully polynomial-time approximation scheme. === Unit-cost models === The NP-hardness of the Knapsack problem relates to computational models in which the size of integers matters (such as the Turing machine). In contrast, decision trees count each decision as a single step. Dobkin and Lipton show an 1 2 n 2 {\displaystyle {1 \over 2}n^{2}} lower bound on linear decision trees for the knapsack problem, that is, trees where decision nodes test the sign of affine functions. This was generalized to algebraic decision trees by Steele and Yao. If the elements in the problem are real numbers or rationals, the decision-tree lower bound extends to the real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor"). This model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions. An upper bound for a decision-tree model was given by Meyer auf der Heide who showed that for every n there exists an O(n4)-deep linear decision tree that solves the subset-sum problem with n items. Note that this does not imply any upper bound for an algorithm that should solve the problem for any given n. == Solving == Several algorithms are available to solve knapsack problems, based on the dynamic programming approach, the branch and bound approach or hybridizations of both approaches. === Dynamic programming in-advance algorithm === The unbounded knapsack problem (UKP) places no restriction on the number of copies of each kind of item. Besides, here we assume that x i > 0 {\displaystyle x_{i}>0} m [ w ′ ] = max ( ∑ i = 1 n v i x i ) {\displaystyle m[w']=\max \left(\sum _{i=1}^{n}v_{i}x_{i}\right)} subject to ∑

    Read more →
  • List of network buses

    List of network buses

    List of electrical characteristics of single collision domain segment "slow speed" network buses: The number of nodes can be limited by either number of available addresses or bus capacitance. None of the above use any analog domain modulation techniques like MLT-3 encoding, PAM-5 etc. PSI5 designed with automation applications in mind is a bit unusual in that it uses Manchester code.

    Read more →
  • Hexagonal sampling

    Hexagonal sampling

    A multidimensional signal is a function of M independent variables where M ≥ 2 {\displaystyle M\geq 2} . Real world signals, which are generally continuous time signals, have to be discretized (sampled) in order to ensure that digital systems can be used to process the signals. It is during this process of discretization where sampling comes into picture. Although there are many ways of obtaining a discrete representation of a continuous time signal, periodic sampling is by far the simplest scheme. Theoretically, sampling can be performed with respect to any set of points. But practically, sampling is carried out with respect to a set of points that have a certain algebraic structure. Such structures are called lattices. Mathematically, the process of sampling an N {\displaystyle N} -dimensional signal can be written as: w ( t ^ ) = w ( V . n ^ ) {\displaystyle w({\hat {t}})=w(V.{\hat {n}})} where t ^ {\displaystyle {\hat {t}}} is continuous domain M-dimensional vector (M-D) that is being sampled, n ^ {\displaystyle {\hat {n}}} is an M-dimensional integer vector corresponding to indices of a sample, and V is an N × N {\displaystyle N\times N} sampling matrix. == Motivation == Multidimensional sampling provides the opportunity to look at digital methods to process signals. Some of the advantages of processing signals in the digital domain include flexibility via programmable DSP operations, signal storage without the loss of fidelity, opportunity for encryption in communication, lower sensitivity to hardware tolerances. Thus, digital methods are simultaneously both powerful and flexible. In many applications, they act as less expensive alternatives to their analog counterparts. Sometimes, the algorithms implemented using digital hardware are so complex that they have no analog counterparts. Multidimensional digital signal processing deals with processing signals represented as multidimensional arrays such as 2-D sequences or sampled images.[1] Processing these signals in the digital domain permits the use of digital hardware where in signal processing operations are specified by algorithms. As real world signals are continuous time signals, multidimensional sampling plays a crucial role in discretizing the real world signals. The discrete time signals are in turn processed using digital hardware to extract information from the signal. == Preliminaries == === Region of Support === The region outside of which the samples of the signal take zero values is known as the Region of support (ROS). From the definition, it is clear that the region of support of a signal is not unique. === Fourier transform === The Fourier transform is a tool that allows us to simplify mathematical operations performed on the signal. The transform basically represents any signal as a weighted combination of sinusoids. The Fourier and the inverse Fourier transform of an M-dimensional signal can be defined as follows: X a ( Ω ^ ) = ∫ − ∞ + ∞ x a ( t ^ ) e − j Ω ^ T t ^ d t ^ {\displaystyle X_{a}({\hat {\Omega }})=\int _{-\infty }^{+\infty }\!x_{a}({\hat {t}})e^{-j{\hat {\Omega }}^{T}{\hat {t}}}d{\hat {t}}} x a ( t ^ ) = 1 2 π M ∫ − ∞ + ∞ X ( Ω ^ ) e ( j Ω ^ T t ^ ) d Ω ^ {\displaystyle x_{a}({\hat {t}})={\frac {1}{2\pi ^{M}}}\int _{-\infty }^{+\infty }\!X({\hat {\Omega }})e^{(j{\hat {\Omega }}^{T}{\hat {t}})}\,\mathrm {d} {\hat {\Omega }}} The cap symbol ^ indicates that the operation is performed on vectors. The Fourier transform of the sampled signal is observed to be a periodic extension of the continuous time Fourier transform of the signal. This is mathematically represented as: X ( ω ) = 1 | d e t ( V ) | ∑ k X a ( Ω ^ − U k ) {\displaystyle X(\omega )={\frac {1}{|det(V)|}}\sum _{k}\!X_{a}({\hat {\Omega }}-Uk)} where ω = V ~ Ω {\displaystyle \omega ={\tilde {V}}\Omega } and U = 2 π V ~ {\displaystyle U=2\pi {\tilde {V}}} is the periodicity matrix where ~ denotes matrix transposition. Thus sampling in the spatial domain results in periodicity in the Fourier domain. === Aliasing === A band limited signal may be periodically replicated in many ways. If the replication results in an overlap between replicated regions, the signal suffers from aliasing. Under such conditions, a continuous time signal cannot be perfectly recovered from its samples. Thus in order to ensure perfect recovery of the continuous signal, there must be zero overlap multidimensional sampling of the replicated regions in the transformed domain. As in the case of 1-dimensional signals, aliasing can be prevented if the continuous time signal is sampled at an adequate sufficiently high rate. === Sampling density === It is a measure of the number of samples per unit area. It is defined as: S . D = 1 | d e t ( V ) | = | d e t ( U ) | 4 π 2 {\displaystyle S.D={\frac {1}{|det(V)|}}={\frac {|det(U)|}{4\pi ^{2}}}} . The minimum number of samples per unit area required to completely recover the continuous time signal is termed as optimal sampling density. In applications where memory or processing time are limited, emphasis must be given to minimizing the number of samples required to represent the signal completely. == Existing approaches == For a bandlimited waveform, there are infinitely many ways the signal can be sampled without producing aliases in the Fourier domain. But only two strategies are commonly used: rectangular sampling and hexagonal sampling. === Rectangular and Hexagonal sampling === In rectangular sampling, a 2-dimensional signal, for example, is sampled according to the following V matrix: V r e c t = [ T 1 0 0 T 2 ] {\displaystyle V_{rect}={\begin{bmatrix}T1&0\\0&T2\end{bmatrix}}} where T1 and T2 are the sampling periods along the horizontal and vertical direction respectively. In hexagonal sampling, the V matrix assumes the following general form: V h e x = [ T 1 T 1 − T 2 T 2 ] {\displaystyle V_{hex}={\begin{bmatrix}T1&T1\\-T2&T2\end{bmatrix}}} The difference in the efficiency of the two schemes is highlighted using a bandlimited signal with a circular region of support of radius R. The circle can be inscribed in a square of length 2R or a regular hexagon of length 2 R 3 {\displaystyle {\frac {2R}{\sqrt {3}}}} . Consequently, the region of support is now transformed into a square and a hexagon respectively. If these regions are periodically replicated in the frequency domain such that there is zero overlap between any two regions, then by periodically replicating the square region of support, we effectively sample the continuous signal on a rectangular lattice. Similarly periodic replication of the hexagonal region of support maps to sampling the continuous signal on a hexagonal lattice. From U, the periodicity matrix, we can calculate the optimal sampling density for both the rectangular and hexagonal schemes. It is found that in order to completely recover the circularly band-limited signal, the hexagonal sampling scheme requires 13.4% fewer samples than the rectangular sampling scheme. The reduction may appear to be of little significance for a 2-dimensional signal. But as the dimensionality of the signal increases, the efficiency of the hexagonal sampling scheme will become far more evident. For instance, the reduction achieved for an 8-dimensional signal is 93.8%. To highlight the importance of the obtained result [2], try and visualize an image as a collection of infinite number of samples. The primary entity responsible for vision, i.e. the photoreceptors (rods and cones) are present on the retina of all mammals. These cells are not arranged in rows and columns. By adapting a hexagonal sampling scheme, our eyes are able to process images much more efficiently. The importance of hexagonal sampling lies in the fact that the photoreceptors of the human vision system lie on a hexagonal sampling lattice and, thus, perform hexagonal sampling.[3] In fact, it can be shown that the hexagonal sampling scheme is the optimal sampling scheme for a circularly band-limited signal. == Applications == === Aliasing effects minimized by the use of optimal sampling grids === Recent advances in the CCD technology has made hexagonal sampling feasible for real life applications. Historically, because of technology constraints, detector arrays were implemented only on 2-dimensional rectangular sampling lattices with rectangular shape detectors. But the super [CCD] detector introduced by Fuji has an octagonal shaped pixel in a hexagonal grid. Theoretically, the performance of the detector was greatly increased by introducing an octagonal pixel. The number of pixels required to represent the sample was reduced and there was significant improvement in the Signal-to-Noise Ratio (SNR) when compared with that of a rectangular pixel. But the drawback of using hexagonal pixels is that the associated fill factor will be less than 82%. An alternative method would be to interpolate hexagonal pixels in such a manner that we ultimately end up with a rectangular grid. The Spot 5 satellite incorporates a

    Read more →
  • Unknown key-share attack

    Unknown key-share attack

    As defined by Blake-Wilson & Menezes (1999), an unknown key-share (UKS) attack on an authenticated key agreement (AK) or authenticated key agreement with key confirmation (AKC) protocol is an attack whereby an entity A {\displaystyle A} ends up believing she shares a key with B {\displaystyle B} , and although this is in fact the case, B {\displaystyle B} mistakenly believes the key is instead shared with an entity E ≠ A {\displaystyle E\neq A} . In other words, in a UKS, an opponent, say Eve, coerces honest parties Alice and Bob into establishing a secret key where at least one of Alice and Bob does not know that the secret key is shared with the other. For example, Eve may coerce Bob into believing he shares the key with Eve, while he actually shares the key with Alice. The “key share” with Alice is thus unknown to Bob.

    Read more →
  • European Grid Infrastructure

    European Grid Infrastructure

    EGI (originally an initialism for European Grid Infrastructure) is a federation of computing and storage resource providers that deliver advanced computing and data analytics services for research and innovation. The Federation is governed by its participants represented in the EGI Council and coordinated by the EGI Foundation. As of 2024, the EGI Federation supports 160 scientific communities worldwide and over 95,000 users in their intensive data analysis. The most significant scientific communities supported by EGI in 2022 were Medical and Health Sciences, High Energy Physics, and Engineering and Technology. The EGI Federation provideds services through over 150 data centres, of which 25 are cloud sites, in 43 countries and 64 Research Infrastructures (4 of which are members of the Federation). == Name == Originally, EGI stood for European Grid Infrastructure. This reflected its focus on providing access to high-throughput computing resources across Europe using Grid computing techniques. However, as EGI's service offerings expanded beyond traditional grid computing, particularly with the incorporation of federated cloud services, the original meaning of the acronym became less accurate. To emphasise the broader scope of EGI's services and avoid any confusion associated with the outdated term "grid," it is recommended to refer to EGI simply as EGI. == Structure == === EGI Federation === The EGI Federation delivers a scalable digital research infrastructure (e-infrastructure), empowering tens of thousands of researchers across diverse scientific disciplines. Through the EGI Federation, researchers gain access to advanced computing and data analytics capabilities, including large-scale data analysis, while benefiting from the collaborative efforts of hundreds of service providers from both public and private sectors, consolidating resources from Europe and beyond. Overall, the EGI Federation offers a range of services, encompassing distributed high-throughput computing and cloud computing, storage and data management capabilities, co-development of new solutions, expert support, and comprehensive training opportunities. This ecosystem propels collaboration, scientific progress and innovation. === EGI Foundation === The EGI Foundation is the coordinating body of the EGI Federation. It was established in 2010 with headquarters in Amsterdam, Netherlands. The Foundation coordinates the research and innovation efforts of its members, spanning technical areas critical to data-intensive science, including large-scale data processing and analysis, distributed Artificial Intelligence/Machine Learning, federated Identity and access management and the application of digital twins for research. The day-to-day running of the EGI Foundation is supervised by the Executive Board. The board’s members work closely with the EGI Director on operational, technical and financial issues. The Executive Board’s members are appointed by the EGI Council for a two-year term. === EGI Council === The EGI Council is responsible for defining the strategic direction of the EGI Federation. The Council acts as the senior decision-making and supervisory authority of the EGI Foundation, with a mandate to define the strategic direction of the entire EGI ecosystem. === EGI Services === EGI offers a suite of services to support data-intensive research. These services include compute resources, orchestration tools, storage and data management solutions, training programmes, security and identity services, and applications. Compute resources encompass cloud compute, cloud container compute, high-throughput compute, and software distribution. Orchestration tools include the Workload Manager and infrastructure manager. Storage and data management solutions include online storage, data transfer, and DataHub. Training programmes cover FitSM, ISO 27001, and general training infrastructure. EGI Check-in and Secrets Store are key security and identity services, while applications such as Notebooks and Replay enhance research productivity. In addition to services for Research, EGI also provides services for Federation and Business. Services for Federation are designed to help resource providers and user communities collaborate and share resources. EGI also offers a range of services to support businesses in their digital transformation. Through the EGI Digital Innovation Hub (EGI DIH), companies can access advanced computing resources, networking, funding and training opportunities, collaborate with research institutions, and test solutions before investing. == History == In 2002, the first large-scale experimental facility was successfully demonstrated by the DataGrid project under the lead of CERN with tens of technical architects from the major High Energy Physics institutes in the world. For the first time, distributed computing was applied to data-intensive processing. It aimed at developing a large-scale computational grid to facilitate distributed data-intensive scientific computing across High Energy Physics, Earth Observation, and Biology science applications. On 28 February 2003, the first software release of LCG-MW was published. gLite, the Lightweight Middleware for Grid Computing and LCG, Large Hadron Collider Computing Grid, are the cornerstone of the Worldwide LHC Computing Grid, which expanded over time towards the EGI Federation. 2004 marks the year of the first pilot infrastructure, seeing the participation of CERN and data centres in the United Kingdom, Spain, Germany, the Netherlands, France, Canada, Russia, Bulgaria, the Asia-Pacific region and Switzerland. Over the years, the infrastructure has grown into a federation of 128 data centres and 25 cloud providers serving more than 95,000 users worldwide. In 2004, the first data processing tasks started being formally recorded in a central accounting system. The EGI Accounting Portal provides the accounting data for Compute, Storage and Data services gathered from the data centres of the EGI Federation. A few years later, in 2010, EGI was established as the coordinating body of the EGI Federation to build an integrated pan-European infrastructure to support European research communities primarily. In the same year, EGI launched the flagship project EGI Inspire. That project brought together European organisations to establish a sustainable European Grid Infrastructure for large-scale data analysis. The success of the project was due to the adoption of a distributed computing model to solve big data problems. Moreover, EGI-Inspire harmonised operational policies across its federation of affiliated data centres and cloud service providers worldwide, integrating e-infrastructures from 57 countries. The EGI Federation was the first to apply federation to cloud provisioning, opening a new avenue in large-scale interactive data analysis. In 2015, within EGI Engage, opening a new avenue in large-scale interactive data analysis. The EGI Federated Cloud is an IaaS-type cloud, incorporating academic and private clouds and virtualised resources built using open standards. Its development is driven by the needs of the scientific community, resulting in a novel research e-infrastructure that relies on well-established federated operational services, making EGI a dependable resource for scientific endeavours. In 2015, EGI, EUDAT, GÉANT, LIBER and OpenAIRE published a position paper on a 'European Open Science Cloud for Research'. With the EOSC-hub project in 2016, EGI started contributing in practice to shaping the services for the EOSC. The work continued with a series of projects, like EOSC Enhance, EOSC Life and EOSC Synergy. With EGI-ACE and its contribution to EOSC Future, EGI has continued developing the EOSC Core. In early 2024, EGI started providing services to the EOSC EU Node, and with EOSC Beyond it will provide new EOSC Core capabilities and pilot additional national and thematic nodes. In October 2024, EUDAT, GÉANT, OpenAIRE, PRACE and EGI signed a Memorandum of Understanding establishing the European e-Infrastructures Assembly. This collaboration will bolster the position and promote the services of e-Infrastructures, empowering researchers across Europe to drive innovation and advance scientific discovery.

    Read more →
  • Strong cryptography

    Strong cryptography

    Strong cryptography or cryptographically strong are general terms used to designate the cryptographic algorithms that, when used correctly, provide a very high (usually insurmountable) level of protection against any eavesdropper, including the government agencies. There is no precise definition of the boundary line between the strong cryptography and (breakable) weak cryptography, as this border constantly shifts due to improvements in hardware and cryptanalysis techniques. These improvements eventually place the capabilities once available only to the NSA within the reach of a skilled individual, so in practice there are only two levels of cryptographic security, "cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files" (Bruce Schneier). The strong cryptography algorithms have high security strength, for practical purposes usually defined as a number of bits in the key. For example, the United States government, when dealing with export control of encryption, considered as of 1999 any implementation of the symmetric encryption algorithm with the key length above 56 bits or its public key equivalent to be strong and thus potentially a subject to the export licensing. To be strong, an algorithm needs to have a sufficiently long key and be free of known mathematical weaknesses, as exploitation of these effectively reduces the key size. At the beginning of the 21st century, the typical security strength of the strong symmetrical encryption algorithms is 128 bits (slightly lower values still can be strong, but usually there is little technical gain in using smaller key sizes). Demonstrating the resistance of any cryptographic scheme to attack is a complex matter, requiring extensive testing and reviews, preferably in a public forum. Good algorithms and protocols are required (similarly, good materials are required to construct a strong building), but good system design and implementation is needed as well: "it is possible to build a cryptographically weak system using strong algorithms and protocols" (just like the use of good materials in construction does not guarantee a solid structure). Many real-life systems turn out to be weak when the strong cryptography is not used properly, for example, random nonces are reused A successful attack might not even involve algorithm at all, for example, if the key is generated from a password, guessing a weak password is easy and does not depend on the strength of the cryptographic primitives. A user can become the weakest link in the overall picture, for example, by sharing passwords and hardware tokens with the colleagues. == Background == The level of expense required for strong cryptography originally restricted its use to the government and military agencies, until the middle of the 20th century the process of encryption required a lot of human labor and errors (preventing the decryption) were very common, so only a small share of written information could have been encrypted. US government, in particular, was able to keep a monopoly on the development and use of cryptography in the US into the 1960s. In the 1970, the increased availability of powerful computers and unclassified research breakthroughs (Data Encryption Standard, the Diffie-Hellman and RSA algorithms) made strong cryptography available for civilian use. Mid-1990s saw the worldwide proliferation of knowledge and tools for strong cryptography. By the 21st century the technical limitations were gone, although the majority of the communication were still unencrypted. At the same the cost of building and running systems with strong cryptography became roughly the same as the one for the weak cryptography. The use of computers changed the process of cryptanalysis, famously with Bletchley Park's Colossus. But just as the development of digital computers and electronics helped in cryptanalysis, it also made possible much more complex ciphers. It is typically the case that use of a quality cipher is very efficient, while breaking it requires an effort many orders of magnitude larger - making cryptanalysis so inefficient and impractical as to be effectively impossible. == Cryptographically strong algorithms == This term "cryptographically strong" is often used to describe an encryption algorithm, and implies, in comparison to some other algorithm (which is thus cryptographically weak), greater resistance to attack. But it can also be used to describe hashing and unique identifier and filename creation algorithms. See for example the description of the Microsoft .NET runtime library function Path.GetRandomFileName. In this usage, the term means "difficult to guess". An encryption algorithm is intended to be unbreakable (in which case it is as strong as it can ever be), but might be breakable (in which case it is as weak as it can ever be) so there is not, in principle, a continuum of strength as the idiom would seem to imply: Algorithm A is stronger than Algorithm B which is stronger than Algorithm C, and so on. The situation is made more complex, and less subsumable into a single strength metric, by the fact that there are many types of cryptanalytic attack and that any given algorithm is likely to force the attacker to do more work to break it when using one attack than another. There is only one known unbreakable cryptographic system, the one-time pad, which is not generally possible to use because of the difficulties involved in exchanging one-time pads without them being compromised. So any encryption algorithm can be compared to the perfect algorithm, the one-time pad. The usual sense in which this term is (loosely) used, is in reference to a particular attack, brute force key search — especially in explanations for newcomers to the field. Indeed, with this attack (always assuming keys to have been randomly chosen), there is a continuum of resistance depending on the length of the key used. But even so there are two major problems: many algorithms allow use of different length keys at different times, and any algorithm can forgo use of the full key length possible. Thus, Blowfish and RC5 are block cipher algorithms whose design specifically allowed for several key lengths, and who cannot therefore be said to have any particular strength with respect to brute force key search. Furthermore, US export regulations restrict key length for exportable cryptographic products and in several cases in the 1980s and 1990s (e.g., famously in the case of Lotus Notes' export approval) only partial keys were used, decreasing 'strength' against brute force attack for those (export) versions. More or less the same thing happened outside the US as well, as for example in the case of more than one of the cryptographic algorithms in the GSM cellular telephone standard. The term is commonly used to convey that some algorithm is suitable for some task in cryptography or information security, but also resists cryptanalysis and has no, or fewer, security weaknesses. Tasks are varied, and might include: generating randomness encrypting data providing a method to ensure data integrity Cryptographically strong would seem to mean that the described method has some kind of maturity, perhaps even approved for use against different kinds of systematic attacks in theory and/or practice. Indeed, that the method may resist those attacks long enough to protect the information carried (and what stands behind the information) for a useful length of time. But due to the complexity and subtlety of the field, neither is almost ever the case. Since such assurances are not actually available in real practice, sleight of hand in language which implies that they are will generally be misleading. There will always be uncertainty as advances (e.g., in cryptanalytic theory or merely affordable computer capacity) may reduce the effort needed to successfully use some attack method against an algorithm. In addition, actual use of cryptographic algorithms requires their encapsulation in a cryptosystem, and doing so often introduces vulnerabilities which are not due to faults in an algorithm. For example, essentially all algorithms require random choice of keys, and any cryptosystem which does not provide such keys will be subject to attack regardless of any attack resistant qualities of the encryption algorithm(s) used. == Legal issues == Widespread use of encryption increases the costs of surveillance, so the government policies aim to regulate the use of the strong cryptography. In the 2000s, the effect of encryption on the surveillance capabilities was limited by the ever-increasing share of communications going through the global social media platforms, that did not use the strong encryption and provided governments with the requested data. Murphy talks about a legislative balance that needs to be struck between the power of the government that are broad enough to be able to follow the qui

    Read more →
  • Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective. This method has enabled broad applications across multiple domains, including cross-modal retrieval, text-to-image generation, and aesthetic ranking. == Algorithm == The CLIP method trains a pair of models contrastively. One model takes in a piece of text as input and outputs a single vector representing its semantic content. The other model takes in an image and similarly outputs a single vector representing its visual content. The models are trained so that the vectors corresponding to semantically similar text-image pairs are close together in the shared vector space, while those corresponding to dissimilar pairs are far apart. To train a pair of CLIP models, one would start by preparing a large dataset of image-caption pairs. During training, the models are presented with batches of N {\displaystyle N} image-caption pairs. Let the outputs from the text and image models be respectively v 1 , . . . , v N , w 1 , . . . , w N {\displaystyle v_{1},...,v_{N},w_{1},...,w_{N}} . Two vectors are considered "similar" if their dot product is large. The loss incurred on this batch is the multi-class N-pair loss, which is a symmetric cross-entropy loss over similarity scores: − 1 N ∑ i ln ⁡ e v i ⋅ w i / T ∑ j e v i ⋅ w j / T − 1 N ∑ j ln ⁡ e v j ⋅ w j / T ∑ i e v i ⋅ w j / T {\displaystyle -{\frac {1}{N}}\sum _{i}\ln {\frac {e^{v_{i}\cdot w_{i}/T}}{\sum _{j}e^{v_{i}\cdot w_{j}/T}}}-{\frac {1}{N}}\sum _{j}\ln {\frac {e^{v_{j}\cdot w_{j}/T}}{\sum _{i}e^{v_{i}\cdot w_{j}/T}}}} In essence, this loss function encourages the dot product between matching image and text vectors ( v i ⋅ w i {\displaystyle v_{i}\cdot w_{i}} ) to be high, while discouraging high dot products between non-matching pairs. The parameter T > 0 {\displaystyle T>0} is the temperature, which is parameterized in the original CLIP model as T = e − τ {\displaystyle T=e^{-\tau }} where τ ∈ R {\displaystyle \tau \in \mathbb {R} } is a learned parameter. Other loss functions are possible. For example, Sigmoid CLIP (SigLIP) proposes the following loss function: L = 1 N ∑ i , j ∈ 1 : N f ( ( 2 δ i , j − 1 ) ( e τ w i ⋅ v j + b ) ) {\displaystyle L={\frac {1}{N}}\sum _{i,j\in 1:N}f((2\delta _{i,j}-1)(e^{\tau }w_{i}\cdot v_{j}+b))} where f ( x ) = ln ⁡ ( 1 + e − x ) {\displaystyle f(x)=\ln(1+e^{-x})} is the negative log sigmoid loss, and the Dirac delta symbol δ i , j {\displaystyle \delta _{i,j}} is 1 if i = j {\displaystyle i=j} else 0. == CLIP models == While the original model was developed by OpenAI, subsequent models have been trained by other organizations as well. === Image model === The image encoding models used in CLIP are typically vision transformers (ViT). The naming convention for these models often reflects the specific ViT architecture used. For instance, "ViT-L/14" means a "vision transformer large" (compared to other models in the same series) with a patch size of 14, meaning that the image is divided into 14-by-14 pixel patches before being processed by the transformer. The size indicator ranges from B, L, H, G (base, large, huge, giant), in that order. Other than ViT, the image model is typically a convolutional neural network, such as ResNet (in the original series by OpenAI), or ConvNeXt (in the OpenCLIP model series by LAION). Since the output vectors of the image model and the text model must have exactly the same length, both the image model and the text model have fixed-length vector outputs, which in the original report is called "embedding dimension". For example, in the original OpenAI model, the ResNet models have embedding dimensions ranging from 512 to 1024, and for the ViTs, from 512 to 768. Its implementation of ViT was the same as the original one, with one modification: after position embeddings are added to the initial patch embeddings, there is a LayerNorm. Its implementation of ResNet was the same as the original one, with 3 modifications: In the start of the CNN (the "stem"), they used three stacked 3x3 convolutions instead of a single 7x7 convolution, as suggested by. There is an average pooling of stride 2 at the start of each downsampling convolutional layer (they called it rect-2 blur pooling according to the terminology of ). This has the effect of blurring images before downsampling, for antialiasing. The final convolutional layer is followed by a multiheaded attention pooling. ALIGN a model with similar capabilities, trained by researchers from Google used EfficientNet, a kind of convolutional neural network. === Text model === The text encoding models used in CLIP are typically Transformers. In the original OpenAI report, they reported using a Transformer (63M-parameter, 12-layer, 512-wide, 8 attention heads) with lower-cased byte pair encoding (BPE) with 49152 vocabulary size. Context length was capped at 76 for efficiency. Like GPT, it was decoder-only, with only causally-masked self-attention. Its architecture is the same as GPT-2. Like BERT, the text sequence is bracketed by two special tokens [SOS] and [EOS] ("start of sequence" and "end of sequence"). Take the activations of the highest layer of the transformer on the [EOS], apply LayerNorm, then a final linear map. This is the text encoding of the input sequence. The final linear map has output dimension equal to the embedding dimension of whatever image encoder it is paired with. These models all had context length 77 and vocabulary size 49408. ALIGN used BERT of various sizes. == Dataset == === WebImageText === The CLIP models released by OpenAI were trained on a dataset called "WebImageText" (WIT) containing 400 million pairs of images and their corresponding captions scraped from the internet. The total number of words in this dataset is similar in scale to the WebText dataset used for training GPT-2, which contains about 40 gigabytes of text data. The dataset contains 500,000 text-queries, with up to 20,000 (image, text) pairs per query. The text-queries were generated by starting with all words occurring at least 100 times in English Wikipedia, then extended by bigrams with high mutual information, names of all Wikipedia articles above a certain search volume, and WordNet synsets. The dataset is private and has not been released to the public, and there is no further information on it. ==== Data preprocessing ==== For the CLIP image models, the input images are preprocessed by first dividing each of the R, G, B values of an image by the maximum possible value, so that these values fall between 0 and 1, then subtracting by [0.48145466, 0.4578275, 0.40821073], and dividing by [0.26862954, 0.26130258, 0.27577711]. The rationale was that these are the mean and standard deviations of the images in the WebImageText dataset, so this preprocessing step roughly whitens the image tensor. These numbers slightly differ from the standard preprocessing for ImageNet, which uses [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]. If the input image does not have the same resolution as the native resolution (224×224 for all except ViT-L/14@336px, which has 336×336 resolution), then the input image is first scaled by bicubic interpolation, so that its shorter side is the same as the native resolution, then the central square of the image is cropped out. === Others === ALIGN used over one billion image-text pairs, obtained by extracting images and their alt-tags from online crawling. The method was described as similar to how the Conceptual Captions dataset was constructed, but instead of complex filtering, they only applied a frequency-based filtering. Later models trained by other organizations had published datasets. For example, LAION trained OpenCLIP with published datasets LAION-400M, LAION-2B, and DataComp-1B. == Training == In the original OpenAI CLIP report, they reported training 5 ResNet and 3 ViT (ViT-B/32, ViT-B/16, ViT-L/14). Each was trained for 32 epochs. The largest ResNet model took 18 days to train on 592 V100 GPUs. The largest ViT model took 12 days on 256 V100 GPUs. All ViT models were trained on 224×224 image resolution. The ViT-L/14 was then boosted to 336×336 resolution by FixRes, resulting in a model. They found this was the best-performing model. In the OpenCLIP series, the ViT-L/14 model was trained on 384 A100 GPUs on the LAION-2B dataset, for 160 epochs for a total of 32B samples seen. == Applications == === Cross-modal retrieval === CLIP's cross-modal retrieval enables the alignment of visual and textual data in a shared latent space, allowing users to retrieve images based on text descriptions and vice versa, without the need for explicit image annotations. In text-to-image retrieval, users input descriptive text, and CLIP retrieves images with matching embeddings. In image-to-text retrieval, images are used to find related text content. CLIP’s ability to connect vis

    Read more →
  • Knapsack cryptosystems

    Knapsack cryptosystems

    Knapsack cryptosystems are cryptosystems whose security is based on the hardness of solving the knapsack problem. They remain quite unpopular because simple versions of these algorithms have been broken for several decades. However, that type of cryptosystem is a good candidate for post-quantum cryptography. The most famous knapsack cryptosystem is the Merkle-Hellman Public Key Cryptosystem, one of the first public key cryptosystems, published the same year as the RSA cryptosystem. However, this system has been broken by several attacks: one from Shamir, one by Adleman, and the low density attack. However, there exist modern knapsack cryptosystems that are considered secure so far: among them is Nasako-Murakami 2006. Knapsack cryptosystems, when not subject to classical cryptoanalysis, are believed to be difficult even for quantum computers. That is not the case for systems that rely on factoring large integers, like RSA, or computing discrete logarithms, like ECDSA, problems solved in polynomial time with Shor's algorithm.

    Read more →
  • Critical security parameter

    Critical security parameter

    In cryptography, a critical security parameter (CSP) is information that is either user or system defined and is used to operate a cryptography module in processing encryption functions including cryptographic keys and authentication data, such as passwords, the disclosure or modification of which can compromise the security of a cryptographic module or the security of the information protected by the module.

    Read more →