AI Data Jobs Near Me

AI Data Jobs Near Me — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Digital transaction management

    Digital transaction management

    Digital transaction management (DTM) is a category of cloud services designed to digitally manage document-based transactions. DTM removes the friction inherent in transactions that involve people, documents, and data to create faster, easier, more convenient, and secure processes. DTM goes beyond content and document management to include e-signatures, authentication and non-repudiation; enabling co-browsing between the customer and the business; document transfer and certification; secure archiving that goes beyond records management; and a variety of meta-processes around managing electronic transactions and the documents associated with them. DTM standards are proposed and managed by the xDTM Standard Association Aragon Research has estimated that "by YE 2016, 70% of large enterprises will have a DTM initiative underway or fully implemented."

    Read more →
  • Bitmap index

    Bitmap index

    A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query - e.g., data warehouses, and generally unsuitable for online transaction processing applications. Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) which is accessed in a read-only manner, and queries access multiple bitmap-indexed columns using the AND, OR or XOR operators extensively. Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema. == Example == Continuing the internet access example, a bitmap index may be logically viewed as follows: On the left, Identifier refers to the unique number assigned to each resident, HasInternet is the data to be indexed, the content of the bitmap index is shown as two columns under the heading bitmaps. Each column in the left illustration under the Bitmaps header is a bitmap in the bitmap index. In this case, there are two such bitmaps, one for "has internet" Yes and one for "has internet" No. It is easy to see that each bit in bitmap Y shows whether a particular row refers to a person who has internet access. This is the simplest form of bitmap index. Most columns will have more distinct values. For example, the sales amount is likely to have a much larger number of distinct values. Variations on the bitmap index can effectively index this data as well. We briefly review three such variations. Note: Many of the references cited here are reviewed at (John Wu (2007)). For those who might be interested in experimenting with some of the ideas mentioned here, many of them are implemented in open source software such as FastBit, the Lemur Bitmap Index C++ Library, the Roaring Bitmap Java library and the Apache Hive Data Warehouse system. == Compression == For historical reasons, bitmap compression and inverted list compression were developed as separate lines of research, and only later were recognized as solving essentially the same problem. Software can compress each bitmap in a bitmap index to save space. There has been considerable amount of work on this subject. Though there are exceptions such as Roaring bitmaps, Bitmap compression algorithms typically employ run-length encoding, such as the Byte-aligned Bitmap Code, the Word-Aligned Hybrid code, the Partitioned Word-Aligned Hybrid (PWAH) compression, the Position List Word Aligned Hybrid, the Compressed Adaptive Index (COMPAX), Enhanced Word-Aligned Hybrid (EWAH) and the COmpressed 'N' Composable Integer SEt (CONCISE). These compression methods require very little effort to compress and decompress. More importantly, bitmaps compressed with BBC, WAH, COMPAX, PLWAH, EWAH and CONCISE can directly participate in bitwise operations without decompression. This gives them considerable advantages over generic compression techniques such as LZ77. BBC compression and its derivatives are used in a commercial database management system. BBC is effective in both reducing index sizes and maintaining query performance. BBC encodes the bitmaps in bytes, while WAH encodes in words, better matching current CPUs. "On both synthetic data and real application data, the new word aligned schemes use only 50% more space, but perform logical operations on compressed data 12 times faster than BBC." PLWAH bitmaps were reported to take 50% of the storage space consumed by WAH bitmaps and offer up to 20% faster performance on logical operations. Similar considerations can be done for CONCISE and Enhanced Word-Aligned Hybrid. The performance of schemes such as BBC, WAH, PLWAH, EWAH, COMPAX and CONCISE is dependent on the order of the rows. A simple lexicographical sort can divide the index size by 9 and make indexes several times faster. The larger the table, the more important it is to sort the rows. Reshuffling techniques have also been proposed to achieve the same results of sorting when indexing streaming data. == Encoding == Basic bitmap indexes use one bitmap for each distinct value. It is possible to reduce the number of bitmaps used by using a different encoding method. For example, it is possible to encode C distinct values using log(C) bitmaps with binary encoding. This reduces the number of bitmaps, further saving space, but to answer any query, most of the bitmaps have to be accessed. This makes it potentially not as effective as scanning a vertical projection of the base data, also known as a materialized view or projection index. Finding the optimal encoding method that balances (arbitrary) query performance, index size and index maintenance remains a challenge. Without considering compression, Chan and Ioannidis analyzed a class of multi-component encoding methods and came to the conclusion that two-component encoding sits at the kink of the performance vs. index size curve and therefore represents the best trade-off between index size and query performance. == Binning == For high-cardinality columns, it is useful to bin the values, where each bin covers multiple values and build the bitmaps to represent the values in each bin. This approach reduces the number of bitmaps used regardless of encoding method. However, binned indexes can only answer some queries without examining the base data. For example, if a bin covers the range from 0.1 to 0.2, then when the user asks for all values less than 0.15, all rows that fall in the bin are possible hits and have to be checked to verify whether they are actually less than 0.15. The process of checking the base data is known as the candidate check. In most cases, the time used by the candidate check is significantly longer than the time needed to work with the bitmap index. Therefore, binned indexes exhibit irregular performance. They can be very fast for some queries, but much slower if the query does not exactly match a bin. == History == The concept of bitmap index was first introduced by Professor Israel Spiegler and Rafi Maayan in their research "Storage and Retrieval Considerations of Binary Data Bases", published in 1985. The first commercial database product to implement a bitmap index was Computer Corporation of America's Model 204. Patrick O'Neil published a paper about this implementation in 1987. This implementation is a hybrid between the basic bitmap index (without compression) and the list of Row Identifiers (RID-list). Overall, the index is organized as a B+tree. When the column cardinality is low, each leaf node of the B-tree would contain long list of RIDs. In this case, it requires less space to represent the RID-lists as bitmaps. Since each bitmap represents one distinct value, this is the basic bitmap index. As the column cardinality increases, each bitmap becomes sparse and it may take more disk space to store the bitmaps than to store the same content as RID-lists. In this case, it switches to use the RID-lists, which makes it a B+tree index. == In-memory bitmaps == One of the strongest reasons for using bitmap indexes is that the intermediate results produced from them are also bitmaps and can be efficiently reused in further operations to answer more complex queries. Many programming languages support this as a bit array data structure. For example, Java has the BitSet class and .NET have the BitArray class. Some database systems that do not offer persistent bitmap indexes use bitmaps internally to speed up query processing. For example, PostgreSQL versions 8.1 and later implement a "bitmap index scan" optimization to speed up arbitrarily complex logical operations between available indexes on a single table. For tables with many columns, the total number of distinct indexes to satisfy all possible queries (with equality filtering conditions on either of the fields) grows very fast, being defined by this formula: C n [ n 2 ] ≡ n ! ( n − [ n 2 ] ) ! [ n 2 ] ! {\displaystyle \mathbf {C} _{n}^{\left[{\frac {n}{2}}\right]}\equiv {\frac {n!}{\left(n-\left[{\frac {n}{2}}\right]\right)!\left[{\frac {n}{2}}\right]!}}} . A bitmap index scan combines expressions on different indexes, thus requiring only one index per column t

    Read more →
  • Symmetric Boolean function

    Symmetric Boolean function

    In mathematics, a symmetric Boolean function is a Boolean function whose value does not depend on the order of its input bits, i.e., it depends only on the number of ones (or zeros) in the input. For this reason they are also known as Boolean counting functions. There are 2n+1 symmetric n-ary Boolean functions. Instead of the truth table, traditionally used to represent Boolean functions, one may use a more compact representation for an n-variable symmetric Boolean function: the (n + 1)-vector, whose i-th entry (i = 0, ..., n) is the value of the function on an input vector with i ones. Mathematically, the symmetric Boolean functions correspond one-to-one with the functions that map n+1 elements to two elements, f : { 0 , 1 , . . . , n } → { 0 , 1 } {\displaystyle f:\{0,1,...,n\}\rightarrow \{0,1\}} . Symmetric Boolean functions are used to classify Boolean satisfiability problems. == Special cases == A number of special cases are recognized: Majority function: their value is 1 on input vectors with more than n/2 ones Threshold functions: their value is 1 on input vectors with k or more ones for a fixed k All-equal and not-all-equal function: their values is 1 when the inputs do (not) all have the same value Exact-count functions: their value is 1 on input vectors with k ones for a fixed k One-hot or 1-in-n function: their value is 1 on input vectors with exactly one one One-cold function: their value is 1 on input vectors with exactly one zero Congruence functions: their value is 1 on input vectors with the number of ones congruent to k mod m for fixed k, m Parity function: their value is 1 if the input vector has odd number of ones The n-ary versions of AND, OR, XOR, NAND, NOR and XNOR are also symmetric Boolean functions. == Properties == In the following, f k {\displaystyle f_{k}} denotes the value of the function f : { 0 , 1 } n → { 0 , 1 } {\displaystyle f:\{0,1\}^{n}\rightarrow \{0,1\}} when applied to an input vector of weight k {\displaystyle k} . === Weight === The weight of the function can be calculated from its value vector: | f | = ∑ k = 0 n ( n k ) f k {\displaystyle |f|=\sum _{k=0}^{n}{\binom {n}{k}}f_{k}} === Algebraic normal form === The algebraic normal form either contains all monomials of certain order m {\displaystyle m} , or none of them; i.e. the Möbius transform f ^ {\displaystyle {\hat {f}}} of the function is also a symmetric function. It can thus also be described by a simple (n+1) bit vector, the ANF vector f ^ m {\displaystyle {\hat {f}}_{m}} . The ANF and value vectors are related by a Möbius relation: f ^ m = ⨁ k 2 ⊆ m 2 f k {\displaystyle {\hat {f}}_{m}=\bigoplus _{k_{2}\subseteq m_{2}}f_{k}} where k 2 ⊆ m 2 {\displaystyle k_{2}\subseteq m_{2}} denotes all the weights k whose base-2 representation is covered by the base-2 representation of m (a consequence of Lucas’ theorem). Effectively, an n-variable symmetric Boolean function corresponds to a log(n)-variable ordinary Boolean function acting on the base-2 representation of the input weight. For example, for three-variable functions: f ^ 0 = f 0 f ^ 1 = f 0 ⊕ f 1 f ^ 2 = f 0 ⊕ f 2 f ^ 3 = f 0 ⊕ f 1 ⊕ f 2 ⊕ f 3 {\displaystyle {\begin{array}{lcl}{\hat {f}}_{0}&=&f_{0}\\{\hat {f}}_{1}&=&f_{0}\oplus f_{1}\\{\hat {f}}_{2}&=&f_{0}\oplus f_{2}\\{\hat {f}}_{3}&=&f_{0}\oplus f_{1}\oplus f_{2}\oplus f_{3}\end{array}}} So the three variable majority function with value vector (0, 0, 1, 1) has ANF vector (0, 0, 1, 0), i.e.: Maj ( x , y , z ) = x y ⊕ x z ⊕ y z {\displaystyle {\text{Maj}}(x,y,z)=xy\oplus xz\oplus yz} === Unit hypercube polynomial === The coefficients of the real polynomial agreeing with the function on { 0 , 1 } n {\displaystyle \{0,1\}^{n}} are given by: f m ∗ = ∑ k = 0 m ( − 1 ) | k | + | m | ( m k ) f k {\displaystyle f_{m}^{}=\sum _{k=0}^{m}(-1)^{|k|+|m|}{\binom {m}{k}}f_{k}} For example, the three variable majority function polynomial has coefficients (0, 0, 1, -2): Maj ( x , y , z ) = ( x y + x z + y z ) − 2 ( x y z ) {\displaystyle {\text{Maj}}(x,y,z)=(xy+xz+yz)-2(xyz)} == Examples ==

    Read more →
  • Talim (textiles)

    Talim (textiles)

    Talim (Kashmiri: تعليم, Kashmiri pronunciation: [t̪əːliːm], Urdu: تَعْلِیم, Arabic: تعليم, pronounced [taʕ.liːm] ) in textiles is a symbolic code and system of notation that facilitates the creation of intricate patterns in fabrics, such as shawls and carpets, and the written coded plans that include colour schemes and weaving instructions. The term is used in traditional hand-weaving in the Indian subcontinent. Talim was initially used to create certain types of patterns in Kashmiri shawls, and later came to be applied in the production of carpets. == Etymology and origin == The term talim, which refers to a symbolic code and system of notation used by shawl and carpet artisans in their weaving processes, came to the Urdu language from the Arabic noun taʻlim (تعليم), which means "authoritative instruction", "teaching", or "edification". It means the same in Urdu and Kashmiri. The Arabic noun originated from the second form of the Arabic root verb ʻalima (علم), which means "to know". According to a local belief in Kashmir, talim was introduced to them by Persian scholar and Sufi Muslim saint Mir Sayyid Ali Hamadani. The belief notwithstanding, talim might have originated from Kashmir; Amritsar was the only place outside of Srinagar where talim was used, by migrated Kashmiri artisans. == Technique == Whereas carpets are generally woven horizontally, providing weavers with a clear view of the progress they are making in creating designs, in Kashmir, carpets are woven vertically, so the weaver is reliant on the talim. The talim technique forms fabrics by passing the weft thread as per a given script that has design codes. Weavers use talim to weave the desired pattern with planned colours. Talim involves teamwork when applying the technique, as the process of creating intricate fabric designs in weaving begins with the Naqash (designer, who designs using pencils on graphs) meticulously crafting the design on a blank sheet of paper called a naska, and the master, Talim guru, making the colour codes and symbols for weft yarns that would interlace the warp to construct the desired design. He writes on a long strip of paper, in specific symbols, the colour codes, and the number of knots to be woven with each colour. Taraha guru collaborates with talim guru and is known as the artisan responsible for determining the colours. Talim uthana is a process or the act of "picking the codes" from the graph. A clerk called the Talim Navis would record the step-by-step instructions for these numbers and colours, and thousands of low-paid and interchangeable weavers would read or recite the record to carry out the design. Afterward, a talim copyist makes copies, which are needed when multiple looms weave the same product. The script, which has been encoded, is deciphered and translated according to the specific guidelines of weavers in order to incorporate the design that is included within it. Talim has been compared to "hieroglyphics" or as a "notational-cum-cryptographic system", as it is challenging to decipher and is unique to the shawls of Kashmir, which requires expertise to comprehend. According to researcher Gagan Deep Kaur, "The talim is widely held to be a trade secret of the community and has always been fiercely guarded by the owners." Those who use talim for shawl-making do not assign important tasks to women, because of the fear that the technique and knowledge may be divulged to other communities when the women are sent there to be married. The coded cards known as talim in the Kashmiri language were used for creating certain types of patterns in shawl weaving. The talim technique is employed in the creation of kani shawls, which originated from the Kanihama region of the Kashmir valley. Carpet weaving adapted the technique from shawl making. When Kashmiri artisans started to create carpets, they chose to continue using the talim rather than switching to a different method. The resurgence of the carpet industry in Amritsar during the last century resulted in the prevalent use of the talim technique among the local weavers, a majority of whom hailed from the region of Kashmir. === Recitation of codes === Talim was also communicated through recitation accompanied by a melodic chant or song. In traditional weaving practices, the use of chanting was common. The movement of the shuttles was synchronised with the song of the weaver, adding a musical rhythm to the instructions represented through hieroglyphics. The weaver's chant, "Two blue, one red, three yellow, two blue," served as a guide as they wove and replicated the designated pattern. == Usage == The first factories established in Amritsar around 1860 utilised Bokhara designs. However, Kashmiri weavers maintained their traditional techniques and employed the talim, instead of a cartoon, for tying knots. As a result, Amritsar became the second location in the Indian subcontinent to use the talim. The traditional weaving practices are still carried out in some parts of the Indian subcontinent. The exact date when talim was last used in the subcontinent varies depending on the region and the specific weaving community. Indian textile historian Jasleen Dhamija wrote in her 1989 book Handwoven Fabrics of India that there were still some weavers in the Kashmiri village of Kanihama who applied talim in weaving shawls. As of 2022, the carpet weavers in Kashmir were the only remaining users of talim in carpets, according to Zubair Ahmed, director of the Indian Institute of Carpet Technology. The institute aims to preserve traditional Kashmiri carpet designs by digitising talim and training weavers in the technique. == Gallery ==

    Read more →
  • Data exploration

    Data exploration

    Data exploration is an approach similar to initial data analysis, whereby a data analyst uses visual exploration to understand what is in a dataset and the characteristics of the data, rather than through traditional data management systems. These characteristics can include size or amount of data, completeness of the data, correctness of the data, possible relationships amongst data elements or files/tables in the data. Data exploration is typically conducted using a combination of automated and manual activities. Automated activities can include data profiling or data visualization or tabular reports to give the analyst an initial view into the data and an understanding of key characteristics. This is often followed by manual drill-down or filtering of the data to identify anomalies or patterns identified through the automated actions. Data exploration can also require manual scripting and queries into the data (e.g. using languages such as SQL or R) or using spreadsheets or similar tools to view the raw data. All of these activities are aimed at creating a mental model and understanding of the data in the mind of the analyst, and defining basic metadata (statistics, structure, relationships) for the data set that can be used in further analysis. Once this initial understanding of the data is had, the data can be pruned or refined by removing unusable parts of the data (data cleansing), correcting poorly formatted elements and defining relevant relationships across datasets. This process is also known as determining data quality. Data exploration can also refer to the ad hoc querying or visualization of data to identify potential relationships or insights that may be hidden in the data and does not require to formulate assumptions beforehand. Traditionally, this had been a key area of focus for statisticians, with John Tukey being a key evangelist in the field. Today, data exploration is more widespread and is the focus of data analysts and data scientists; the latter being a relatively new role within enterprises and larger organizations. == Interactive Data Exploration == This area of data exploration has become an area of interest in the field of machine learning. This is a relatively new field and is still evolving. As its most basic level, a machine-learning algorithm can be fed a data set and can be used to identify whether a hypothesis is true based on the dataset. Common machine learning algorithms can focus on identifying specific patterns in the data. Many common patterns include regression and classification or clustering, but there are many possible patterns and algorithms that can be applied to data via machine learning. By employing machine learning, it is possible to find patterns or relationships in the data that would be difficult or impossible to find via manual inspection, trial and error or traditional exploration techniques. == Software == Trifacta – a data preparation and analysis platform Paxata – self-service data preparation software Alteryx – data blending and advanced data analytics software Microsoft Power BI - interactive visualization and data analysis tool OpenRefine - a standalone open source desktop application for data clean-up and data transformation Tableau software – interactive data visualization software

    Read more →
  • Ciphertext

    Ciphertext

    In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher to decrypt it. This process prevents the loss of sensitive information via hacking. Decryption, the inverse of encryption, is the process of turning ciphertext into readable plaintext. Ciphertext is not to be confused with codetext, because the latter is a result of a code, not a cipher. == Conceptual underpinnings == Let m {\displaystyle m\!} be the plaintext message that Alice wants to secretly transmit to Bob and let E k {\displaystyle E_{k}\!} be the encryption cipher, where k {\displaystyle _{k}\!} is a cryptographic key. Alice must first transform the plaintext into ciphertext, c {\displaystyle c\!} , in order to securely send the message to Bob, as follows: c = E k ( m ) . {\displaystyle c=E_{k}(m).\!} In a symmetric-key system, Bob knows Alice's encryption key. Once the message is encrypted, Alice can safely transmit it to Bob (assuming no one else knows the key). In order to read Alice's message, Bob must decrypt the ciphertext using E k − 1 {\displaystyle {E_{k}}^{-1}\!} which is known as the decryption cipher, D k : {\displaystyle D_{k}:\!} D k ( c ) = D k ( E k ( m ) ) = m . {\displaystyle D_{k}(c)=D_{k}(E_{k}(m))=m.\!} Alternatively, in a non-symmetric key system, everyone, not just Alice and Bob, knows the encryption key; but the decryption key cannot be inferred from the encryption key. Only Bob knows the decryption key D k , {\displaystyle D_{k},} and decryption proceeds as D k ( c ) = m . {\displaystyle D_{k}(c)=m.} == Types of ciphers == The history of cryptography began thousands of years ago. Cryptography uses a variety of different types of encryption. Earlier algorithms were performed by hand and are substantially different from modern algorithms, which are generally executed by a machine. === Historical ciphers === Historical pen and paper ciphers used in the past are sometimes known as classical ciphers. They include: Substitution cipher: the units of plaintext are replaced with ciphertext (e.g., Caesar cipher and one-time pad) Polyalphabetic substitution cipher: a substitution cipher using multiple substitution alphabets (e.g., Vigenère cipher and Enigma machine) Polygraphic substitution cipher: the unit of substitution is a sequence of two or more letters rather than just one (e.g., Playfair cipher) Transposition cipher: the ciphertext is a permutation of the plaintext (e.g., rail fence cipher) Historical ciphers are not generally used as a standalone encryption technique because they are quite easy to crack. Many of the classical ciphers, with the exception of the one-time pad, can be cracked using brute force. === Modern ciphers === Modern ciphers are more secure than classical ciphers and are designed to withstand a wide range of attacks. An attacker should not be able to find the key used in a modern cipher, even if they know any specifics about the plaintext and its corresponding ciphertext. Modern encryption methods can be divided into the following categories: Private-key cryptography (symmetric key algorithm): one shared key is used for encryption and decryption Public-key cryptography (asymmetric key algorithm): two different keys are used for encryption and decryption In a symmetric key algorithm (e.g., DES, AES), the sender and receiver have a shared key established in advance: the sender uses the shared key to perform encryption; the receiver uses the shared key to perform decryption. Symmetric key algorithms can either be block ciphers or stream ciphers. Block ciphers operate on fixed-length groups of bits, called blocks, with an unvarying transformation. Stream ciphers encrypt plaintext digits one at a time on a continuous stream of data, with the transformation of successive digits varying during the encryption process. In an asymmetric key algorithm (e.g., RSA), there are two different keys: a public key and a private key. The public key is published, thereby allowing any sender to perform encryption. The private key is kept secret by the receiver, thereby allowing only the receiver to correctly perform decryption. == Cryptanalysis == Cryptanalysis (also referred to as codebreaking or cracking the code) is the study of applying various methodologies to obtain the meaning of encrypted information, without having access to the cipher required to correctly decrypt the information. This typically involves gaining an understanding of the system design and determining the cipher. Cryptanalysts can follow one or more attack models to crack a cipher, depending upon what information is available and the type of cipher being analyzed. Ciphertext is generally the most easily obtained part of a cryptosystem and therefore is an important part of cryptanalysis. === Attack models === Ciphertext-only: the cryptanalyst has access only to a collection of ciphertexts or code texts. This is the weakest attack model because the cryptanalyst has limited information. Modern ciphers rarely fail under this attack. Known-plaintext: the attacker has a set of ciphertexts to which they know the corresponding plaintext Chosen-plaintext attack: the attacker can obtain the ciphertexts corresponding to an arbitrary set of plaintexts of their own choosing Batch chosen-plaintext attack: where the cryptanalyst chooses all plaintexts before any of them are encrypted. This is often the meaning of an unqualified use of "chosen-plaintext attack". Adaptive chosen-plaintext attack: where the cryptanalyst makes a series of interactive queries, choosing subsequent plaintexts based on the information from the previous encryptions. Chosen-ciphertext attack: the attacker can obtain the plaintexts corresponding to an arbitrary set of ciphertexts of their own choosing Adaptive chosen-ciphertext attack Indifferent chosen-ciphertext attack Related-key attack: similar to a chosen-plaintext attack, except the attacker can obtain ciphertexts encrypted under two different keys. The keys are unknown, but the relationship between them is known (e.g., two keys that differ in the one bit). == Famous ciphertexts == The Babington Plot ciphers The Shugborough inscription The Zimmermann Telegram The Magic Words are Squeamish Ossifrage The cryptogram in "The Gold-Bug" Beale ciphers Kryptos Zodiac Killer ciphers

    Read more →
  • Data analysis

    Data analysis

    Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays an important role in making decisions more scientific and helping businesses operate more effectively. It is widely used in fields such as business analytics, healthcare, and artificial intelligence to extract meaningful insights from data. Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data, while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on the application of statistical models for predictive forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a variety of unstructured data. All of the above are varieties of data analysis. == Data analysis process == Data analysis is a process for obtaining raw data, and subsequently converting it into information useful for decision-making by users. Statistician John Tukey, defined data analysis in 1961, as:"Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data." There are several phases, and they are iterative, in that feedback from later phases may result in additional work in earlier phases. === Data requirements === The data is necessary as inputs to the analysis, which is specified based upon the requirements of those directing the analytics (or customers, who will use the finished product of the analysis). The general type of entity upon which the data will be collected is referred to as an experimental unit (e.g., a person or population of people). Specific variables regarding a population (e.g., age and income) may be specified and obtained. Data may be numerical or categorical (i.e., a text label for numbers). === Data collection === Data may be collected from a variety of sources. A list of data sources are available for study & research. The requirements may be communicated by analysts to custodians of the data; such as, Information Technology personnel within an organization. Data collection or data gathering is the process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. The data may also be collected from sensors in the environment, including traffic cameras, satellites, recording devices, etc. It may also be obtained through interviews, downloads from online sources, or reading documentation. === Data processing === Data integration is a precursor to data analysis: Data, when initially obtained, must be processed or organized for analysis. For instance, this may involve placing data into rows and columns in a table format (known as structured data) for further analysis, often through the use of spreadsheet (e.g. Excel) or statistical software. === Data cleaning === Once processed and organized, the data may be incomplete, contain duplicates, or contain errors. The need for data cleaning will arise from problems in the way that the data is entered and stored. Data cleaning is the process of preventing and correcting these errors. Common tasks include record matching, identifying inaccuracy of data, overall quality of existing data, deduplication, and column segmentation. Such data problems can also be identified through a variety of analytical techniques. For example; with financial information, the totals for particular variables may be compared against separately published numbers that are believed to be reliable. Unusual amounts, above or below predetermined thresholds, may also be reviewed. There are several types of data cleaning that are dependent upon the type of data in the set; this could be phone numbers, email addresses, employers, or other values. Quantitative data methods for outlier detection can be used to get rid of data that appears to have a higher likelihood of being input incorrectly. Text data spell checkers can be used to lessen the amount of mistyped words. However, it is harder to tell if the words are contextually (i.e., semantically and idiomatically) correct. === Exploratory data analysis === Once the datasets are cleaned, they can then begin to be analyzed using exploratory data analysis. The process of data exploration may result in additional data cleaning or additional requests for data; thus, the initialization of the iterative phases mentioned above. Descriptive statistics, such as the average, median, and standard deviation, are often used to broadly characterize the data. Data visualization is also used, in which the analyst is able to examine the data in a graphical format in order to obtain additional insights about messages within the data. === Modeling and algorithms === Mathematical formulas or mathematical models (supported by algorithms) may be applied to the data in order to identify relationships among the variables; for example, checking for correlation and by determining whether or not there is the presence of causality. In general terms, models may be developed to evaluate a specific variable based on other variable(s) contained within the dataset, with some residual error depending on the implemented model's accuracy (e.g., Data = Model + Error). Inferential statistics utilizes techniques that measure the relationships between particular variables. For example, regression analysis may be used to model whether a change in advertising (independent variable X), provides an explanation for the variation in sales (dependent variable Y), i.e. is Y a function of X? This can be described as (Y = aX + b + error), where the model is designed such that (a) and (b) minimize the error when the model predicts Y for a given range of values of X. === Data product === A data product is a computer application that takes data inputs and generates outputs, feeding them back into the environment. It may be based on a model or algorithm. For instance, an application that analyzes data about customer purchase history, and uses the results to recommend other purchases the customer might enjoy. === Communication === Once data is analyzed, it may be presented in many formats to the users of the analysis to support their requirements. The users may have feedback, which results in additional analysis. When determining how to communicate the results, the analyst may consider implementing a variety of data visualization techniques to help communicate the message more clearly and efficiently to the audience. Data visualization uses information displays (graphics such as, tables and charts) to help communicate key messages contained in the data. Tables are a valuable tool by enabling the ability of a user to query and focus on specific numbers; while charts (e.g., bar charts or line charts), may help explain the quantitative messages contained in the data. == Quantitative messages == Stephen Few described eight types of quantitative messages that users may attempt to communicate from a set of data, including the associated graphs. Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart may be used to demonstrate the trend. Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by salespersons (the category, with each salesperson a categorical subdivision) during a single period. A bar chart may be used to show the comparison across the salespersons. Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market. Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show the comparison of the actual versus the reference amount. Frequency distribution:

    Read more →
  • Plaintext

    Plaintext

    In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. == Overview == With the advent of computing, the term plaintext expanded beyond human-readable documents to mean any data, including binary files, in a form that can be viewed or used without requiring a key or other decryption device. Information—a message, document, file, etc.—if to be communicated or stored in an unencrypted form is referred to as plaintext. Plaintext is used as input to an encryption algorithm; the output is usually termed ciphertext, particularly when the algorithm is a cipher. Codetext is less often used, and almost always only when the algorithm involved is actually a code. Some systems use multiple layers of encryption, with the output of one encryption algorithm becoming "plaintext" input for the next. == Secure handling == Insecure handling of plaintext can introduce weaknesses into a cryptosystem by letting an attacker bypass the cryptography altogether. Plaintext is vulnerable in use and in storage, whether in electronic or paper format. Physical security means the securing of information and its storage media from physical, attack—for instance by someone entering a building to access papers, storage media, or computers. Discarded material, if not disposed of securely, may be a security risk. Even shredded documents and erased magnetic media might be reconstructed with sufficient effort. If plaintext is stored in a computer file, the storage media, the computer and its components, and all backups must be secure. Sensitive data is sometimes processed on computers whose mass storage is removable, in which case physical security of the removed disk is vital. In the case of securing a computer, useful (as opposed to handwaving) security must be physical (e.g., against burglary, brazen removal under cover of supposed repair, installation of covert monitoring devices, etc.), as well as virtual (e.g., operating system modification, illicit network access, Trojan programs). Wide availability of keydrives, which can plug into most modern computers and store large quantities of data, poses another severe security headache. A spy (perhaps posing as a cleaning person) could easily conceal one, and even swallow it if necessary. Discarded computers, disk drives and media are also a potential source of plaintexts. Most operating systems do not actually erase anything— they simply mark the disk space occupied by a deleted file as 'available for use', and remove its entry from the file system directory. The information in a file deleted in this way remains fully present until overwritten at some later time when the operating system reuses the disk space. With even low-end computers commonly sold with many gigabytes of disk space and rising monthly, this 'later time' may be months later, or never. Even overwriting the portion of a disk surface occupied by a deleted file is insufficient in many cases. Peter Gutmann of the University of Auckland wrote a celebrated 1996 paper on the recovery of overwritten information from magnetic disks; areal storage densities have gotten much higher since then, so this sort of recovery is likely to be more difficult than it was when Gutmann wrote. Modern hard drives automatically remap failing sectors, moving data to good sectors. This process makes information on those failing, excluded sectors invisible to the file system and normal applications. Special software, however, can still extract information from them. Some government agencies (e.g., US NSA) require that personnel physically pulverize discarded disk drives and, in some cases, treat them with chemical corrosives. This practice is not widespread outside government, however. Garfinkel and Shelat (2003) analyzed 158 second-hand hard drives they acquired at garage sales and the like, and found that less than 10% had been sufficiently sanitized. The others contained a wide variety of readable personal and confidential information. See data remanence. Physical loss is a serious problem. The US State Department, Department of Defense, and the British Secret Service have all had laptops with secret information, including in plaintext, lost or stolen. Appropriate disk encryption techniques can safeguard data on misappropriated computers or media. On occasion, even when data on host systems is encrypted, media that personnel use to transfer data between systems is plaintext because of poorly designed data policy. For example, in October 2007, HM Revenue and Customs lost CDs that contained the unencrypted records of 25 million child benefit recipients in the United Kingdom. Modern cryptographic systems resist known plaintext or even chosen plaintext attacks, and so may not be entirely compromised when plaintext is lost or stolen. Older systems resisted the effects of plaintext data loss on security with less effective techniques—such as padding and Russian copulation to obscure information in plaintext that could be easily guessed.

    Read more →
  • AdTruth

    AdTruth

    AdTruth is a software product and the digital media division of 41st Parameter, a company headquartered in Scottsdale, Arizona, with regional offices in San Jose, California; London, England; and Munich, Germany. AdTruth allows marketers to recognize and reach target audiences across online devices. AdTruth software identifies users for targeting, tracking, performance tracking across digital media, including mobile and desktop, by analysing patterns in large numbers of advertisements served over the internet, rather than through the use of cookies. == History == AdTruth was founded in 2011 by Ori Eisen of 41st Parameter, to repurpose the company's fraud detection and prevention technology, for use within the advertising industry to accurately target intended audiences, particularly in mobile. Eisen was joined by James Lamberti in the role of vice president and general manager. In 2012 41st Parameter raised $13 million in Series D financing from Norwest Venture Partners, Kleiner Perkins Caufield & Byers, Jafco Ventures and Georgian Partners, bringing total funding to about $35 million. In May 2012, AdTruth hosted a meeting of digital media executives to discuss Apple’s UDID deprecation, with the intent of developing a device-neutral replacement standard. AdTruth joined the World Wide Web Consortium's Tracking Protection Working Group, which provides guidance for implementing and adhering to Do Not Track policies. AdTruth also worked with privacy firm Truste to create a privacy compliant Do Not Track-style mechanism for mobile. In 2013, the company Experian purchased 41st Parameter, acquiring AdTruth as part of the deal. == Product == AdTruth software helps marketers track, target and retarget consumers using more than 100 parameters, including milliseconds in differences in the internal clock setting, to recognize a particular device anonymously. AdTruth's technology uses non-UDID information to identify a wide range of devices for cookieless ad targeting. Its technology currently has about a 90 percent accuracy rate on iOS, higher on Android and desktop. AdTruth also has mobile web to app bridging capabilities as well as DeviceInsight technology, enabling marketers to identify users across mobile web and app content. 41st Parameter's patented AdTruth technology is being used by MdotM, in response to the deprecation of the UDID that included tracking and targeting capabilities. == Competitors == AdTruth's main competitor is BlueCava, which deploys a similar device-fingerprinting technology.

    Read more →
  • Honey encryption

    Honey encryption

    Honey encryption is a type of data encryption that "produces a ciphertext, which, when decrypted with an incorrect key as guessed by the attacker, presents a plausible-looking yet incorrect plaintext." == Creators == Ari Juels and Thomas Ristenpart of the University of Wisconsin, the developers of the encryption system, presented a paper on honey encryption at the 2014 Eurocrypt cryptography conference. == Method of protection == A brute-force attack involves repeated decryption with random keys; this is equivalent to picking random plaintexts from the space of all possible plaintexts with a uniform distribution. This is effective because even though the attacker is equally likely to see any given plaintext, most plaintexts are extremely unlikely to be legitimate i.e. the distribution of legitimate plaintexts is non-uniform. Honey encryption defeats such attacks by first transforming the plaintext into a space such that the distribution of legitimate plaintexts is uniform. Thus an attacker guessing keys will see legitimate-looking plaintexts frequently and random-looking plaintexts infrequently. This makes it difficult to determine when the correct key has been guessed. In effect, honey encryption "[serves] up fake data in response to every incorrect guess of the password or encryption key." The security of honey encryption relies on the fact that the probability of an attacker judging a plaintext to be legitimate can be calculated (by the encrypting party) at the time of encryption. This makes honey encryption difficult to apply in certain applications e.g. where the space of plaintexts is very large or the distribution of plaintexts is unknown. It also means that honey encryption can be vulnerable to brute-force attacks if this probability is miscalculated. For example, it is vulnerable to known-plaintext attacks: if the attacker has a crib that a plaintext must match to be legitimate, they will be able to brute-force even Honey Encrypted data if the encryption did not take the crib into account. == Example == An encrypted credit card number is susceptible to brute-force attacks because not every string of digits is equally likely. The number of digits can range from 13 to 19, though 16 is the most common. Additionally, it must have a valid IIN and the last digit must match the checksum. An attacker can also take into account the popularity of various services: an IIN from MasterCard is probably more likely than an IIN from Diners Club Carte Blanche. Honey encryption can protect against these attacks by first mapping credit card numbers to a larger space where they match their likelihood of legitimacy. Numbers with invalid IINs and checksums are not mapped at all (i.e. have probability 0 of legitimacy). Numbers from large brands like MasterCard and Visa map to large regions of this space, while less popular brands map to smaller regions, etc. An attacker brute-forcing such an encryption scheme would only see legitimate-looking credit card numbers when they brute-force, and the numbers would appear with the frequency the attacker would expect from the real world. == Application == Juels and Ristenpart aim to use honey encryption to protect data stored on password manager services. Juels stated that "password managers are a tasty target for criminals," and worries that "if criminals get a hold of a large collection of encrypted password vaults they could probably unlock many of them without too much trouble." Hristo Bojinov, CEO and founder of Anfacto, noted that "Honey Encryption could help reduce their vulnerability. But he notes that not every type of data will be easy to protect this way. … Not all authentication or encryption system yield themselves to being honeyed."

    Read more →
  • Caste census

    Caste census

    Caste census is a proposed census to be conducted in India by the Central Government of India. The proposed census was decided under the leadership of Prime Minister Narendra Modi by the cabinet committee of political affairs (CCPA) on 30 April 2025. It has been decided that a caste enumeration should be included with the forthcoming census. The exact time has not been declared yet. It is unclear that when the next census will be held. The decision of the cabinet was announced by the Central Railway Minister Ashwini Vaishnaw. It has been seen as a step that would help in drafting "equitable and targeted" policies by the present Central Government of India led by the Bhartiya Janta Party in India. The Central Home Minister Amit Shah has described the decision as a "historic decision". He has also described that the historic decision as “committed to social justice”. The leader of opposition Rahul Gandhi has welcomed the decision. He said "We have shown we can pressure govt" He has demanded a clear timeline for its completion. He has called it "The first step towards deep social reform". == Description == The caste census is a systematic recording of individuals’ caste identities during the nationwide census in the country. The Central minister Ashwini Vaishnaw expressed his view on the proposed census and said that it would "strengthen the social and economic structure of our society while the nation continues to progress”. The Caste census will happen for the first time in 100 years by the Central Government of India. It will be the part of the upcoming census in India. == History == According to Peabody, the first systematic caste-wise enumeration of households in the Indian subcontinent was conducted between 1658 and 1664 across seven districts of the then Marwar Kingdom, including Jodhpur city which was its capital. It was conducted by the then home minister Munhata Nainsi of the kingdom for the purpose of tax documentation. It was not to for classification of society or creation of social hierarchies but solving a tax related problem. During the period of the British rule in India, caste census was included in the decadal censuses to categorise the population by caste, religion and occupation. In 1871–72, the first detailed caste census was conducted by the government of British Raj in India. It was practiced between the period 1881 to 1931. The last caste census was conducted in the year 1931 in which 4,147 castes were recorded. The largest population in the whole of British India (including Pakistan and Bangladesh) was of Brahmins. The population of Brahmins was recorded more than 1.5 crores. After Brahmin community, the second place was of Jatav (Chamar)community. The population of Jatav was a little more than 1.23 crores. On the third place were Rajputs. The population of Rajputs was 81 lakhs. The Rajput caste was followed by the Kunbi caste of Maharashtra. The population of Kunbi caste was 64 lakhs and 34 thousands. The Kunbi caste was followed by Yadav (Ahir) caste. The population of Yadav (Ahir) community was 56 lakhs and 82 thousands. The Yadav (Ahir) caste was followed by Teli community. The population of Teli community was 42 lakhs and 58 thousands. The Teli community was followed by Gwala community. The population of the Gwala community was 40 lakhs. After the independence of India, the caste enumeration was stopped by the newly independent Government of India led by the prime minister Pandit Jawahar Lal Nehru in 1951. The caste enumeration was stopped to avoid reinforcing social divisions in the Indian society. But, there was an exception made for the enumeration of the Scheduled Castes (SCs) and Scheduled Tribes (STs) in the decadal censuses. Therefore, the enumeration of the Scheduled Castes and the Scheduled Tribes is being conducted in every census since 1951. In 1961, the Government of India permitted states for conducting their own surveys to compile OBC lists, but national caste census was not conducted.

    Read more →
  • BitFunnel

    BitFunnel

    BitFunnel is the search engine indexing algorithm and a set of components used in the Bing search engine, which were made open source in 2016. BitFunnel uses bit-sliced signatures instead of an inverted index in an attempt to reduce operations cost. == History == Progress on the implementation of BitFunnel was made public in early 2016, with the expectation that there would be a usable implementation later that year. In September 2016, the source code was made available via GitHub. A paper discussing the BitFunnel algorithm and implementation was released as through the Special Interest Group on Information Retrieval of the Association for Computing Machinery in 2017 and won the Best Paper Award. == Components == BitFunnel consists of three major components: BitFunnel – the text search/retrieval system itself WorkBench – a tool for preparing text for use in BitFunnel NativeJIT – a software component that takes expressions that use C data structures and transforms them into highly optimized assembly code == Algorithm == === Initial problem and solution overview === The BitFunnel paper describes the "matching problem", which occurs when an algorithm must identify documents through the usage of keywords. The goal of the problem is to identify a set of matches given a corpus to search and a query of keyword terms to match against. This problem is commonly solved through inverted indexes, where each searchable item is maintained with a map of keywords. In contrast, BitFunnel represents each searchable item through a signature. A signature is a sequence of bits which describe a Bloom filter of the searchable terms in a given searchable item. The bloom filter is constructed through hashing through several bit positions. === Theoretical implementation of bit-string signatures === The signature of a document (D) can be described as the logical-or of its term signatures: S D → = ⋃ t ∈ D S t → {\displaystyle {\overrightarrow {S_{D}}}=\bigcup _{t\in D}{\overrightarrow {S_{t}}}} Similarly, a query for a document (Q) can be defined as a union: S Q → = ⋃ t ∈ Q S t → {\displaystyle {\overrightarrow {S_{Q}}}=\bigcup _{t\in Q}{\overrightarrow {S_{t}}}} Additionally, a document D is a member of the set M' when the following condition is satisfied: S Q → ∩ S D → = S Q → {\displaystyle {\overrightarrow {S_{Q}}}\cap {\overrightarrow {S_{D}}}={\overrightarrow {S_{Q}}}} This knowledge is then combined to produce a formula where M' is identified by documents which match the query signature: M ′ = { D ∈ C ∣ S Q → ∩ S D → = S Q → } {\displaystyle M'=\left\{D\in C\mid {\overrightarrow {S_{Q}}}\cap {\overrightarrow {S_{D}}}={\overrightarrow {S_{Q}}}\right\}} These steps and their proofs are discussed in the 2017 paper. === Pseudocode for bit-string signatures === This algorithm is described in the 2017 paper. M ′ = ∅ foreach D ∈ C do if S D → ∩ S Q → = S Q → then M ′ = M ′ ∪ { D } endif endfor {\displaystyle {\begin{array}{l}M'=\emptyset \\{\texttt {foreach}}\ D\in C\ {\texttt {do}}\\\qquad {\texttt {if}}\ {\overrightarrow {S_{D}}}\cap {\overrightarrow {S_{Q}}}={\overrightarrow {S_{Q}}}\ {\texttt {then}}\\\qquad \qquad M'=M'\cup \{D\}\\\qquad {\texttt {endif}}\\{\texttt {endfor}}\end{array}}}

    Read more →
  • Super-resolution imaging

    Super-resolution imaging

    Super-resolution imaging (SR) is a class of techniques that improve the resolution of an imaging system. In optical SR the diffraction limit of systems is transcended, while in geometrical SR the resolution of digital imaging sensors is enhanced. In some radar and sonar imaging applications (e.g. magnetic resonance imaging (MRI), high-resolution computed tomography), subspace decomposition-based methods (e.g. MUSIC) and compressed sensing-based algorithms (e.g., SAMV) are employed to achieve SR over standard periodogram algorithm. Super-resolution imaging techniques are used in general image processing and in super-resolution microscopy. == Super-resolution principles == Several concepts are fundamental to super-resolution imaging: Diffraction limit: the capacity of an optical instrument to reproduce the details of an object in an image has limits that are imposed by laws of physics: the diffraction equations in the wave theory of light, or the uncertainty principle for photons in quantum mechanics. Information transfer can never be increased beyond this boundary, but packets outside the limits can be cleverly swapped for (or multiplexed with) some inside it. Super-resolution microscopy does not so much “break” as “circumvent” the diffraction limit. New procedures probing electro-magnetic disturbances at the molecular level (in the so-called near field) remain fully consistent with Maxwell's equations. Spatial frequency domain: A succinct expression of the diffraction limit is given in the spatial frequency domain. In Fourier optics light distributions are expressed as superpositions of a series of grating light patterns in a range of fringe widths - these widths represent the spatial frequencies. It is generally taught that diffraction theory stipulates an upper limit, the cut-off spatial-frequency, beyond which pattern elements fail to be transferred into the optical image, i.e., are not resolved. But in fact what is set by diffraction theory is the width of the passband, not a fixed upper limit. No laws of physics are broken when a spatial frequency band beyond the cut-off spatial frequency is swapped for one inside it: this has long been implemented in dark-field microscopy. Nor are information-theoretical rules broken when superimposing several bands, disentangling them in the received image needs assumptions of object invariance during multiple exposures, i.e., the substitution of one kind of uncertainty for another. Information: When the term super-resolution is used in techniques based on the inference of object details using a statistical treatment of the image within standard resolution limits (for example, averaging multiple exposures), it involves an exchange of one kind of information (extracting signal from noise) for another (the assumption that the target has remained invariant). Recent breakthroughs incorporate quantum-transformer hybrids into super-resolution, such as QUIET‑SR, a 2025 model that employs shifted quantum window attention within a transformer to enhance image detail while respecting diffraction and information-theory limits Similarly, frequency-integrated transformers (e.g., FIT) enrich super-resolution by explicitly combining spatial and frequency-domain information via FFT-based attention, improving reconstruction across scales Resolution and localization: True resolution involves the distinction of whether a target, e.g. a star or a spectral line, is single or double, ordinarily requiring separable peaks in the image. When a target is known to be single, its location can be determined with higher precision than the image width by finding the centroid (center of gravity) of its image light distribution. The word ultra-resolution had been proposed for this process but it did not catch on, and the high-precision localization procedure is typically referred to as super-resolution. == Techniques == === Optical or diffractive super-resolution === Substituting spatial-frequency bands: Though the bandwidth allowable by diffraction is fixed, it can be positioned anywhere in the spatial-frequency spectrum. Dark-field illumination in microscopy is an example. See also aperture synthesis. ==== Multiplexing spatial-frequency bands ==== An image is formed using the normal passband of the optical device. Then, some known light structure (for example, a set of light fringes) is superimposed on the target. The image now contains components resulting from the combination of the target and the superimposed light structure, e.g. moiré fringes, and carries information about target detail which simple unstructured illumination does not. The “superresolved” components, however, need disentangling to be revealed. For an example, see structured illumination (figure to left). ==== Multiple parameter use within traditional diffraction limit ==== If a target has no special polarization or wavelength properties, two polarization states or non-overlapping wavelength regions can be used to encode target details, one in a spatial-frequency band inside the cut-off limit the other beyond it. Both would use normal passband transmission but are then separately decoded to reconstitute target structure with extended resolution. ==== Probing near-field electromagnetic disturbance ==== Super-resolution microscopy is generally discussed within the realm of conventional optical imagery. However, modern technology allows the probing of electromagnetic disturbance within molecular distances of the source, which has superior resolution properties. See also evanescent waves and the development of the new super lens. === Geometrical or image-processing super-resolution === ==== Multi-exposure image noise reduction ==== When an image is degraded by noise, the resolution may be improved by averaging multiple exposures. See example on the right. ==== Single-frame deblurring ==== Known defects in a given imaging situation, such as defocus or aberrations, can sometimes be mitigated in whole or in part by suitable spatial-frequency filtering of even a single image. Such procedures all stay within the diffraction-mandated passband, and do not extend it. ==== Sub-pixel image localization ==== The location of a single source can be determined by computing the "center of gravity" (centroid) of the light distribution extending over several adjacent pixels (see figure on the left). Provided that there is enough light, this can be achieved with arbitrary precision, very much better than pixel width of the detecting apparatus and the resolution limit for the decision of whether the source is single or double. This technique, which requires the presupposition that all the light comes from a single source, is at the basis of what has become known as super-resolution microscopy, e.g. stochastic optical reconstruction microscopy (STORM), where fluorescent probes attached to molecules give nanoscale distance information. It is also the mechanism underlying visual hyperacuity. ==== Bayesian induction beyond traditional diffraction limit ==== Some object features, though beyond the diffraction limit, may be known to be associated with other object features that are within the limits and hence contained in the image. Then conclusions can be drawn, using statistical methods, from the available image data about the presence of the full object. The classical example is Toraldo di Francia's proposition of judging whether an image is that of a single or double star by determining whether its width exceeds the spread from a single star. This can be achieved at separations well below the classical resolution bounds, and requires the prior limitation to the choice "single or double?" The approach can take the form of extrapolating the image in the frequency domain, by assuming that the object is an analytic function, and that we can exactly know the function values in some interval. This method is severely limited by the ever-present noise in digital imaging systems, but it can work for radar, astronomy, microscopy or magnetic resonance imaging. More recently, a fast single image super-resolution algorithm based on a closed-form solution to ℓ 2 − ℓ 2 {\displaystyle \ell _{2}-\ell _{2}} problems has been proposed and demonstrated to accelerate most of the existing Bayesian super-resolution methods significantly. == Aliasing == Geometrical SR reconstruction algorithms are possible if and only if the input low resolution images have been under-sampled and therefore contain aliasing. Because of this aliasing, the high-frequency content of the desired reconstruction image is embedded in the low-frequency content of each of the observed images. Given a sufficient number of observation images, and if the set of observations vary in their phase (i.e. if the images of the scene are shifted by a sub-pixel amount), then the phase information can be used to separate the aliased high-frequency content from the true low-frequency content, and the full-resolution image can be accurate

    Read more →
  • Peñabot

    Peñabot

    Peñabot is the nickname for automated social media accounts allegedly used by the Mexican government of Enrique Peña Nieto and the PRI political party to keep unfavorable news from reaching the Mexican public. Peñabot accusations are related to the broader issue of fake news in the 21st century. == History of disinformation in Mexican politics == The PRI political party has been reported to use fake news since before Peña Nieto. The main tactic originally was to spread such propaganda through open radio and television networks. Such tactic was effective in Mexico, because newspaper readership is low and cable TV is largely limited to the middle classes; consequently, the country's two major television networks – Televisa and TV Azteca – exert a significant influence in national politics. Televisa itself, not only owns around two-thirds of the programming on Mexico's TV channels, making it not only Mexico's largest television network, but also is the largest media network in the Spanish-speaking world. == Peñabots == Analysts have given the name Peñabots to a suspected network of automated accounts on social media used by the Mexican government to spread pro-government propaganda and to marginalize dissenting opinions in social media. The bots were first noticed in the 2012 elections when they were used to disseminate opinions in support of Enrique Peña Nieto on social networks such as Twitter and Facebook. According to Aristegui Noticias, their usage went against articles 6 and 134 of the Mexican Constitution. Those used by Peña Nieto's government cost an estimated 80 million pesos monthly, which news outlets argued only helped the government spread fake support towards the president, but did not have a benefit towards Mexican people (with whom EPN was highly unpopular). Facebook held approximately 640,321 Peñabots, while Twitter had less. As of July 2017, Oxford Internet Institute's Computational Propaganda Research Project claimed many western democracies, Mexico included, perform social media manipulation, thus saying the manipulation comes directly from the Mexican government itself. During Peña Nieto's subsequent presidency, analysts noted that Peñabots were used to overpower trending topics that critiqued government, to flood trending government critical hashtags with spam, to create fake trends by pushing alternative hashtags, and to push smear campaigns and threats against government-critical activists and journalists. Peñabots were distinguished as their pattern of activity was distinct from that of ordinary interaction on social networks. === Meadebots === On Twitter it was reported that about 94% of the followers of 2018 presidential candidate from the PRI Jose Antonio Meade were bots. When Antonio Meade presented himself as a candidate for the 2018 presidential election, his social media accounts such as "@MovimientoMEADE" (created by the PRI's official account @PRI_Nacional), obtained a huge quantity of followers in a short span of time. Some users noticed and brought it to attention, and after investigation it was reported 94% of such followers were bots (702,000 out of 747,000), and the account was eliminated from Twitter after 20 hours. The fake accounts used the hashtags #YoConMeade and #Meade18. It was further revealed was that Meade's official account on Twitter, @JoseAMeadeK had 25% bots (216,000 fake followers out of the 981,000). == Manipulation of news media in Mexico, through television == The Mexican government of Peña Nieto has been accused of using various means to keep unfavorable news from reaching the Mexican people. Many Mexicans have protested this practice as it clearly goes against the freedom of speech. The PRI has been reported to use fake news since before Peña Nieto. The main tactic has been to spread such propaganda through radio and television. This tactic is perceived as effective in Mexico, because newspaper readership is low and research on the Internet and cable TV is largely limited to the middle classes; consequently, the country's two major television networks – Televisa and TV Azteca – exert a significant influence in national politics. Televisa itself, owns around two-thirds of the programming on Mexico's TV channels, making it not only Mexico's largest television network, but also is the largest media network in the Spanish-speaking world. In June 2012, before the 2012 Mexican presidential elections, the British newspaper The Guardian published a series of allegations claiming Televisa, sold favorable coverage to top politicians in its news and entertainment shows, this scandal became known as the Televisa controversy. The documents published by 'The Guardian alleged that a secretive circle within Televisa manipulated news coverage to favor PRI presidential candidate Enrique Peña Nieto, who was poised as favorite to win. Televisa's secret circle supposedly commissioned videos to promote Peña Nieto and lash out his political rivals in 2009. The Guardian documents suggest that Televisa's secret team distributed such videos through e-mail, posting them posted them on Facebook and YouTube, some can still be seen there. Another document was a PowerPoint presentation, with a slide explicitly aimed at rival leftist candidate of the Party of the Democratic Revolution (PRD), Andrés Manuel López Obrador. Supposedly given to The Guardian by a Televisa employee. The document's authenticity was never possible to confirm– however dates, names, and events largely coincide. Televisa refused to talk the documents, and denied a relationship with the PRI or its presidential candidate, saying that they had provided equal media coverage to all parties. Televisa published an article supposedly showing discrepancies in The Guardian documents and denying accusations. Mexican citizens complained about the perceived favoritism towards Enrique Peña Nieto and the PRI, protesting through the Yo Soy 132 movement which Televisa covered in detail. However, Televisa's news media coverage is perceived to have been biased, by using a media coverage tactic Mexican citizens call cortinas de humo (smoke screens). These introduce a news scandal giving extensive coverage to distract citizens from a potential conflict-of-interest or controversy that could damage the image of the politician favored by the network. An example of a perceived smoke screen would be the news media coverage of "Caso Michoacán" and "Caso Paolette" distracting all the attention from the parallel "Yo soy 132" movement. A few years later, on the day of September 11, 2016; factual evidence of Televisa's performing media manipulation emerged, when a Televisa news anchor while live-on air reading a teleprompter, mistakenly read out loud that "try that Jaime "Ël Bronco" Rodríguez Calderón (Nuevo Leon's governor) is mentioned as little as possible". Newspaper El Universal caught it on video and published it social media. Televisa didn't mention the story and declined to comment. Lack of news coverage concerning Nuevo León's Governor Jaime Rodriguez, is perceived due to him being the first elected governor to not be part of any political party (Independent Governor), and because unlike the governors from the PRI preceding him, the independent governor "El Bronco" doesn't spend money on publicity at all, preferring to communicate all news by using social media such as Twitter and Facebook. While the incident may have proven Televisa's bias, there wasn't anything to incriminate the PRI political party or Enrique Peña Nieto, though it did further suspicion of Televisa manipulating news media. In contrast, a December 2017 article of The New York Times, reported Enrique Peña Nieto spending about 2000 million dollars on publicity, during his first 5 years as president, the largest publicity budget ever spent by a Mexican President. Additionally, 68 percent of news journalists admitted to not believe to have enough freedom of speech, and award-winning news reporter Carmen Aristegui was controversially fired shortly after revealing the Mexican White House scandals. == Violence and spying towards news journalists and civil rights activists == Far for only being receiving accusations of spreading fake news, the Mexican government of EPN (Enrique Peña Nieto) has also been accused of violence towards news journalists, and of spying on them, and also towards civil right leaders and their families. During his tenure as president, Peña Nieto has been accused of failing to protect news journalists, whose deaths are speculated to be politically triggered, by politicians attempting to prevent them from covering political scandals. The New York Times published a news report on the matter titled, "In Mexico it's easy to kill a journalist", on it mentioning how during EPN's government, Mexico became one of the worst countries on which to be a journalist. The assassination of journalist Javier Valdez on May 23, 2017, received national coverage, with multiple news journalists

    Read more →
  • Communications security

    Communications security

    Communications security is the discipline of preventing unauthorized interceptors from accessing telecommunications in an intelligible form, while still delivering content to the intended recipients. In the North Atlantic Treaty Organization culture, including United States Department of Defense culture, it is often referred to by the abbreviation COMSEC. The field includes cryptographic security, transmission security, emissions security and physical security of COMSEC equipment and associated keying material. COMSEC is used to protect both classified and unclassified traffic on military communications networks, including voice, video, and data. It is used for both analog and digital applications, and both wired and wireless links. Voice over secure internet protocol VOSIP has become the de facto standard for securing voice communication, replacing the need for Secure Terminal Equipment (STE) in much of NATO, including the U.S.A. USCENTCOM moved entirely to VOSIP in 2008. == Specialties == Cryptographic security: The component of communications security that results from the provision of technically sound cryptosystems and their proper use. This includes ensuring message confidentiality and authenticity. Emission security (EMSEC): The protection resulting from all measures taken to deny unauthorized persons information of value that might be derived from communications systems and cryptographic equipment intercepts and the interception and analysis of compromising emanations from cryptographic equipment, information systems, and telecommunications systems. Transmission security (TRANSEC): The component of communications security that results from the application of measures designed to protect transmissions from interception and exploitation by means other than cryptanalysis (e.g. frequency hopping and spread spectrum). Physical security: The component of communications security that results from all physical measures necessary to safeguard classified equipment, material, and documents from access thereto or observation thereof by unauthorized persons. == Related terms == ACES – Automated Communications Engineering Software AEK – Algorithmic Encryption Key AKMS – the Army Key Management System CCI – Controlled Cryptographic Item - equipment which contains COMSEC embedded devices CT3 – Common Tier 3 DTD – Data Transfer Device ICOM – Integrated COMSEC, e.g. a radio with built in encryption KEK – Key Encryption Key KG-30 – family of COMSEC equipment KOI-18 – Tape Reader General Purpose KPK – Key production key KYK-13 – Electronic Transfer Device KYX-15 – Electronic Transfer Device LCMS – Local COMSEC Management Software OTAR – Over the Air Rekeying OWK – Over the Wire Key SKL – Simple Key Loader SOI – Signal operating instructions STE – Secure Terminal Equipment (secure phone) STU-III – (obsolete secure phone, replaced by STE) TED – Trunk Encryption Device such as the WALBURN/KG family TEK – Traffic Encryption Key TPI – Two person integrity TSEC – Telecommunications Security (sometimes referred to in error transmission security or TRANSEC) Types of COMSEC equipment: Authentication equipment Crypto equipment: Any equipment that embodies cryptographic logic or performs one or more cryptographic functions (key generation, encryption, and authentication). Crypto-ancillary equipment: Equipment designed specifically to facilitate efficient or reliable operation of crypto-equipment, without performing cryptographic functions itself. Crypto-production equipment: Equipment used to produce or load keying material == DoD Electronic Key Management System == The Electronic Key Management System (EKMS) is a United States Department of Defense (DoD) key management, COMSEC material distribution, and logistics support system. The National Security Agency (NSA) established the EKMS program to supply electronic key to COMSEC devices in securely and timely manner, and to provide COMSEC managers with an automated system capable of ordering, generation, production, distribution, storage, security accounting, and access control. The Army's platform in the four-tiered EKMS, AKMS, automates frequency management and COMSEC management operations. It eliminates paper keying material, hardcopy Signal operating instructions (SOI) and saves the time and resources required for courier distribution. It has 4 components: LCMS provides automation for the detailed accounting required for every COMSEC account, and electronic key generation and distribution capability. ACES is the frequency management portion of AKMS. ACES has been designated by the Military Communications Electronics Board as the joint standard for use by all services in development of frequency management and crypto-net planning. CT3 with DTD software is in a fielded, ruggedized hand-held device that handles, views, stores, and loads SOI, Key, and electronic protection data. DTD provides an improved net-control device to automate crypto-net control operations for communications networks employing electronically keyed COMSEC equipment. SKL is a hand-held PDA that handles, views, stores, and loads SOI, Key, and electronic protection data. == Key Management Infrastructure (KMI) Program == KMI is intended to replace the legacy Electronic Key Management System to provide a means for securely ordering, generating, producing, distributing, managing, and auditing cryptographic products (e.g., asymmetric keys, symmetric keys, manual cryptographic systems, and cryptographic applications). This system is currently being fielded by Major Commands and variants will be required for non-DoD Agencies with a COMSEC Mission.

    Read more →