AI App Kisne Banaya

AI App Kisne Banaya — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Minimum resolvable contrast

    Minimum resolvable contrast

    Minimum resolvable contrast (MRC) is a subjective measure of a visible spectrum sensor’s or camera's sensitivity and ability to resolve data. A snapshot image of a series of three bar targets of selected spatial frequencies and various contrast coatings captured by the unit under test (UUT) is used to determine the MRC of the UUT, i.e., the visible spectrum camera or sensor. A trained observer selects the smallest target resolvable at each contrast level. Typically, specialized computer software collects the inputted data of the observer and provides a graph of contrast vs. spatial frequency at a given luminance level. A first order polynomial is fitted to the data and an MRC curve of spatial frequency versus contrast is generated.

    Read more →
  • Social search

    Social search

    Social search is a behavior of retrieving and searching on a social searching engine that mainly searches user-generated content such as news, videos and images related search queries on social media like Facebook, LinkedIn, Twitter, Instagram and Flickr. It is an enhanced version of web search that combines traditional algorithms. The idea behind social search is that instead of ranking search results purely based on semantic relevance between a query and the results, a social search system also takes into account social relationships between the results and the searcher. The social relationships could be in various forms. For example, in LinkedIn people search engine, the social relationships include social connections between searcher and each result, whether or not they are in the same industries, work for the same companies, belong the same social groups, and go the same schools, etc. Social search may not be demonstrably better than algorithm-driven search. In the algorithmic ranking model that search engines used in the past, relevance of a site is determined after analyzing the text and content on the page and link structure of the document. In contrast, search results with social search highlight content that was created or touched by other users who are in the Social Graph of the person conducting a search. It is a personalized search technology with online community filtering to produce highly personalized results. Social search takes many forms, ranging from simple shared bookmarks or tagging of content with descriptive labels to more sophisticated approaches that combine human intelligence with computer algorithms. Depending on the feature-set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. The principle behind social search is that human network oriented results would be more meaningful and relevant for the user, instead of computer algorithms deciding the results for specific queries. == Research and implementations == Over the years, there have been different studies, researches and some implementations of Social Search. In 2008, there were a few startup companies that focused on ranking search results according to one's social graph on social networks. Companies in the social search space include Sproose, Mahalo, Jumper 2.0, Scour, Wink, Eurekster, and Delver. Former efforts include Wikia Search. In 2008, a story on TechCrunch showed Google potentially adding in a voting mechanism to search results similar to Digg's methodology. This suggests growing interest in how social groups can influence and potentially enhance the ability of algorithms to find meaningful data for end users. There are also other services like Sentiment that turn search personal by searching within the users' social circles. In 2009, a startup project called HeyStaks (www.heystaks.com) developed a web browser plugin "HayStaks". HeyStaks applies social search through collaboration in web search as a way that leads to better search results. The main motivation for HeyStaks to work on this idea is to provide the user with features that search engines didn't provide at that time. For instance, different searches have indicated that about 70% of the time when user search for something, a friend or a coworker have found it already. Also, studies have shown that approximately, 30% of people who use online search, search for something that they have found before. The startup believe that they help avoid these kind of issues by providing a shared and rich search experience through a list of recommendations that get generated based on search results. In October 2009, Google rolled out its "Social Search"; after a time in beta, the feature was expanded to multiple languages in May 2011. Before the expansion however in 2010 Bing and Google were already taking into account re-tweets and Likes when providing search results. However, after a search deal with Twitter ended without renewal, Google began to retool its Social Search. In January 2012, Google released "Search plus Your World", a further development of Social Search. The feature, which is integrated into Google's regular search as an opt-out feature, pulls references to results from Google+ profiles. The goal was to deliver better, more relevant and personalized search results with this integration. This integration however had some problems in which Google+ still is not wildly adopted or has much usage among many users. Later on, Google was criticized by Twitter for the perceived potential impact of "Search plus Your World" upon web publishers, describing the feature's release to the public as a "bad day for the web", while Google replied that Twitter refused to allow deep search crawling by Google of Twitter's content. By Google integrating Google+, the company was encouraging users to switch to Google's social networking site in order to improve search results. One famous example occurred when Google showed a link to Mark Zuckerberg's dormant Google+ account rather than the active Facebook profile. In November 2014 these accusations started to die down because Google's Knowledge Graph started to finally show links to Facebook, Twitter, and other social media sites. In December 2008, Twitter had re-introduced their people search feature. While the interface had since changed significantly, it allows you to search either full names or usernames in a straight-forward search engine. In January 2013, Facebook announced a new search engine called Graph Search still in the beta stages. The goal was to allow users to prioritize results that were popular with their social circle over the general internet. Facebook's Graph search utilized Facebook's user generated content to target users. Although there have been different researches and studies in social search, social media networks have not vested enough interest in working with search engines. LinkedIn for example has taken steps to improve its own individual search functions in order to stray users from external search engines. Even Microsoft started working with Twitter in order to integrate some tweets into Bing's search results in November 2013. Yet Twitter has its own search engine which points out how much value their data has and why they would like to keep it in house. In the end though social search will never be truly comprehensive of the subjects that matter to people unless users opt to be completely public with their information. == Social discovery == Social discovery is the use of social preferences and personal information to predict what content will be desirable to the user. Technology is used to discover new people and sometimes new experiences shopping, meeting friends or even traveling. The discovery of new people is often in real-time, enabled by mobile apps. However, social discovery is not limited to meeting people in real-time, it also leads to sales and revenue for companies via social media. An example of retail would be the addition of social sharing with music, through the iTunes music store. There is a social component to discovering new music Social discovery is at the basis of Facebook's profitability, generating ad revenue by targeting the ads to users using the social connections to enhance the commercial appeal. == Social search engines == A social search engine in an aspect can be thought of as a search engine that provides an answer for a question from another answer by identifying a person in the answer. That can happen by retrieving a user submitted query and determining that the query is related to the question; and provides an answer, including the link to the resource, as part of search results that are responsive to the query. Few social search engines depend only on online communities. Depending on the feature-set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. Social search engines are considered a part of Web 2.0 because they use the collective filtering of online communities to elevate particularly interesting or relevant content using tagging. These descriptive tags add to the meta data embedded in Web pages, theoretically improving the results for particular keywords over time. A user will generally see suggested tags for a particular search term, indicating tags that have previously been added. An implementation of a social search engine is Aardvark. Aardvark is a social search engine that is based on the "village paradigm" which is about connecting the user who has a question with friends or friends of friends whom can answer his or her question. In Aadvark, a user ask a question in different ways that mostly involves online ways such as instant messaging, email, web input or other non-online ways such as text message or voice. The Aar

    Read more →
  • Kerckhoffs's principle

    Kerckhoffs's principle

    Kerckhoffs's principle (also called Kerckhoffs's desideratum, assumption, axiom, doctrine or law) of cryptography was stated by the Dutch cryptographer Auguste Kerckhoffs in the 19th century. The principle holds that a cryptosystem should be secure, even if everything about the system, except the key, is public knowledge. This concept is widely embraced by cryptographers, in contrast to security through obscurity, which is not. Kerckhoffs's principle was phrased by the American mathematician Claude Shannon as "the enemy knows the system", i.e., "one ought to design systems under the assumption that the enemy will immediately gain full familiarity with them". In that form, it is called Shannon's maxim. Another formulation by American researcher and professor Steven M. Bellovin is: In other words—design your system assuming that your opponents know it in detail. (A former official at NSA's National Computer Security Center told me that the standard assumption there was that serial number 1 of any new device was delivered to the Kremlin.) == Origins == The invention of telegraphy radically changed military communications and increased the number of messages that needed to be protected from the enemy dramatically, leading to the development of field ciphers which had to be easy to use without large confidential codebooks prone to capture on the battlefield. It was this environment which led to the development of Kerckhoffs's requirements. Auguste Kerckhoffs was a professor of German language at Ecole des Hautes Etudes Commerciales (HEC) in Paris. In early 1883, Kerckhoffs's article, La Cryptographie Militaire, was published in two parts in the Journal of Military Science, in which he stated six design rules for military ciphers. Translated from French, they are: The system must be practically, if not mathematically, indecipherable; It should not require secrecy, and it should not be a problem if it falls into enemy hands; It must be possible to communicate and remember the key without using written notes, and correspondents must be able to change or modify it at will; It must be applicable to telegraph communications; It must be portable, and should not require several persons to handle or operate; Lastly, given the circumstances in which it is to be used, the system must be easy to use and should not be stressful to use or require its users to know and comply with a long list of rules. Some are no longer relevant given the ability of computers to perform complex encryption. The second rule, now known as Kerckhoffs's principle, is still critically important. == Explanation of the principle == Kerckhoffs viewed cryptography as a rival to, and a better alternative than, steganographic encoding, which was common in the nineteenth century for hiding the meaning of military messages. One problem with encoding schemes is that they rely on humanly-held secrets such as "dictionaries" which disclose for example, the secret meaning of words. Steganographic-like dictionaries, once revealed, permanently compromise a corresponding encoding system. Another problem is that the risk of exposure increases as the number of users holding the secrets increases. Nineteenth century cryptography, in contrast, used simple tables which provided for the transposition of alphanumeric characters, generally given row-column intersections which could be modified by keys which were generally short, numeric, and could be committed to human memory. The system was considered "indecipherable" because tables and keys do not convey meaning by themselves. Secret messages can be compromised only if a matching set of table, key, and message falls into enemy hands in a relevant time frame. Kerckhoffs viewed tactical messages as only having a few hours of relevance. Systems are not necessarily compromised, because their components (i.e. alphanumeric character tables and keys) can be easily changed. === Advantage of secret keys === Using secure cryptography is supposed to replace the difficult problem of keeping messages secure with a much more manageable one, keeping relatively small keys secure. A system that requires long-term secrecy for something as large and complex as the whole design of a cryptographic system obviously cannot achieve that goal. It only replaces one hard problem with another. However, if a system is secure even when the enemy knows everything except the key, then all that is needed is to manage keeping the keys secret. There are a large number of ways the internal details of a widely used system could be discovered. The most obvious is that someone could bribe, blackmail, or otherwise threaten staff or customers into explaining the system. In war, for example, one side will probably capture some equipment and people from the other side. Each side will also use spies to gather information. If a method involves software, someone could do memory dumps or run the software under the control of a debugger in order to understand the method. If hardware is being used, someone could buy or steal some of the hardware and build whatever programs or gadgets needed to test it. Hardware can also be dismantled so that the chip details can be examined under the microscope. === Maintaining security === A generalization some make from Kerckhoffs's principle is: "The fewer and simpler the secrets that one must keep to ensure system security, the easier it is to maintain system security." Bruce Schneier ties it in with a belief that all security systems must be designed to fail as gracefully as possible: Kerckhoffs's principle applies beyond codes and ciphers to security systems in general: every secret creates a potential failure point. Secrecy, in other words, is a prime cause of brittleness—and therefore something likely to make a system prone to catastrophic collapse. Conversely, openness provides ductility. Any security system depends crucially on keeping some things secret. However, Kerckhoffs's principle points out that the things kept secret ought to be those least costly to change if inadvertently disclosed. For example, a cryptographic algorithm may be implemented by hardware and software that is widely distributed among users. If security depends on keeping that secret, then disclosure leads to major logistic difficulties in developing, testing, and distributing implementations of a new algorithm – it is "brittle". On the other hand, if keeping the algorithm secret is not important, but only the keys used with the algorithm must be secret, then disclosure of the keys simply requires the simpler, less costly process of generating and distributing new keys. == Applications == In accordance with Kerckhoffs's principle, the majority of civilian cryptography makes use of publicly known algorithms. By contrast, ciphers used to protect classified government or military information are often kept secret (see Type 1 encryption). However, it should not be assumed that government/military ciphers must be kept secret to maintain security. It is possible that they are intended to be as cryptographically sound as public algorithms, and the decision to keep them secret is in keeping with a layered security posture. == Security through obscurity == It is moderately common for companies to keep the inner workings of a system secret. Some argue this "security by obscurity" makes the product safer and less vulnerable to attack. A counter-argument is that keeping the innards secret may improve security in the short term, but in the long run, only systems that have been published and analyzed should be trusted. Steven Bellovin and Randy Bush commented: Security Through Obscurity Considered Dangerous Hiding security vulnerabilities in algorithms, software, and/or hardware decreases the likelihood they will be repaired and increases the likelihood that they can and will be exploited. Discouraging or outlawing discussion of weaknesses and vulnerabilities is extremely dangerous and deleterious to the security of computer systems, the network, and its citizens. Open Discussion Encourages Better Security The long history of cryptography and cryptoanalysis has shown time and time again that open discussion and analysis of algorithms exposes weaknesses not thought of by the original authors, and thereby leads to better and more secure algorithms. As Kerckhoffs noted about cipher systems in 1883 [Kerc83], "Il faut qu'il n'exige pas le secret, et qu'il puisse sans inconvénient tomber entre les mains de l'ennemi." (Roughly, "the system must not require secrecy and must be able to be stolen by the enemy without causing trouble.")

    Read more →
  • Memory-hard function

    Memory-hard function

    In cryptography, a memory-hard function (MHF) is a function that costs a significant amount of memory to efficiently evaluate. It differs from a memory-bound function, which incurs cost by slowing down computation through memory latency. MHFs have found use in key stretching and proof of work as their increased memory requirements significantly reduce the computational efficiency advantage of custom hardware over general-purpose hardware compared to non-MHFs. == Introduction == MHFs are designed to consume large amounts of memory on a computer in order to reduce the effectiveness of parallel computing. In order to evaluate the function using less memory, a significant time penalty is incurred. As each MHF computation requires a large amount of memory, the number of function computations that can occur simultaneously is limited by the amount of available memory. This reduces the efficiency of specialised hardware, such as application-specific integrated circuits and graphics processing units, which utilise parallelisation, in computing a MHF for a large number of inputs, such as when brute-forcing password hashes or mining cryptocurrency. == Motivation and examples == Bitcoin's proof-of-work uses repeated evaluation of the SHA-256 function, but modern general-purpose processors, such as off-the-shelf CPUs, are inefficient when computing a fixed function many times over. Specialized hardware, such as application-specific integrated circuits (ASICs) designed for Bitcoin mining, can use 30,000 times less energy per hash than x86 CPUs whilst having much greater hash rates. This led to concerns about the centralization of mining for Bitcoin and other cryptocurrencies. Because of this inequality between miners using ASICs and miners using CPUs or off-the shelf hardware, designers of later proof-of-work systems utilised hash functions for which it was difficult to construct ASICs that could evaluate the hash function significantly faster than a CPU. As memory cost is platform-independent, MHFs have found use in cryptocurrency mining, such as for Litecoin, which uses scrypt as its hash function. They are also useful in password hashing because they significantly increase the cost of trying many possible passwords against a leaked database of hashed passwords without significantly increasing the computation time for legitimate users. == Measuring memory hardness == There are various ways to measure the memory hardness of a function. One commonly seen measure is cumulative memory complexity (CMC). In a parallel model, CMC is the sum of the memory required to compute a function over every time step of the computation. Other viable measures include integrating memory usage against time and measuring memory bandwidth consumption on a memory bus. Functions requiring high memory bandwidth are sometimes referred to as "bandwidth-hard functions". == Variants == MHFs can be categorized into two different groups based on their evaluation patterns: data-dependent memory-hard functions (dMHF) and data-independent memory-hard functions (iMHF). As opposed to iMHFs, the memory access pattern of a dMHF depends on the function input, such as the password provided to a key derivation function. Examples of dMHFs are scrypt and Argon2d, while examples of iMHFs are Argon2i and catena. Many of these MHFs have been designed to be used as password hashing functions because of their memory hardness. A notable problem with dMHFs is that they are prone to side-channel attacks such as cache timing. This has resulted in a preference for using iMHFs when hashing passwords. However, iMHFs have been mathematically proven to have weaker memory hardness properties than dMHFs.

    Read more →
  • ImageNet

    ImageNet

    The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes. == History == AI researcher Fei-Fei Li began working on the idea for ImageNet in 2006. At a time when most AI research focused on models and algorithms, Li wanted to expand and improve the data available to train AI algorithms. In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project. As a result of this meeting, Li went on to build ImageNet starting from the roughly 22,000 nouns of WordNet and using many of its features. She was also inspired by a 1987 estimate that the average person recognizes roughly 30,000 different kinds of objects. As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project. They used Amazon Mechanical Turk to help with the classification of images. Labeling started in July 2008 and ended in April 2010. It took 49K workers from 167 countries filtering and labeling over 160M candidate images. They had enough budget to have each of the 14 million images labelled three times. The original plan called for 10,000 images per category, for 40,000 categories at 400 million images, each verified 3 times. They found that humans can classify at most 2 images/sec. At this rate, it was estimated to take 19 human-years of labor (without rest). They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida, titled "ImageNet: A Preview of a Large-scale Hierarchical Dataset". The poster was reused at Vision Sciences Society 2009. In 2009, Alex Berg suggested adding object localization as a task. Li approached PASCAL Visual Object Classes contest in 2009 for a collaboration. It resulted in the subsequent ImageNet Large Scale Visual Recognition Challenge starting in 2010, which has 1000 classes and object localization, as compared to PASCAL VOC which had just 20 classes and 19,737 images (in 2010). === Significance for deep learning === On 30 September 2012, a convolutional neural network (CNN) called AlexNet achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up. Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training, an essential ingredient of the deep learning revolution. According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole." In 2015, AlexNet was outperformed by Microsoft's very deep CNN with over 100 layers, which won the ImageNet 2015 contest, having 3.57% error on the test set. Andrej Karpathy estimated in 2014 that with concentrated effort, he could reach 5.1% error rate, and ~10 people from his lab reached ~12-13% with less effort. It was estimated that with maximal effort, a human could reach 2.4%. == Dataset == ImageNet crowdsources its annotation process. Image-level annotations indicate the presence or absence of an object class in an image, such as "there are tigers in this image" or "there are no tigers in this image". Object-level annotations provide a bounding box around the (visible part of the) indicated object. ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification. In 2012, ImageNet was the world's largest academic user of Mechanical Turk. The average worker identified 50 images per minute. The original plan of the full ImageNet would have roughly 50M clean, diverse and full resolution images spread over approximately 50K synsets. This was not achieved. The summary statistics given on April 30, 2010: Total number of non-empty synsets: 21841 Total number of images: 14,197,122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million === Categories === The categories of ImageNet were filtered from the WordNet concepts. Each concept, since it can contain multiple synonyms (for example, "kitty" and "young cat"), so each concept is called a "synonym set" or "synset". There were more than 100,000 synsets in WordNet 3.0, majority of them are nouns (80,000+). The ImageNet dataset filtered these to 21,841 synsets that are countable nouns that can be visually illustrated. Each synset in WordNet 3.0 has a "WordNet ID" (wnid), which is a concatenation of part of speech and an "offset" (a unique identifying number). Every wnid starts with "n" because ImageNet only includes nouns. For example, the wnid of synset "dog, domestic dog, Canis familiaris" is "n02084071". The categories in ImageNet fall into 9 levels, from level 1 (such as "mammal") to level 9 (such as "German shepherd"). === Image format === The images were scraped from online image search (Google, Picsearch, MSN, Yahoo, Flickr, etc) using synonyms in multiple languages. For example: German shepherd, German police dog, German shepherd dog, Alsatian, ovejero alemán, pastore tedesco, 德国牧羊犬. ImageNet consists of images in RGB format with varying resolutions. For example, in ImageNet 2012, "fish" category, the resolution ranges from 4288 x 2848 to 75 x 56. In machine learning, these are typically preprocessed into a standard constant resolution, and whitened, before further processing by neural networks. For example, in PyTorch, ImageNet images are by default normalized by dividing the pixel values so that they fall between 0 and 1, then subtracting by [0.485, 0.456, 0.406], then dividing by [0.229, 0.224, 0.225]. These are the mean and standard deviations for ImageNet, so this whitens the input data. === Labels and annotations === Each image is labelled with exactly one wnid. Dense SIFT features (raw SIFT descriptors, quantized codewords, and coordinates of each descriptor/codeword) for ImageNet-1K were available for download, designed for bag of visual words. The bounding boxes of objects were available for about 3000 popular synsets with on average 150 images in each synset. Furthermore, some images have attributes. They released 25 attributes for ~400 popular synsets: Color: black, blue, brown, gray, green, orange, pink, red, violet, white, yellow Pattern: spotted, striped Shape: long, round, rectangular, square Texture: furry, smooth, rough, shiny, metallic, vegetation, wooden, wet === ImageNet-21K === The full original dataset is referred to as ImageNet-21K. ImageNet-21k contains 14,197,122 images divided into 21,841 classes. Some papers round this up and name it ImageNet-22k. The full ImageNet-21k was released in Fall of 2011, as fall11_whole.tar. There is no official train-validation-test split for ImageNet-21k. Some classes contain only 1-10 samples, while others contain thousands. === ImageNet-1K === There are various subsets of the ImageNet dataset used in various context, sometimes referred to as "versions". One of the most highly used subsets of ImageNet is the "ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012–2017 image classification and localization dataset". This is also referred to in the research literature as ImageNet-1K or ILSVRC2017, reflecting the original ILSVRC challenge that involved 1,000 classes. ImageNet-1K contains 1,281,167 training images, 50,000 validation images and 100,000 test images. Each category in ImageNet-1K is a leaf category, meaning that there are no child nodes below it, unlike ImageNet-21K. For example, in ImageNet-21K, there are some images categorized as simply "mammal", whereas in ImageNet-1K, there are only images categorized as things like "German shepherd", since there are no child-words below "German shepherd". === Later developments === In the WordNet they built ImageNet on, there were 2832 synsets in the "person" subtree. During 2018--2020 period, they removed the download of the ImageNet-21k as they went through extensive filtering in these person synsets. Out of these 2832 synsets, 1593 were deemed "potentially offensive". Out of the remaining 1239, 1081 were deemed not really "visual". The result was that only 158 syn

    Read more →
  • Media contacts database

    Media contacts database

    In public relations (PR) and marketing, a media contacts database is a resource which catalogs the names, contact information, and other details about people who work in various media professions. These include journalists, reporters, editors, publishers, contributors, freelance journalists, opinion writers, social media personalities/ influencers, TV show anchors, radio show hosts, DJs, and others. A media contacts database usually contains the following information: Full name of the media contact, The publication or channel they work for Designations (past and present) Topics they cover, or their beat Contact information found in public domains Online presence like blogs and other social networking sites Education Information == Overview == A media contacts database is a public relations tool that is maintained and used by PR professionals to pitch stories on a particular topic, product, or company to a specific group of journalists. These journalists would then write or speak about the particular topic in a relevant issue or episode of their shows. A media contacts database allows a PR professional to gain easy access to hundreds of journalists within a short span of time. Media contacts database are created and sold by many media research companies that offer such PR software for professionals.

    Read more →
  • Trusted Computing

    Trusted Computing

    Trusted Computing (TC) is a technology developed and promoted by the Trusted Computing Group. The term is taken from the field of trusted systems and has a specialized meaning that is distinct from the field of confidential computing. With Trusted Computing, the computer will consistently behave in expected ways, and those behaviors will be enforced by computer hardware and software. Enforcing this behavior is achieved by loading the hardware with a unique encryption key that is inaccessible to the rest of the system and the owner. TC is controversial as the hardware is not only secured for its owner, but also against its owner, leading opponents of the technology like free software activist Richard Stallman to deride it as "treacherous computing", and certain scholarly articles to use scare quotes when referring to the technology. Trusted Computing proponents such as International Data Corporation, the Enterprise Strategy Group and Endpoint Technologies Associates state that the technology will make computers safer, less prone to viruses and malware, and thus more reliable from an end-user perspective. They also state that Trusted Computing will allow computers and servers to offer improved computer security over that which is currently available. Opponents often state that this technology will be used primarily to enforce digital rights management policies (imposed restrictions to the owner) and not to increase computer security. Chip manufacturers Intel and AMD, hardware manufacturers such as HP and Dell, and operating system providers such as Microsoft include Trusted Computing in their products if enabled. The U.S. Army requires that every new PC it purchases comes with a Trusted Platform Module (TPM). As of July 3, 2007, so does virtually the entire United States Department of Defense. == Key concepts == Trusted Computing encompasses six key technology concepts, of which all are required for a fully Trusted system, that is, a system compliant to the TCG specifications: Endorsement key Secure input and output Memory curtaining / protected execution Sealed storage Remote attestation Trusted Third Party (TTP) === Endorsement key === The endorsement key is a 2048-bit RSA public and private key pair that is created randomly on the chip at manufacture time and cannot be changed. The private key never leaves the chip, while the public key is used for attestation and for encryption of sensitive data sent to the chip, as occurs during the TPM_TakeOwnership command. This key is used to allow the execution of secure transactions: every Trusted Platform Module (TPM) is required to be able to sign a random number (in order to allow the owner to show that he has a genuine trusted computer), using a particular protocol created by the Trusted Computing Group (the direct anonymous attestation protocol) in order to ensure its compliance of the TCG standard and to prove its identity; this makes it impossible for a software TPM emulator with an untrusted endorsement key (for example, a self-generated one) to start a secure transaction with a trusted entity. The TPM should be designed to make the extraction of this key by hardware analysis hard, but tamper resistance is not a strong requirement. === Memory curtaining === Memory curtaining extends common memory protection techniques to provide full isolation of sensitive areas of memory—for example, locations containing cryptographic keys. Even the operating system does not have full access to curtained memory. The exact implementation details are vendor specific. === Sealed storage === Sealed storage protects private information by binding it to platform configuration information including the software and hardware being used. This means the data can be released only to a particular combination of software and hardware. Sealed storage can be used for DRM enforcing. For example, users who keep a song on their computer that has not been licensed to be listened will not be able to play it. Currently, a user can locate the song, listen to it, and send it to someone else, play it in the software of their choice, or back it up (and in some cases, use circumvention software to decrypt it). Alternatively, the user may use software to modify the operating system's DRM routines to have it leak the song data once, say, a temporary license was acquired. Using sealed storage, the song is securely encrypted using a key bound to the trusted platform module so that only the unmodified and untampered music player on his or her computer can play it. In this DRM architecture, this might also prevent people from listening to the song after buying a new computer, or upgrading parts of their current one, except after explicit permission of the vendor of the song. === Remote attestation === Remote attestation allows changes to the user's computer to be detected by authorized parties. For example, software companies can identify unauthorized changes to software, including users modifying their software to circumvent commercial digital rights restrictions. It works by having the hardware generate a certificate stating what software is currently running. The computer can then present this certificate to a remote party to show that unaltered software is currently executing. Numerous remote attestation schemes have been proposed for various computer architectures, including Intel, RISC-V, and ARM. Remote attestation is usually combined with public-key encryption so that the information sent can only be read by the programs that requested the attestation, and not by an eavesdropper. To take the song example again, the user's music player software could send the song to other machines, but only if they could attest that they were running an authorized copy of the music player software. Combined with the other technologies, this provides a more restricted path for the music: encrypted I/O prevents the user from recording it as it is transmitted to the audio subsystem, memory locking prevents it from being dumped to regular disk files as it is being worked on, sealed storage curtails unauthorized access to it when saved to the hard drive, and remote attestation prevents unauthorized software from accessing the song even when it is used on other computers. To preserve the privacy of attestation responders, Direct Anonymous Attestation has been proposed as a solution, which uses a group signature scheme to prevent revealing the identity of individual signers. Proof of space (PoS) have been proposed to be used for malware detection, by determining whether the L1 cache of a processor is empty (e.g., has enough space to evaluate the PoSpace routine without cache misses) or contains a routine that resisted being evicted. === Trusted third party === == Known applications == The Microsoft products Windows Vista, Windows 7, Windows 8 and Windows RT make use of a Trusted Platform Module to facilitate BitLocker Drive Encryption. Other known applications with runtime encryption and the use of secure enclaves include the Signal messenger and the e-prescription service ("E-Rezept") by the German government. == Possible applications == === Digital rights management === Trusted Computing would allow companies to create a digital rights management (DRM) system which would be very hard to circumvent, though not impossible. An example is downloading a music file. Sealed storage could be used to prevent the user from opening the file with an unauthorized player or computer. Remote attestation could be used to authorize play only by music players that enforce the record company's rules. The music would be played from curtained memory, which would prevent the user from making an unrestricted copy of the file while it is playing, and secure I/O would prevent capturing what is being sent to the sound system. Circumventing such a system would require either manipulation of the computer's hardware, capturing the analogue (and thus degraded) signal using a recording device or a microphone, or breaking the security of the system. New business models for use of software (services) over Internet may be boosted by the technology. By strengthening the DRM system, one could base a business model on renting programs for a specific time periods or "pay as you go" models. For instance, one could download a music file which could only be played a certain number of times before it becomes unusable, or the music file could be used only within a certain time period. === Preventing cheating in online games === Trusted Computing could be used to combat cheating in online games. Some players modify their game copy in order to gain unfair advantages in the game; remote attestation, secure I/O and memory curtaining could be used to determine that all players connected to a server were running an unmodified copy of the software. === Verification of remote computation for grid computing === Trusted Computing could be used to guarantee participants in a grid computing sys

    Read more →
  • Pepper (cryptography)

    Pepper (cryptography)

    In cryptography, a pepper is a secret added to an input such as a password during hashing with a cryptographic hash function. This value differs from a salt in that it is not stored alongside a password hash, but rather the pepper is kept separate using another meachanism, such as a Hardware Security Module. Note that the National Institute of Standards and Technology refers to this value as a secret key rather than a pepper. A pepper is similar in concept to a salt or an encryption key. It is like a salt in that it is a randomized value that is added to a password hash, and it is similar to an encryption key in that it should be kept secret. A pepper performs a comparable role to a salt or an encryption key, but while a salt is not secret (merely unique) and can be stored alongside the hashed output, a pepper is secret and must not be stored with the output. The hash and salt are usually stored in a database, but, if stored, a pepper must be stored separately to prevent it from being obtained by the attacker in case of a database breach. == History == The idea of a site- or service-specific salt (in addition to a per-user salt) has a long history, with Steven M. Bellovin proposing a local parameter in a Bugtraq post in 1995. In 1996 Udi Manber also described the advantages of such a scheme, terming it a secret salt. However, he suggested not storing the value of the secret salt, but instead rediscovering it by trial and error at password verification time. The term pepper has been used, by analogy to salt, but with a variety of meanings. For example, when discussing a challenge-response scheme, pepper has been used for a salt-like quantity, though not used for password storage; it has been used for a data transmission technique where a pepper must be guessed; and even as a part of jokes. The term pepper was proposed for a secret or local parameter stored separately from the password in a discussion of protecting passwords from rainbow table attacks. This usage did not immediately catch on: for example, Fred Wenzel added support to Django password hashing for storage based on a combination of bcrypt and HMAC with separately stored nonces, without using the term. Usage has since become more common. == Types == There are multiple different types of pepper: A shared secret that is common to all users. A randomly-selected number that must be re-discovered on every password input. These mechanisms could be combined with password salting, iterated hashing or even one another. == Shared-secret pepper == Bellovin and Webster suggest prepend a shared secret to the password before hashing, which allows easy use of existing hash functions. For example, consider two users to be added to a database. This table contains two combinations of username and password. The password is not saved, and the 8-byte (64-bit) 44534C70C6883DE2 pepper is saved in a safe place separate from the output values of the hash, in this case SHA256. Unlike the salt, the pepper does not provide protection to users who use the same password, but protects against dictionary attacks, unless the attacker has the pepper value available. Since the same pepper is not shared between different applications, an attacker is unable to reuse the hashes of one compromised database to another. A complete scheme for saving passwords may include both salt and pepper use. For example, it has been suggested to combine the pepper by encrypting salted password hashes, which allows rotation of the pepper. In the case of a shared-secret pepper, a single compromised password (via password reuse or other attack) along with a user's salt can lead to an attack to discover the pepper, rendering it ineffective. If an attacker knows a plaintext password and a user's salt, as well as the algorithm used to hash the password, then discovering the pepper can be a matter of brute forcing the values of the pepper. This is why NIST recommends the secret value be at least 112 bits, so that discovering it by exhaustive search is prohibitively expensive. The pepper must be generated anew for every application it is deployed in, otherwise a breach of one application would result in lowered security of another application. Without knowledge of the pepper, other passwords in the database will be far more difficult to extract from their hashed values, as the attacker would need to guess the password as well as the pepper. A pepper adds security to a database of salts and hashes because unless the attacker is able to obtain the pepper, cracking even a single hash is intractable, no matter how weak the original password. Even with a list of (salt, hash) pairs, an attacker must also guess the secret pepper in order to find the password which produces the hash. The NIST specification for a secret salt suggests using a Password-Based Key Derivation Function (PBKDF) with an approved Pseudorandom Function such as HMAC with SHA-3 as the hash function of the HMAC. The NIST recommendation is also to perform at least 1000 iterations of the PBKDF, and a further minimum 1000 iterations using the secret salt in place of the non-secret salt. == Randomly-selected pepper that must be re-discovered == The aim of this mechanism is to slow down password the password verification step, thus slowing attacks. The aim is similar increasing the iteration count on bcrypt or Argon2, but the mechanism is different. The secret salt or pepper must be rediscovered by the verifier or attacker each time by guessing. In this situation, the password hashing function is calculated using both the password and the pepper. At password storage time, the pepper is chosen randomly from a range between 1 and R, the hash output is calculated using the password and the pepper. The hash output is stored with the username. The pepper is then discarded. At password verification time, the verifier is provided with a username and password to verify. The originally calculated hash is retrieved for the given username, and then the hash of the password and each value between 1 and R is calculated. If any of these hash values match the stored password hash, the password is considered valid. Note, the possible values of the pepper should not be tested in a fixed order known to an attacker, otherwise a timing attack may reveal the pepper. If the password is correct, the correct pepper will be found in R/2 hash evaluations on average. If the password is incorrect, all R values must be tested before the password can be rejected.

    Read more →
  • Grokking (machine learning)

    Grokking (machine learning)

    In machine learning, grokking, or delayed generalization, is a phenomenon observed in some settings where a model abruptly transitions from overfitting (performing well only on training data) to generalizing (performing well on both training and test data), after many training iterations with little or no improvement on the held-out data. This contrasts with what is typically observed in machine learning, where generalization occurs gradually alongside improved performance on training data. == Origin == Grokking was introduced by OpenAI researcher Alethea Power and colleagues in the January 2022 paper "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets". It is derived from the word grok coined by Robert Heinlein in his novel Stranger in a Strange Land. In ML research, "grokking" is not used as a synonym for "generalization"; rather, it names a sometimes-observed delayed‑generalization training phenomenon in which training and held‑out performance do not improve in tandem, and in which held‑out performance rises abruptly later. Authors also analyze the "grokking time", the epoch or step at which this transition occurs in those scenarios. == Interpretations == Grokking can be understood as a phase transition during the training process. In particular, recent work has shown that grokking may be due to a complexity phase transition in the model during training. While grokking has been thought of as largely a phenomenon of relatively shallow models, grokking has been observed in deep neural networks and non-neural models and is the subject of active research. One potential explanation is that the weight decay (a component of the loss function that penalizes higher values of the neural network parameters, also called regularization) slightly favors the general solution that involves lower weight values, but that is also harder to find. According to Neel Nanda, the process of learning the general solution may be gradual, even though the transition to the general solution occurs more suddenly later. Recent theories have hypothesized that grokking occurs when neural networks transition from a "lazy training" regime where the weights do not deviate far from initialization, to a "rich" regime where weights abruptly begin to move in task-relevant directions. Follow-up empirical and theoretical work has accumulated evidence in support of this perspective, and it offers a unifying view of earlier work as the transition from lazy to rich training dynamics is known to arise from properties of adaptive optimizers, weight decay, initial parameter weight norm, and more. This perspective is complementary to a unifying "pattern learning speeds" framework that links grokking and double descent; within this view, delayed generalization can arise across training time ("epoch‑wise") or across model size ("model‑wise"), and the authors report "model‑wise grokking".

    Read more →
  • Instagram egg

    Instagram egg

    The Instagram egg is a photo of an egg posted by the account @world_record_egg on the social media platform Instagram. It became a global phenomenon and an internet meme within days of its publication on 4 January 2019. It is the second most-liked Instagram post and was the most-liked Instagram post from 14 January 2019 until 20 December 2022, when it was overtaken by Lionel Messi's post showing him and his teammates celebrating after Argentina won the 2022 FIFA World Cup. The owner of the account was revealed to be Chris Godfrey, a British advertising creative, who later worked with his two friends Alissa Khan-Whelan and CJ Brown on a Hulu commercial featuring the egg, intended to raise mental health awareness. == Background == The photo was originally taken by Serghei Platanov, who then posted it to Shutterstock on 23 June 2015 with the title "eggs isolated on white background". == History == On 4 January 2019, the @world_record_egg account was created, and posted an image of a bird egg with the caption, "Let's set a world record together and get the most liked post on Instagram. Beating the current world record held by Kylie Jenner (18 million)! We got this." Jenner's previous record, the first photo of her daughter Stormi, had garnered a total of 18.4 million likes. The post quickly reached 18.4 million likes in just under 10 days, becoming the most-liked Instagram post at the time. It then continued to rise over 45 million likes in the next 48 hours, surpassing the "Despacito" music video and taking the world record for the most-liked online post (on any media platform) in history. After the account became verified on 14 January 2019, the post rose in popularity and likes, which snowballed into coverage in various media outlets. By 18 March 2019, the post had accumulated over 53.3 million likes, nearly three times the previous record of 18.4 million. It posted frequent updates for a few days in the form of Instagram Stories. Alongside the like tally, as of January 2023 the post has 3.8 million comments. Several individuals tried to claim that they were the account's creator, the claims being dismissed by "the egg" on Instagram direct messages. On 3 February 2019, the creator of the Instagram egg was revealed by Hulu and The New York Times to be Chris Godfrey, a British advertising creative. Alissa Khan-Whelan, his colleague, was also outed. On 18 January 2019, the account posted a second picture of an egg, almost identical to the first one apart from a small crack at the top left. As of 25 February 2019, the post accumulated 11.8 million likes. On 22 January 2019, the account posted a third picture of an egg, this time having two larger cracks. In less than 25 minutes, the post accumulated 1 million likes, and by 25 February 2019, it had accumulated 9.5 million likes. On 29 January 2019, a fourth picture of an egg was posted to the account which has another large crack on the right hand side, attracting 7.6 million likes by 25 February 2019. On 1 February 2019, a fifth picture of an egg was posted with stitching like that of a football, referencing the upcoming Super Bowl. That post had accumulated 6.5 million likes by 25 February 2019. The account promised that it would reveal what was inside the egg on 3 February, on the subscription video on demand service Hulu. The Hulu Instagram egg reveal was used to promote an animation about a mental health campaign. A caption from the clip read, "Recently I've started to crack, the pressure of social media is getting to me. If you're struggling too, talk to someone." The video was later posted on the @world_record_egg Instagram account, and this post received over 33 million views by May 2019. As of May 2020, it had received over 41 million views. On 16 July 2019, Chris Godfrey (the creator of the account) was listed as one of the top 25 most influential people on the internet. On 20 December 2022, the record for the most-liked Instagram post was surpassed by a post from Argentine footballer Lionel Messi, showing him and his teammates celebrating after winning the 2022 FIFA World Cup with their national team. The world record egg responded to being overtaken in likes by Messi with "Today [Lionel Messi] has taken the crown, for now. But I'm still left with one question… Who is the greatest of all time – Cristiano Ronaldo or Leo Messi?" The account sold to Dubai-based investor Mustafa El Fishawy in April 2024 for an undisclosed seven-figure sum. Reed Smith, who advised Godfrey, Brown, and Khan-Whelan in the transaction, stated they opted to sell it to "focus on new ventures." On 3 June, @world_record_egg posted an egg with the flag of Palestine in support of the country during the Gaza war; the post's caption described it as an "Egg for Peace" and hoped to "set a new world record together and get the most liked post on Instagram for a good cause." == Reception == In response to breaking the world record for the most-liked Instagram post, the account's owner wrote "This is madness. What a time to be alive." Hours later, Jenner posted a video on Instagram of her cracking open an egg and pouring its yolk onto the ground, with the caption: "Take that little egg." Pundits pontificated on the meaning of the egg picture's dominance over social media's "first family". As Vogue observed, tapping a heart pictogram is easy, and eggs are "lovable". More pointedly: [T]he attention economy is a scam based on requiring little to no labor from both producer and consumer despite commanding the most space, and therefore value, in our digital lives... but it very well could be: As a metaphor for the fragility of the influencer ecosystem, the egg has broken the Internet. The significance of the event and its massive republishing are a topic of discussion. A University of Westminster researcher of internet memes compared it to the movement to name a scientific research vessel in the United Kingdom as Boaty McBoatface. The Instagrammer's success is a rare victory for the unpaid viral campaign on social media. "There is a bit of an anti-celebrity revolt here – 'look what we can do with a simple egg'" The researcher suggests that the accomplishment of becoming such a widely heralded unpaid viral post may become increasingly rare, as social networks rely more on paid and business promotion. The post's spread has been characterized as a populist backlash against "consumerism" and is seen by some as a triumph of community over celebrity. However, propelled by their popular success, the creators promised to release 'egg-centric' memorabilia. Hundreds of games based on the Instagram egg have appeared on Apple's App Store. The creators of the Instagram egg also reached a deal to promote Hulu.

    Read more →
  • Data Transformation Services

    Data Transformation Services

    Data Transformation Services (DTS) is a Microsoft database tool with a set of objects and utilities to allow the automation of extract, transform and load operations to or from a database. The objects are DTS packages and their components, and the utilities are called DTS tools. DTS was included with earlier versions of Microsoft SQL Server, and was almost always used with SQL Server databases, although it could be used independently with other databases. DTS allows data to be transformed and loaded from heterogeneous sources using OLE DB, ODBC, or text-only files, into any supported database. DTS can also allow automation of data import or transformation on a scheduled basis, and can perform additional functions such as FTPing files and executing external programs. In addition, DTS provides an alternative method of version control and backup for packages when used in conjunction with a version control system, such as Microsoft Visual SourceSafe. DTS has been superseded by SQL Server Integration Services in later releases of Microsoft SQL Server though there was some backwards compatibility and ability to run DTS packages in the new SSIS for a time. == History == In SQL Server versions 6.5 and earlier, database administrators (DBAs) used SQL Server Transfer Manager and Bulk Copy Program, included with SQL Server, to transfer data. These tools had significant shortcomings, and many DBAs used third-party tools such as Pervasive Data Integrator to transfer data more flexibly and easily. With the release of SQL Server 7 in 1998, "Data Transformation Services" was packaged with it to replace all these tools. The concept, design, and implementation of the Data Transformation Services was led by Stewart P. MacLeod (SQL Server Development Group Program Manager), Vij Rajarajan (SQL Server Lead Developer), and Ted Hart (SQL Server Lead Developer). The goal was to make it easier to import, export, and transform heterogeneous data and simplify the creation of data warehouses from operational data sources. SQL Server 2000 expanded DTS functionality in several ways. It introduced new types of tasks, including the ability to FTP files, move databases or database components, and add messages into Microsoft Message Queue. DTS packages can be saved as a Visual Basic file in SQL Server 2000, and this can be expanded to save into any COM-compliant language. Microsoft also integrated packages into Windows 2000 security and made DTS tools more user-friendly; tasks can accept input and output parameters. DTS comes with all editions of SQL Server 7 and 2000, but was superseded by SQL Server Integration Services in the Microsoft SQL Server 2005 release in 2005. == DTS packages == The DTS package is the fundamental logical component of DTS; every DTS object is a child component of the package. Packages are used whenever one modifies data using DTS. All the metadata about the data transformation is contained within the package. Packages can be saved directly in a SQL Server, or can be saved in the Microsoft Repository or in COM files. SQL Server 2000 also allows a programmer to save packages in a Visual Basic or other language file (when stored to a VB file, the package is actually scripted—that is, a VB script is executed to dynamically create the package objects and its component objects). A package can contain any number of connection objects, but does not have to contain any. These allow the package to read data from any OLE DB-compliant data source, and can be expanded to handle other sorts of data. The functionality of a package is organized into tasks and steps. A DTS Task is a discrete set of functionalities executed as a single step in a DTS package. Each task defines a work item to be performed as part of the data movement and data transformation process or as a job to be executed. Data Transformation Services supplies a number of tasks that are part of the DTS object model and that can be accessed graphically through the DTS Designer or accessed programmatically. These tasks, which can be configured individually, cover a wide variety of data copying, data transformation and notification situations. For example, the following types of tasks represent some actions that you can perform by using DTS: executing a single SQL statement, sending an email, and transferring a file with FTP. A step within a DTS package describes the order in which tasks are run and the precedence constraints that describe what to do in the case damage or of failure. These steps can be executed sequentially or in parallel. Packages can also contain global variables which can be used throughout the package. SQL Server 2000 allows input and output parameters for tasks, greatly expanding the usefulness of global variables. DTS packages can be edited, password protected, scheduled for execution, and retrieved by version. == DTS tools == DTS tools packaged with SQL Server include the DTS wizards, DTS Designer, and DTS Programming Interfaces. === DTS wizards === The DTS wizards can be used to perform simple or common DTS tasks. These include the Import/Export Wizard and the Copy of Database Wizard. They provide the simplest method of copying data between OLE DB data sources. There is a great deal of functionality that is not available by merely using a wizard. However, a package created with a wizard can be saved and later altered with one of the other DTS tools. A Create Publishing Wizard is also available to schedule packages to run at certain times. This only works if SQL Server Agent is running; otherwise the package will be scheduled, but will not be executed. === DTS Designer === The DTS Designer is a graphical tool used to build complex DTS Packages with workflows and event-driven logic. DTS Designer can also be used to edit and customize DTS Packages created with the DTS wizard. Each connection and task in DTS Designer is shown with a specific icon. These icons are joined with precedence constraints, which specify the order and requirements for tasks to be run. One task may run, for instance, only if another task succeeds (or fails). Other tasks may run concurrently. The DTS Designer has been criticized for having unusual quirks and limitations, such as the inability to visually copy and paste multiple tasks at one time. Many of these shortcomings have been overcome in SQL Server Integration Services, DTS's successor. === DTS Query Designer === A graphical tool used to build queries in DTS. === DTS Run Utility === DTS Packages can be run from the command line using the DTSRUN Utility. The utility is invoked using the following syntax: dtsrun /S server_name[\instance_name] { {/[~]U user_name [/[~]P password]} | /E } ] { {/[~]N package_name } | {/[~]G package_guid_string} | {/[~]V package_version_guid_string} } [/[~]M package_password] [/[~]F filename] [/[~]R repository_database_name] [/A global_variable_name:typeid=value] [/L log_file_name] [/W NT_event_log_completion_status] [/Z] [/!X] [/!D] [/!Y] [/!C] ] When passing in parameters which are mapped to Global Variables, you are required to include the typeid. This is rather difficult to find on the Microsoft site. Below are the TypeIds used in passing in these values.

    Read more →
  • Hardware random number generator

    Hardware random number generator

    In computing, a hardware random number generator (HRNG), true random number generator (TRNG), non-deterministic random bit generator (NRBG), or physical random number generator is a device that generates random numbers from a physical process capable of producing entropy, unlike a pseudorandom number generator (PRNG) that utilizes a deterministic algorithm and non-physical nondeterministic random bit generators that do not include hardware dedicated to generation of entropy. Many natural phenomena generate low-level, statistically random "noise" signals, including thermal and shot noise, jitter and metastability of electronic circuits, Brownian motion, and atmospheric noise. Researchers also used the photoelectric effect, involving a beam splitter, other quantum phenomena, and even nuclear decay (due to practical considerations the latter, as well as the atmospheric noise, is not viable except for fairly restricted applications or online distribution services). While "classical" (non-quantum) phenomena are not truly random, an unpredictable physical system is usually acceptable as a source of randomness, so the qualifiers "true" and "physical" are used interchangeably. A hardware random number generator is expected to output near-perfect random numbers ("full entropy"). A physical process usually does not have this property, and a practical TRNG typically includes a few blocks: a noise source that implements the physical process producing the entropy. Usually this process is analog, so a digitizer is used to convert the output of the analog source into a binary representation; a conditioner (randomness extractor) that improves the quality of the random bits; health tests. TRNGs are mostly used in cryptographical algorithms that get completely broken if the random numbers have low entropy, so the testing functionality is usually included. Hardware random number generators generally produce only a limited number of random bits per second. In order to increase the available output data rate, they are often used to generate the "seed" for a faster PRNG. PRNG also helps with the noise source "anonymization" (whitening out the noise source identifying characteristics) and entropy extraction. With a proper PRNG algorithm selected (cryptographically secure pseudorandom number generator, CSPRNG), the combination can satisfy the requirements of Federal Information Processing Standards and Common Criteria standards. == Uses == Hardware random number generators can be used in any application that needs randomness. However, in many scientific applications additional cost and complexity of a TRNG (when compared with pseudo random number generators) provide no meaningful benefits. TRNGs have additional drawbacks for data science and statistical applications: impossibility to re-run a series of numbers unless they are stored, reliance on an analog physical entity can obscure the failure of the source. The TRNGs therefore are primarily used in the applications where their unpredictability and the impossibility to re-run the sequence of numbers are crucial to the success of the implementation: in cryptography and gambling machines. === Cryptography === The major use for hardware random number generators is in the field of data encryption, for example to create random cryptographic keys and nonces needed to encrypt and sign data. In addition to randomness, there are at least two additional requirements imposed by the cryptographic applications: forward secrecy guarantees that the knowledge of the past output and internal state of the device should not enable the attacker to predict future data; backward secrecy protects the "opposite direction": knowledge of the output and internal state in the future should not divulge the preceding data. A typical way to fulfill these requirements is to use a TRNG to seed a cryptographically secure pseudorandom number generator. == History == Physical devices were used to generate random numbers for thousands of years, primarily for gambling. Dice in particular have been known for more than 5000 years (found on locations in modern Iraq and Iran), and flipping a coin (thus producing a random bit) dates at least to the times of ancient Rome. The first documented use of a physical random number generator for scientific purposes was by Francis Galton (1890). He devised a way to sample a probability distribution using a common gambling die. In addition to the top digit, Galton also looked at the face of a die closest to him, thus creating 64 = 24 outcomes (about 4.6 bits of randomness). Kendall and Babington-Smith (1938) used a fast-rotating 10-sector disk that was illuminated by periodic bursts of light. The sampling was done by a human who wrote the number under the light beam onto a pad. The device was utilized to produce a 100,000-digit random number table (at the time such tables were used for statistical experiments, like PRNG nowadays). On 29 April 1947, the RAND Corporation began generating random digits with an "electronic roulette wheel", consisting of a random frequency pulse source of about 100,000 pulses per second gated once per second with a constant frequency pulse and fed into a five-bit binary counter. Douglas Aircraft built the equipment, implementing Cecil Hasting's suggestion (RAND P-113) for a noise source (most likely the well known behavior of the 6D4 miniature gas thyratron tube, when placed in a magnetic field). Twenty of the 32 possible counter values were mapped onto the 10 decimal digits and the other 12 counter values were discarded. The results of a long run from the RAND machine, filtered and tested, were converted into a table, which originally existed only as a deck of punched cards, but was later published in 1955 as a book, 50 rows of 50 digits on each page (A Million Random Digits with 100,000 Normal Deviates). The RAND table was a significant breakthrough in delivering random numbers because such a large and carefully prepared table had never before been available. It has been a useful source for simulations, modeling, and for deriving the arbitrary constants in cryptographic algorithms to demonstrate that the constants had not been selected maliciously ("nothing up my sleeve numbers"). Since the early 1950s, research into TRNGs has been highly active, with thousands of research works published and about 2000 patents granted by 2017. == Physical phenomena with random properties == Multiple different TRNG designs were proposed over time with a large variety of noise sources and digitization techniques ("harvesting"). However, practical considerations (size, power, cost, performance, robustness) dictate the following desirable traits: use of a commonly available inexpensive silicon process; exclusive use of digital design techniques. This allows an easier system-on-chip integration and enables the use of FPGAs; compact and low-power design. This discourages use of analog components (e.g., amplifiers); mathematical justification of the entropy collection mechanisms. Stipčević & Koç in 2014 classified the physical phenomena used to implement TRNG into four groups: electrical noise; free-running oscillators; chaos; quantum effects. === Electrical noise-based RNG === Noise-based RNGs generally follow the same outline: the source of a noise generator is fed into a comparator. If the voltage is above threshold, the comparator output is 1, otherwise 0. The random bit value is latched using a flip-flop. Sources of noise vary and include: Johnson–Nyquist noise ("thermal noise"); Zener noise; avalanche breakdown. The drawbacks of using noise sources for an RNG design are: noise levels are hard to control, they vary with environmental changes and device-to-device; calibration processes needed to ensure a guaranteed amount of entropy are time-consuming; noise levels are typically low, thus the design requires power-hungry amplifiers. The sensitivity of amplifier inputs enables manipulation by an attacker; circuitry located nearby generates a lot of non-random noise thus lowering the entropy; a proof of randomness is near-impossible as multiple interacting physical processes are involved. === Chaos-based RNG === The idea of chaos-based noise stems from the use of a complex system that is hard to characterize by observing its behavior over time. For example, lasers can be put into (undesirable in other applications) chaos mode with chaotically fluctuating power, with power detected using a photodiode and sampled by a comparator. The design can be quite small, as all photonics elements can be integrated on-chip. Stipčević & Koç characterize this technique as "most objectionable", mostly due to the fact that chaotic behavior is usually controlled by a differential equation and no new randomness is introduced, thus there is a possibility of the chaos-based TRNG producing a limited subset of possible output strings. === Free-running oscillators-based RNG === The TRNGs based on a free-running oscilla

    Read more →
  • Plotly

    Plotly

    Plotly is a technical computing company headquartered in Montreal, Quebec, that develops online data analytics and visualization tools. Plotly provides online graphing, analytics, and statistics tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, JavaScript and REST. == History == Plotly was founded by Alex Johnson, Jack Parmer, Chris Parmer, and Matthew Sundquist. The founders' backgrounds are in science, energy, and data analysis and visualization. Early employees include Christophe Viau, a Canadian software engineer and Ben Postlethwaite, a Canadian geophysicist. Plotly was named one of the Top 20 Hottest Innovative Companies in Canada by the Canadian Innovation Exchange. Plotly was featured in "startup row" at PyCon 2013, and sponsored the SciPy 2018 conference. Plotly raised $5.5 million during its Series A funding, led by MHS Capital, Siemens Venture Capital, Rho Ventures, Real Ventures, and Silicon Valley Bank. The Boston Globe and Washington Post newsrooms have produced data journalism using Plotly. In 2020, Plotly was named a Best Place to Work by the Canadian SME National Business Awards, and nominated as Business of the Year. == Products == Plotly offers open-source and enterprise products. Dash is an open-source Python, R, and Julia framework for building web-based analytic applications. Many specialized open-source Dash libraries exist that are tailored for building domain-specific Dash components and applications. Some examples are Dash DAQ, for building data acquisition GUIs to use with scientific instruments, and Dash Bio, which enables users to build custom chart types, sequence analysis tools, and 3D rendering tools for bioinformatics applications. Dash Enterprise is Plotly's paid product for building, testing, deploying, managing and scaling Dash applications organization-wide. Chart Studio Cloud is a free, online tool for creating interactive graphs. It has a point-and-click graphical user interface for importing and analyzing data into a grid and using stats tools. Graphs can be embedded or downloaded. Chart Studio Enterprise is a paid product that allows teams to create, style, and share interactive graphs on a single platform. It offers expanded authentication and file export options, and does not limit sharing and viewing. Data visualization libraries Plotly.js is an open-source JavaScript library for creating graphs and powers Plotly.py for Python, as well as Plotly.R for R, MATLAB, Node.js, Julia, and Arduino and a REST API. Plotly can also be used to style interactive graphs with Jupyter notebook. Figure converters which convert matplotlib, ggplot2, and IGOR Pro graphs into interactive, online graphs. == Data visualization libraries == Plotly provides a collection of supported chart types across several programming languages: == Dash == Dash is a Python framework built on top of React, a JavaScript library. Dash also works for R, and most recently supports Julia. While still described as a Python framework, Python isn't used for the other languages: "... describing Dash as a Python framework misses a key feature of its design: the Python side (the back end/server) of Dash was built to be lightweight and stateless [allowing] multiple back-end languages to coexist on an equal footing". It is possible to integrate D3.js charts as Dash components. Dash provides the default CSS (plus HTML and JavaScript), but for custom styling Dash applications, CSS can be added, or Dash Enterprise used. === Dash Enterprise === Dash Enterprise is Plotly's paid product for building, testing, deploying, managing and scaling Dash applications organization-wide. The product integrates with enterprise IT systems to enable organizations to build, deploy and scale low-code Dash applications. With open-source Dash, analytic applications can be run from a local machine, but cannot be easily accessed by others in the organization. ==== Enterprise IT integration ==== Dash Enterprise installs on cloud environments and on-premises. Amazon Web Services, Google Cloud Platform, and Microsoft Azure are supported, as are multiple Linux on-premises servers. Authentication integrations include LDAP, AD, PKI, Okta, SAML, OAuth2, SSO, and email authentication, and Dash application access is managed through a GUI rather than code. Dash Enterprise connects to major big data backends, including Salesforce, PostgreSQL, Databricks via PySpark, Snowflake, Dask, Datashader, and Vaex. In 2020, Plotly partnered with NVIDIA to integrate Dash with RAPIDS, and NVIDIA participated in Plotly's Series C funding round. ==== Low-code capabilities ==== Dash Enterprise enables low-code development of Dash applications, which is not possible with open-source Dash. Enterprise users can write applications in multiple development environments, including Jupyter Notebook. Dash Enterprise ships with several “development engines” for drag-and-drop application editing, application design, and automated reporting, as well as dozens of artificial intelligence and machine learning application templates. ==== Deployment and scaling ==== Dash application code is deployed to Dash Enterprise using the git-push command. Dash application deployments are containerized to avoid dependency conflicts, and can be embedded in existing web platforms without iframes. Deployed applications can be managed and accessed in a single portal called App Manager, where administrators can control user authentication and view usage analytics. Dash Enterprise scales horizontally with Kubernetes. Jobs queuing, GPU acceleration, and CPU parallelization support high performance computing requirements. Plotly also offers professional services for application development and workshop training.

    Read more →
  • AS2

    AS2

    AS2 (Applicability Statement 2) is a specification on how to transport structured business-to-business data securely and reliably over the Internet. Security is achieved by using digital certificates and encryption. == Background == AS2 was created in 2002 by the IETF to replace AS1, which they created in the early 1990s. The adoption of AS2 grew rapidly throughout the early 2000s because major players in the retail and fast-moving consumer goods industries championed AS2. Walmart was the first major retailer to require its suppliers to use the AS2 protocol instead of relying on dial-up modems for ordering goods. Amazon, Target, Lowe's, Bed, Bath, & Beyond and thousands of others followed suit. Many other industries use the AS2 protocol, including healthcare, as AS2 meets legal HIPAA requirements. In some cases, AS2 is a way to bypass expensive value-added networks previously used for data interchange. == Technical overview == AS2 is specified in RFC 4130, and is based on HTTP and S/MIME. It was the second AS protocol developed and uses the same signing, encryption and MDN (as defined by RFC3798) conventions used in the original AS1 protocol introduced in the late 1990s by IETF. In other words: Files are encoded as "attachments" in a standardized S/MIME message (an AS2 message). AS2 does not specify the contents of the files. Usually, the file contents are in a standardized format that is separately agreed upon, such as XML or EDIFACT. AS2 messages are always sent using the HTTP or HTTPS protocol (Secure Sockets Layer — also known as SSL — is implied by HTTPS) and usually use the "POST" method (use of "GET" is rare). Messages can be signed, but do not have to be. Messages can be encrypted, but do not have to be. Messages may request a Message Disposition Notification (MDN) back if all went well, but do not have to request such a message. If the original AS2 message requested an MDN: Upon the receipt of the message and its successful decryption or signature validation (as necessary) a "success" MDN will be sent back to the original sender. This MDN is typically signed but never encrypted (unless temporarily encrypted in transit via HTTPS). Upon the receipt and successful verification of the signature on the MDN, the original sender will "know" that the recipient got their message (this provides the "Non-repudiation" element of AS2). If there are any problems receiving or interpreting the original AS2 message, a "failed" MDN may be sent back. However, part of the AS2 protocol states that the client must treat a lack of an MDN as a failure as well, so some AS2 receivers will not return an MDN in this case. Like any other AS file transfer, AS2 file transfers typically require both sides of the exchange to trade X.509 certificates and specific "trading partner" names before any transfers can take place. AS2 trading partner names can usually be any valid phrase. === MDN options === Unlike AS1 or AS3 file transfers, AS2 file transfers offer several "MDN return" options instead of the traditional options of "yes" or "no". Specifically, the choices are: ==== AS2 w/ "Sync" MDNs ==== Return Synchronous MDN via HTTP(S) ("AS2 Sync") - This popular option allows AS2 MDNs to be returned to AS2 message sender clients over the same HTTP connection they used to send the original message. This "MDN while you wait" capability makes "AS2 Sync" transfers the fastest of any type of AS file transfer, but it also keeps this flavor of MDN requests from being used with large files (which may time out in low-bandwidth situations). ==== AS2 w/ "ASync" MDNs ==== Return Asynchronous MDN via HTTP(S) (a.k.a. "AS2 Async") - This popular option allows AS2 MDNs to be returned to the AS2 message sender's server later over a different HTTP connection. This flavor of MDN request is usually used if large files are involved or if your trading partner's AS2 server has poor Internet service. ==== AS2 w/ "Email" MDNs ==== Return (Asynchronous) MDN via Email - This rarely used option allows AS2 MDNs to be returned to AS2 message senders via email rather than HTTP. Otherwise, it is similar to "AS2 Async (HTTP)". ==== AS2 w/ No MDNs ==== Do not return MDN - This option works like it does in any other AS protocol: the receiver of an AS2 message with this option set simply does not try to return an MDN to the AS2 message sender. ==== Filename preservation ==== AS2 filename preservation feature will be used to communicate the filename to the trading partner. The banking industry relies on filenames being communicated between trading partners. AS2 vendors are currently certifying that implementation of filename communication conforms to the standard and is interoperable. There are two profiles for filename preservation being optionally tested under AS2 testing: Filename preservation without MDN responses Filename preservation with an associated MDN response certification Walmart recommends contacting Drummond Group, LLC for more information on EDIINT AS2, or for a list of interoperable-testing AS2 software providers. == Benefits == For many businesses, the use of AS2 and electronic data interchange (EDI) is not a choice so much as it is a requirement of doing business with a large customer or partner. That said, AS2 is a universal protocol that has benefits, from both business and technology vantage points. === Business case === Cut costs by using the web for EDI file transfers, AS2 reduces the cost of transactions from expensive VANs. Extend EDI to more partners; with lower costs and universal web connectivity, AS2 allows organizations to implement EDI with partners worldwide that have little EDI infrastructure. Save time by eliminating the need to manually process orders. Eliminate errors by turning manual processes into automated processes. Universal solution — AS2 is established and tested, so no one has to re-invent the wheel. === Technological advantages === Leverage the web: if an organization can share data securely via the web, they already have much of the infrastructure for AS2. Unlimited EDI data — there are no practical limitations on transaction sizes via the web, and AS2 includes features for managing large transfers. Payload Agnostic — AS2 can be used to transport any type of document. While EDI X12, EDIFACT and XML are common, any mutually agreed-upon format may be transferred.

    Read more →
  • Smart-ID

    Smart-ID

    Smart-ID is an electronic authentication tool developed by SK ID Solutions, an Estonian company. Users can log in to various electronic services and sign documents with an electronic signature. Smart-ID meets the European Union's eIDAS Regulation and the European Central Bank's standards for a secure authentication solution. Smart-ID is a Qualified Signature Creator Device (QSCD) that can issue a Qualified Electronic Signature (QES). The Smart-ID app is compatible with both iOS and Android devices and does not require a SIM card. By 2021, the Smart-ID application was launched in the Huawei AppGallery. As of May 2023, Smart-ID has 3,298,969 active users across the Baltic States (Latvia, Lithuania, and Estonia). Every month, the Smart-ID processes 79 million transactions. In March 2023, Smart-ID users made an exceptional 85 million transactions. == History == In November 2016, SK ID Solutions debuted the Smart-ID tool for the first time at its annual conference. In February 2017, eKool, Starman, and Tallinn Kaubamaja Grupp were the first to implement Smart-ID authentication in their e-services. In March 2017, Smart-ID was added as an authentication option to SEB bank and Swedbank's online banking in all three Baltic States. Dokobit, previously known as DigiDoc, began offering its clients the ability to use e-services using Smart-ID in April 2017. More than 100 service providers had implemented Smart-ID as an authentication solution for their services by November 2019. At its annual conference on November 8, 2018, SK ID Solutions revealed that Smart-ID had been certified as compatible with the QSCD[8] level, the highest level of qualified electronic signature in the European Union, following a rigorous certification process. As a result, the Smart-QES-level ID's electronic signature, the digital counterpart of a handwritten signature, is now available to all users who have registered with the tool. This signature is accepted by all European Union member states. On August 26, 2019, Estonian Information Systems Supervisory Authority experts reviewed Smart-ID (ISSA). Based on the methods provided in the eIDAS Regulation, the expert committee concluded that Smart-ID offers a high level of electronic identification assurance. SK ID Solutions and RIA struck an agreement in September 2019 that allows Smart-ID to authenticate Estonian state e-services via RIA's central authentication service, which is used by over 60 public authorities. Smart-ID accounts created three years ago have expired in January 2020. Therefore, renewing them and performing mandatory updates was necessary. In February 2020, SK ID Solutions announced that Smart-ID could be used to give digital signatures in the national digital signature software DigiDoc4, which up until this moment was only possible with ID cards via Mobile-ID. Users must have at least version 4.2.4.71 or later of the DigiDoc4 software installed on their computers to use this feature. Since February 2020, Smart-ID accounts can now be created with biometric information from an ID card or passport, but only by users who have previously used a Smart-ID account. Since October 2022, 13–17 years old minors in Lithuania are able to create a Smart-ID account using biometric information too. A parent or legal guardian must approve the registration. SK ID Solutions collaborated on the new solution with iProov from the United Kingdom and InnoValor from the Netherlands. TÜV Informationstechnik GmbH, a German certification company, assessed it. Since May 2023, Smart-ID can be used to submit company's annual reports in Estonia and digitally sign anything in the e-business register using your PIN2. == Overview == The Smart-ID app is available for download on Google Play and Apple's App Store. Android 4.4 and iOS 11 are the oldest supported operating system versions for Smart-ID. Smart-ID works on the premise of two-factor authentication, combining an intelligent device (something the user owns) with PINs (something the user knows). A new user must first authenticate themselves with an ID card or a mobile phone number and then confirm a PIN1 and PIN2 code, either manually or automatically produced. The first PIN is used to authenticate a person's identity when accessing e-banking or e-services, while the second PIN is used to support electronic signatures and authenticate transactions (e.g., transfers). The PIN1 code must be four digits long, while the PIN2 code must be five digits long. To log in to an e-service, the user must use Smart-ID as the authentication method and enter their unique Smart-ID user ID. A notification will open on the user's smart device where the software is installed and display a verification code. If the code matches the code presented to the user by the e-service, then the user can confirm the match by entering their PIN1 code. The user must verify the action with their PIN2 code when giving digital signatures. A Smart-ID account is valid for three years. The report can be updated, changed, and deleted at any given time, free of charge. Smart-ID is available in five languages: Estonian, Latvian, Lithuanian, Russian, and English. An international survey conducted in 2021 revealed that Smart-ID is the most reliable authentication solution in Baltic countries. In January 2023, the number of times Smart-ID was used to access State Authentication Service (TARA) in Estonia has surpassed those of Mobile-ID and ID-cards for the first time since July 2022. == Security == Smart-ID is based on Cybernetica's SplitKey authentication and digital signature platform technology, for which the company has filed a patent application. Public key cryptography, digital signature methods, and critical public infrastructures are all used in the technology. The user's PIN is not saved on the device and is only needed to decrypt the private key in the Smart-ID app. When the user inputs the PIN, the private key is cracked, and the answer is transmitted to the Smart-ID server, where a portion of the key given by the app is joined with the server's encrypted key. The app will block the user from accessing it for three hours if they input the incorrect PIN three times in a row. If this happens once again, the app will lock for 24 hours. If this happens a third time, the account will be permanently disabled. PINs cannot be changed or recovered once an account has been created. The user must create a new account if the account is permanently blocked. Smart-ID uses the Apple and Google messaging networks to notify the app when new data is saved on its servers. == Phishing == In February 2019, unknown criminals attempted to create Smart-ID accounts with stolen IDs obtained via phishing customers' text messages and website addresses, according to a monthly report by the Estonian Information System Manager in April 2019. The Latvian Information Technology Security Incident Assessment Body Cert was also notified of these intrusions on March 1. Fraudsters sent emails to potential victims pretending to be bank representatives. The mails linked users to a phishing page after redirecting them to a phony bank login page. Victims were asked to log in using their identification information and PIN1 code. The fraudsters then began the process of generating a new Smart-ID account. As a result, the victim had to input a PIN2 number, which permitted the fraudster to finish setting up a new tab with the victim's personal information. Fraudsters in Estonia were able to log in to multiple e-services utilizing Smart-ID using a Smart-ID account and the victim's data. On behalf of the victims, fraudsters also employed online banking services. Later, the Estonian Information System Manager identified several victims, some of whom had also experienced financial losses. The Estonian Information System Manager requested a full report on the event from SK ID Solutions. The organization opted not to criticize the corporation after receiving the information, although it did propose that the procedure of creating Smart-ID accounts be reviewed. According to the Estonian Banking Association, Estonian banks have not discontinued using Smart-ID and do not think it is required. Smart-ID was exposed to a thorough review process in September 2019 to determine this authentication instrument's level of security. Reviewers discovered no flaws, and SK ID Solutions and the Estonian Information System Manager signed a contract. Estonia later introduced Smart-ID and other authentication mechanisms to the central public services portal.

    Read more →