AI Chat On Google

AI Chat On Google — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apache Pig

    Apache Pig

    Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts programming away from the Java MapReduce idiom into a higher-level notation, much as SQL does for relational database management systems. Pig Latin can be extended with user-defined functions (UDFs), which users can write in Java, Python, JavaScript, Ruby, or Groovy and then call directly from the language.

    == History ==

    Apache Pig was originally developed at Yahoo Research around 2006 to give researchers an ad hoc way of creating and executing MapReduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation.

    === Naming ===

    The name of the Pig programming language was chosen arbitrarily. The story goes that the researchers working on the project initially referred to it simply as 'the language'. Eventually they needed to call it something. Off the top of his head, one researcher suggested Pig, and the name stuck because it is quirky yet memorable and easy to spell. While some have hinted that the name sounds coy or silly, it has given rise to an entertaining nomenclature, such as Pig Latin for the language, Grunt for the shell, and PiggyBank for the CPAN-like shared repository.

    == Example ==

    A "Word Count" program written in Pig Latin generates parallel executable tasks that can be distributed across multiple machines in a Hadoop cluster to count the number of words in a dataset such as all the webpages on the internet.
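The Pig Latin listing itself is not reproduced in this excerpt. As a rough illustration only, the same kind of dataflow (read lines, split them into words, filter, group, count, order) can be sketched in plain Python. This is an analogue written for this overview, not Pig Latin and not the original program; the function name `word_count` and the sample input are hypothetical.

```python
import re
from collections import Counter


def word_count(lines):
    """Count words across an iterable of text lines, most frequent first."""
    # Split each line into tokens (roughly a tokenize-and-flatten step).
    words = (token for line in lines for token in line.split())
    # Keep only tokens made entirely of word characters (a filter step).
    filtered = (w for w in words if re.fullmatch(r"\w+", w))
    # Group identical words and count each group.
    counts = Counter(filtered)
    # Order by descending count, breaking ties alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    for word, count in word_count(sample):
        print(word, count)
```

Unlike this single-process sketch, a Pig script is compiled into tasks that can run in parallel across a Hadoop cluster.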
    == Pig vs SQL ==

    In comparison to SQL, Pig has a nested relational model, uses lazy evaluation, uses extract, transform, load (ETL), is able to store data at any point during a pipeline, declares execution plans, and supports pipeline splits, thus allowing workflows to proceed along DAGs instead of strictly sequential pipelines.

    On the other hand, it has been argued that DBMSs are substantially faster than the MapReduce system once the data is loaded, but that loading the data takes considerably longer in the database systems. It has also been argued that RDBMSs offer out-of-the-box support for column storage, working with compressed data, indexes for efficient random data access, and transaction-level fault tolerance.

    Pig Latin is procedural and fits very naturally in the pipeline paradigm, while SQL is declarative. In SQL, users can specify that data from two tables must be joined, but typically not which join implementation to use (some SQL dialects do let the writer specify the implementation of a JOIN, but "... for many SQL applications the query writer may not have enough knowledge of the data or enough expertise to specify an appropriate join algorithm."). Pig Latin allows users to specify an implementation, or aspects of an implementation, to be used in executing a script in several ways. In effect, Pig Latin programming is similar to specifying a query execution plan, making it easier for programmers to explicitly control the flow of their data processing task.

    SQL is oriented around queries that produce a single result. SQL handles trees naturally, but has no built-in mechanism for splitting a data processing stream and applying different operators to each sub-stream. A Pig Latin script describes a directed acyclic graph (DAG) rather than a pipeline.

    Pig Latin's ability to include user code at any point in the pipeline is useful for pipeline development. If SQL is used, data must first be imported into the database, and then the cleansing and transformation process can begin.

  • Corporate surveillance

    Corporate surveillance

    Corporate surveillance describes the practice of businesses monitoring and extracting information from their users, clients, or staff. This information may consist of online browsing history, email correspondence, phone calls, location data, and other private details. Acts of corporate surveillance frequently look to boost results, detect potential security problems, or adjust advertising strategies. These practices have been criticized for violating ethical standards and invading personal privacy. Critics and privacy activists have called for businesses to incorporate rules and transparency surrounding their monitoring methods to ensure they are not misusing their position of authority or breaching regulatory standards. Monitoring can feel intrusive and give the impression that the business does not promote ethical behavior among its personnel. Staff satisfaction, productivity, and staff turnover may all suffer as a result of the invasion of privacy. == Monitoring methods == Employers may be authorized to gather information through keystroke logging and mouse tracking, which involves recording the keys individuals interact with and cursor position on computers. In cases where employment contracts permit it, they may also monitor webcam activity on company-provided computers. Employers may be able to view the emails sent from business accounts and may be able to see the websites visited when using a corporate internet connection. The screenshot capability is another tool that enables companies to see what remote workers are doing. This feature, which can be found in tracking software, takes screenshots throughout the day at predetermined or arbitrary intervals. Additionally, people who don't work in offices are observed. For instance, it has been claimed that Amazon has incorporated tracking technology to monitor warehouse staff and delivery drivers. 
== Use of collected information == Information collected by corporations can be used for a variety of purposes, including marketing research, targeted advertising, fraud detection and prevention, ensuring policy adherence, preventing lawsuits, and safeguarding records and company assets. == Privacy concerns == Concerns over corporate privacy have grown because of companies' collection and manipulation of personal data. Since these practices came to light, there has been rising concern about both the security and the possible mishandling of the accumulated data. Social media data collection and monitoring has been one of the areas of greatest concern regarding corporate surveillance. Recently, many employers on CareerBuilder have checked potential candidates' social media activity before the hiring process. This approach can be excusable, since it is important to be aware of a future employee's or applicant's online presence and how it might affect the company's reputation in the future. This is crucial because employers are often held legally responsible for their workers' digital actions. Such data can also be used for political gain. The Facebook-Cambridge Analytica data scandal in 2018 revealed that the firm's British branch had surreptitiously sold American psychological data to the Trump campaign. This information was supposed to be private, but protecting user information had reportedly not been a top priority for Facebook at the time. == Laws and regulations == The National Labor Relations Act (NLRA) safeguards workplace democracy by giving workers in the private sector the basic freedom to demand better working conditions and choice of representation without fear of retaliation. The General Data Protection Regulation (GDPR) outlines the broad responsibilities of data controllers and the "processors" that handle personal data on their behalf. 
They must adopt the necessary security measures in accordance with the risk involved in the data processing operations they carry out.[1] The Electronic Communications Privacy Act (ECPA), as amended, protects electronic, oral, and wire communications while they are being created, while they are being sent, and while they are stored on computers. Email, phone calls, and electronically stored data are covered by the Act. == Sale of customer data == Data collected on individuals and groups can be sold to other corporations as business intelligence, so that they can use it for the aforementioned purposes. It can also be used for direct marketing purposes, such as targeted advertisements on Google and Yahoo. These ads are tailored to the individual user of the search engine by analyzing their search history and emails (if they use free webmail services). For example, Google, the world's most popular web search engine, stores identifying information for each web search: an IP address and the search phrase used are kept in a database for up to 2 years. Google also scans the content of emails of users of its Gmail webmail service, in order to create targeted advertising based on what people are talking about in their personal email correspondence. Google is, by far, the largest web advertising agency. Its revenue model is based on receiving payments from advertisers for each page visit resulting from a visitor clicking on a Google AdWords ad, hosted either on a Google service or a third-party website. Millions of sites place Google's advertising banners and links on their websites, in order to share in the profit from visitors who click on the ads. Each page containing Google advertisements adds, reads, and modifies cookies on each visitor's computer. These cookies track the user across all of these sites and gather information about their web surfing habits, keeping track of which sites they visit and what they do when they are on these sites. 
This information, along with the information from their email accounts, and search engine histories, is stored by Google to use for building a profile of the user to deliver better-targeted advertising. == Surveillance of workers == In 1993, David Steingard and Dale Fitzgibbons argued that modern management, far from empowering workers, had features of neo-Taylorism, where teamwork perpetuated surveillance and control. They argued that employees had become their own "thought police" and the team gaze was the equivalent of Bentham's panopticon guard tower. A critical evaluation of the Hawthorne Plant experiments has in turn given rise to the notion of a Hawthorne effect, where workers increase their productivity in response to their awareness of being observed or because they are gratified for being chosen to participate in a project. According to the American Management Association and the ePolicy Institute, who undertook a quantitative survey in 2007 about electronic monitoring and surveillance with approximately 300 US companies, "more than one fourth of employers have fired workers for misusing email and nearly one third have fired employees for misusing the Internet." Furthermore, about 30 percent of the companies had also fired employees for usage of "inappropriate or offensive language" and "viewing, downloading, or uploading inappropriate/offensive content." More than 40 percent of the companies monitor email traffic of their workers, and 66 percent of corporations monitor Internet connections. In addition, most companies use software to block websites such as sites with games, social networking, entertainment, shopping, and sports. The American Management Association and the ePolicy Institute also stress that companies track content that is being written about them, for example by monitoring blogs and social media, and scanning all files that are stored in a filesystem. 
== Government use of corporate surveillance data == The United States government often gains access to corporate databases, either by producing a warrant for it, or by asking. The Department of Homeland Security has openly stated that it uses data collected from consumer credit and direct marketing agencies—such as Google—for augmenting the profiles of individuals that it is monitoring. The US government has gathered information from grocery store discount card programs, which track customers' shopping patterns and store them in databases, in order to look for terrorists by analyzing shoppers' buying patterns. == Corporate surveillance of citizens == According to Dennis Broeders, "Big Brother is joined by big business". He argues that corporations are in any event interested in data on their potential customers and that placing some forms of surveillance in the hands of companies, results in companies owning video surveillance data for stores and public places. The commercial availability of surveillance systems has led to their rapid spread. Therefore it is almost impossible for citizens to maintain their anonymity. When businesses can monitor their customers, such customers run the risk of facing prejudice when applying for housing, loans, jobs, and other economic opportun

  • Blacker (security)

    Blacker (security)

    Blacker (styled BLACKER) is a U.S. Department of Defense computer network security project designed to achieve A1 class ratings (very high assurance) of the Trusted Computer System Evaluation Criteria (TCSEC). The first Blacker program began in the late 1970s, with a follow-on eventually producing fielded devices in the late 1980s. It was the first secure system with trusted end-to-end encryption on the United States' Defense Data Network. The project was implemented by SDC (software), and Burroughs (hardware), and after their merger, by the resultant company Unisys.

  • Commit (data management)

    Commit (data management)

    In computer science and data management, a commit is an operation that marks the end of a transaction and provides Atomicity, Consistency, Isolation, and Durability (ACID) in transactions. Commit records are stored in the commit log for recovery and consistency in case of failure. In terms of transactions, the opposite of committing is discarding the transaction's tentative changes, that is, rolling the transaction back.

    With the rise of distributed computing and the need to ensure data consistency across multiple systems, commit protocols have been evolving since their emergence in the 1970s. The main developments include the Two-Phase Commit (2PC) protocol, first proposed by Jim Gray, which is the foundation of distributed transaction management. Subsequently, the Three-Phase Commit (3PC), Presumed Commit (PC), Presumed Abort (PA), and Optimistic Commit protocols emerged, addressing the problems of blocking and fault recovery. Today, commit protocols play a significant role in newer fields such as e-commerce payment and blockchain technology. By handling transactions, faults, and recovery effectively, commit protocols are crucial to the reliability and consistency of data management.

    == History ==

    The concept of commit originated in the late 1960s and early 1970s, when computer technology was advancing rapidly and data management was becoming an important requirement in business and finance. As enterprises gradually replaced traditional paper records with computers, improving work efficiency, the reliability and consistency of data became a necessity. Transaction management at this stage was relatively simple and limited to processing on a single computer: it merely recorded changes to the data so that the data remained stable after a transaction completed or terminated.
    In the late 1970s, as database systems moved from single-computer operation to distributed collaboration across multiple machines, ensuring data consistency and reliability became a new challenge. In 1978, computer scientist Jim Gray proposed the two-phase commit protocol (2PC), which became an effective solution for distributed transaction management by handling data synchronization between multiple nodes. However, this protocol can block transactions when nodes fail.

    In the early 1980s, researchers found that although the two-phase commit protocol was effective at synchronizing data, it could lead to long waits and even system crashes. To address these limitations, they began to explore new methods, including improving efficiency by reducing the number of messages exchanged during the protocol. IBM's R* distributed database introduced the Presumed Commit and Presumed Abort protocols, which greatly improved the efficiency of distributed transaction processing by reducing communication overhead and became an important advance in transaction commit protocols.

    By the early 1990s, with growing business demands and more complex transactions, enterprises required higher efficiency in distributed transaction processing. To suit different environments, researchers gradually developed variants of commit protocols that offer more flexible transaction management options. For example, the three-phase commit protocol reduces blocking by adding a pre-commit phase and a timeout mechanism.
    In the 21st century, with the spread of the mobile Internet and wireless technology, commit protocols developed further, and researchers began to study how to reduce blocking during transactions in order to cope with the limited bandwidth, battery life, and network instability of mobile environments. The optimistic commit protocol marks the extension of commit technology from traditional databases to mobile data. This protocol allows transactions to temporarily use uncommitted data, improving the user experience when network conditions are poor.

    In recent years, with the rise of blockchain and decentralized technologies, commit protocols and consensus mechanisms have gradually merged. Consensus algorithms provide tamper resistance and protect against malicious attacks on peer nodes in a decentralized environment. As a result, commit is no longer confined to traditional database management but has become a core technology of trusted computing and distributed ledgers. Through verification by the consensus mechanism, each transaction's commit can be tracked globally, which provides an important technical foundation for the circulation of digital assets, the operation of cryptocurrencies, and decentralized applications.

    == Commit Protocol Types ==

    In data management, a transaction is a series of database operations, such as a bank transfer or an order submission. To ensure the accuracy, consistency, and security of the data, a transaction is either completed in full or cancelled in full, leaving no partially completed results. A commit protocol is the method used to coordinate this process.
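As a minimal single-node illustration of the all-or-nothing behaviour described above, the sketch below uses Python's standard `sqlite3` module to run a bank transfer that is either committed in full or rolled back. The table layout, the account names, and the helpers `transfer`, `balances`, and `make_demo_db` are assumptions made for this example only.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; apply both updates or neither."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.commit()      # both updates become permanent together
        return True
    except Exception:
        conn.rollback()    # discard every tentative change
        return False
```

A transfer that would overdraw the source account leaves both balances exactly as they were, because the debit already applied inside the transaction is undone by the rollback.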
    Different protocols suit different scenarios and have their own advantages and disadvantages. There are four major commit protocols.

    === Two-Phase Commit (2PC) ===

    The two-phase commit protocol is the most classic and most widely used approach to distributed transactions. It consists of a preparation phase and a commit phase, and is designed to let the coordinator determine whether all participating nodes agree. In the preparation phase, the coordinator sends a ready-to-commit request to all nodes participating in the transaction. In the commit phase, once all participating nodes are ready, the transaction is committed globally; if no agreement is reached, all nodes roll back the transaction and undo all previous operations. Although the two-phase commit protocol is the simplest to operate and is widely used, its obvious drawback is that transactions can be blocked for a long time when nodes fail, which degrades system performance and makes it difficult to terminate or continue the transaction immediately.

    === Three-Phase Commit (3PC) ===

    The three-phase commit protocol is an improved, non-blocking protocol based on 2PC, divided into three stages: preparation, pre-commit, and commit. First, the coordinator sends a "prepare" request to each node. After confirmation, a "pre-commit" stage follows, at which point each node has completed most of the preparatory work and is waiting for the final confirmation. Finally, in the commit stage, once the coordinator has sent the "commit" request to all nodes, the transaction is completed and committed. Compared with 2PC, it adds a timeout mechanism, avoids the blocking caused by a single point of failure, and improves the reliability of the system. The three-phase commit protocol significantly improves transaction reliability, but adds overhead for message transmission and state maintenance.
    It is best suited to distributed applications that are highly sensitive to transaction outcomes and cannot tolerate long waiting times.

    === Presumed Commit (PC) and Presumed Abort (PA) ===

    Presumed Commit (PC) assumes by default that a transaction will commit successfully; a rollback is signalled only if an anomaly is encountered. This reduces the message overhead and logging costs of normal commits. Presumed Abort (PA) assumes that the default outcome of a transaction is a rollback; the transaction is committed only when all nodes have explicitly agreed. This approach suits transactions that rarely perform updates or have a low probability of committing successfully. IBM's R* distributed database management system was the first to propose and put into practice the PC and PA protocols, handling distributed transaction management very efficiently and becoming a classic case in the field of database transaction management.

    === Optimistic Commit Protocol ===

    With the rise of the Internet, the earlier commit protocols face new challenges, especially in mobile scenarios with unstable networks, where excessively long transaction waiting times can degrade the user experience. The Optimistic Commit Protocol allows a transaction to temporarily access uncommitted data before committing, to avoid wait times. This type of commit is suitable f
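The two-phase commit flow described earlier in this entry can be sketched as an in-memory simulation. This is a toy model under stated assumptions: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and there is no network, logging, or timeout handling, so the blocking behaviour on node failure is not modelled.

```python
class Participant:
    """A toy transaction participant; `can_commit` simulates its vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1 (prepare): the coordinator collects a vote from every node.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): unanimous yes, so commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): at least one no vote, so roll back everywhere.
    for p in participants:
        p.rollback()
    return False
```

A single negative vote is enough to roll back every participant, which is the atomicity guarantee the protocol is meant to provide.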

  • Toad (software)

    Toad (software)

    Toad is a database management toolset from Quest Software for managing relational and non-relational databases using SQL, aimed at database developers, database administrators, and data analysts. The Toad toolset runs against Oracle, SQL Server, IBM DB2 (LUW & z/OS), SAP, and MySQL. A Toad product for data preparation supports many data platforms.

    == History ==

    A practicing Oracle DBA, Jim McDaniel, designed Toad for his own use in the mid-1990s. He called it Tool for Oracle Application Developers, shortened to "TOAD". McDaniel initially distributed the tool as shareware and later online as freeware. Quest Software acquired TOAD in October 1998. Quest Software itself was acquired by Dell in 2012 to form Dell Software. In June 2016, Dell announced the sale of its software division, including the Quest business, to Francisco Partners and Elliott Management Corporation. The sale was finalized on October 31, 2016, and on November 1, 2016, the company re-launched as Quest Software.

    == Features ==

    Connection Manager - Allows users to connect natively to the vendor’s database, whether on-premises or DBaaS.
    Browser - Allows users to browse all the different database/schema objects and their properties for effective management.
    Editor - A way to create and maintain scripts and database code, with debugging and integration with source control.
    Unit Testing (Oracle) - Ensures code is functionally tested before it is released into production.
    Static code review (Oracle) - Ensures code meets the required quality level using a rules-based system.
    SQL Optimization - Provides developers with a way to tune and optimize SQL statements and database code without relying on a DBA. Advanced optimization enables DBAs to tune SQL effectively in production.
    Scalability testing and database workload replay - Ensures that database code and SQL will scale properly before they are released into production.
    == Books ==

    Toad Pocket Reference for Oracle plsql 1st Edition by Jim McDaniel and Patrick McGrath, O'Reilly, 2002 (ISBN 0596003374, ISBN 978-0-596-00337-1)
    Toad Pocket Reference for Oracle 2nd Edition by Jeff Smith, Bert Scalzo, and Patrick McGrath, O'Reilly, 2005 (ISBN 0596009712, ISBN 978-0-596-00971-7)
    TOAD Handbook by Bert Scalzo and Dan Hotka, Sams, 2003 (ISBN 0672324865, ISBN 978-0-672-32486-4)
    TOAD Handbook 2nd Edition by Bert Scalzo and Dan Hotka, Addison-Wesley Professional, 2009 (ISBN 0321649109, ISBN 978-0-321-64910-2)

  • Cryptographic High Value Product

    Cryptographic High Value Product

    Cryptographic High Value Product (CHVP) is a designation used within the information security community to identify assets that have high value, and which may be used to encrypt / decrypt secure communications, but which do not retain or store any classified information. When disconnected from the secure communication network, the CHVP equipment may be handled with a lower level of controls than required for COMSEC equipment.

  • Undeniable signature

    Undeniable signature

    An undeniable signature is a digital signature scheme which allows the signer to be selective about whom they allow to verify signatures. The scheme adds explicit signature repudiation, preventing a signer from later refusing to verify a signature by omission, a situation that would devalue the signature in the eyes of the verifier. It was invented by David Chaum and Hans van Antwerpen in 1989.

    == Overview ==

    In this scheme, a signer possessing a private key can publish a signature of a message. However, the signature reveals nothing to a recipient/verifier of the message and signature without taking part in either of two interactive protocols:

    Confirmation protocol, which confirms that a candidate is a valid signature of the message issued by the signer, identified by the public key.
    Disavowal protocol, which confirms that a candidate is not a valid signature of the message issued by the signer.

    The motivation for the scheme is to allow the signer to choose for whom signatures are verified. However, the possibility that the signer might later claim a signature is invalid, simply by refusing to take part in verification, would devalue signatures to verifiers. The disavowal protocol distinguishes these cases, removing the signer's plausible deniability.

    It is important that the confirmation and disavowal exchanges are not transferable. They achieve this by having the property of zero-knowledge: both parties can create transcripts of both confirmation and disavowal that are indistinguishable, to a third party, from correct exchanges.

    The designated verifier signature scheme improves upon undeniable signatures by allowing, for each signature, the interactive portion of the scheme to be offloaded onto another party, a designated verifier, reducing the burden on the signer.

    == Zero-knowledge protocol ==

    The following protocol was suggested by David Chaum.
    A group, $G$, is chosen in which the discrete logarithm problem is intractable, and all operations in the scheme take place in this group. Commonly, this will be the finite cyclic group of order $p$ contained in $\mathbb{Z}/n\mathbb{Z}$, with $p$ being a large prime number; this group is equipped with the group operation of integer multiplication modulo $n$. An arbitrary primitive element (or generator), $g$, of $G$ is chosen; computed powers of $g$ then combine obeying fixed axioms.

    Alice generates a key pair: she randomly chooses a private key, $x$, and then derives and publishes the public key, $y = g^x$.

    === Message signing ===

    1. Alice signs the message, $m$, by computing and publishing the signature, $z = m^x$.

    === Confirmation (i.e., avowal) protocol ===

    Bob wishes to verify the signature, $z$, of $m$ by Alice under the key, $y$.

    1. Bob picks two random numbers, $a$ and $b$, and uses them to blind the message, sending to Alice: $c = m^a g^b$.
    2. Alice picks a random number, $q$, uses it to blind $c$, and then signs the result using her private key, $x$, sending to Bob: $s_1 = c g^q$ and $s_2 = s_1^x$. Note that $s_1^x = (c g^q)^x = (m^a g^b)^x g^{qx} = (m^x)^a (g^x)^{b+q} = z^a y^{b+q}$.
    3. Bob reveals $a$ and $b$.
    4. Alice verifies that $a$ and $b$ are the correct blind values, then, if so, reveals $q$. Revealing these blinds makes the exchange zero knowledge.
    5. Bob verifies $s_1 = c g^q$, proving $q$ has not been chosen dishonestly, and $s_2 = z^a y^{b+q}$, proving $z$ is a valid signature issued by Alice's key. Note that $z^a y^{b+q} = (m^x)^a (g^x)^{b+q}$.

    Alice can cheat at step 2 by attempting to randomly guess $s_2$.

    === Disavowal protocol ===

    Alice wishes to convince Bob that $z$ is not a valid signature of $m$ under the key, $g^x$; i.e., $z \neq m^x$. Alice and Bob have agreed on an integer, $k$, which sets the computational burden on Alice and the likelihood that she should succeed by chance.

    1. Bob picks random values, $s \in \{0, 1, \ldots, k\}$ and $a$, and sends: $v_1 = m^s g^a$ and $v_2 = z^s y^a$, where exponentiating by $a$ is used to blind the sent values. Note that if $z$ were a valid signature ($z = m^x$), then $v_2 = z^s y^a = (m^x)^s (g^x)^a = v_1^x$.
    2. Alice, using her private key, computes $v_1^x$ and then the quotient, $v_1^x v_2^{-1} = (m^s g^a)^x (z^s g^{xa})^{-1} = m^{sx} z^{-s} = (m^x z^{-1})^s$. Thus, $v_1^x v_2^{-1} = 1$, unless $z \neq m^x$.
    3. Alice then tests $v_1^x v_2^{-1}$ for equality against the values $(m^x z^{-1})^i$ for $i \in \{0, 1, \ldots, k\}$, which are calculated by repeated multiplication of $m^x z^{-1}$ (rather than exponentiating for each $i$). If the test succeeds, Alice conjectures the relevant $i$ to be $s$; otherwise, she conjectures a random value. Where $z = m^x$, $(m^x z^{-1})^i = v_1^x v_2^{-1} = 1$ for all $i$, so $s$ is unrecoverable.
    4. Alice commits to $i$: she picks a random $r$ and sends $\mathrm{hash}(r, i)$ to Bob.
    5. Bob reveals $a$.
    6. Alice confirms that $a$ is the correct blind (i.e., $v_1$ and $v_2$ can be generated using it), then, if so, reveals $r$. Revealing these blinds makes the exchange zero knowledge.
    7. Bob checks $\mathrm{hash}(r, i) = \mathrm{hash}(r, s)$, proving Alice knows $s$, hence $z \neq m^x$.

    If Alice attempts to cheat at step 3 by guessing $s$ at random, the probability of succeeding is $1/(k + 1)$. So, if $k = 1023$ and the protocol is conducted ten times, her chances are 1 to $2^{100}$.
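The signing, confirmation, and disavowal steps above can be traced in a short simulation. This is a sketch under loud assumptions: the group is the order-509 subgroup of squares modulo the small safe prime 1019 (far too small to be secure), messages are assumed to be already encoded as elements of that subgroup, SHA-256 stands in for the unspecified hash, and the function names (`keygen`, `sign`, `confirm`, `disavow`) are hypothetical. Both parties run inside one function, so nothing is actually hidden from either side.

```python
import hashlib
import secrets

# Toy parameters (assumptions for illustration; not secure).
P = 1019   # safe prime: P = 2 * Q + 1
Q = 509    # prime order of the subgroup of squares modulo P
G = 4      # a generator of that subgroup


def keygen():
    x = secrets.randbelow(Q - 1) + 1          # private key in [1, Q - 1]
    return x, pow(G, x, P)                    # (x, y = g^x)


def sign(m, x):
    return pow(m, x, P)                       # z = m^x


def _commit(r, value):
    return hashlib.sha256(r + value.to_bytes(4, "big")).digest()


def confirm(m, z, x, y):
    """Confirmation protocol; True if Bob accepts z as Alice's signature."""
    # Step 1 (Bob): blind the message with random a and b.
    a, b = secrets.randbelow(Q - 1) + 1, secrets.randbelow(Q)
    c = pow(m, a, P) * pow(G, b, P) % P
    # Step 2 (Alice): blind c with q, then apply the private key.
    q = secrets.randbelow(Q)
    s1 = c * pow(G, q, P) % P
    s2 = pow(s1, x, P)
    # Steps 3-4: Bob reveals a, b; Alice checks them before revealing q.
    if c != pow(m, a, P) * pow(G, b, P) % P:
        return False
    # Step 5 (Bob): check s1 = c g^q and s2 = z^a y^(b+q).
    expected = pow(z, a, P) * pow(y, b + q, P) % P
    return s1 == c * pow(G, q, P) % P and s2 == expected


def disavow(m, z, x, y, k=100):
    """Disavowal protocol; True if Bob accepts that z is not m^x."""
    # Step 1 (Bob): pick s in {0, ..., k} and a blind a; send v1, v2.
    s, a = secrets.randbelow(k + 1), secrets.randbelow(Q)
    v1 = pow(m, s, P) * pow(G, a, P) % P
    v2 = pow(z, s, P) * pow(y, a, P) % P
    # Step 2 (Alice): quotient v1^x * v2^-1 = (m^x * z^-1)^s.
    quotient = pow(v1, x, P) * pow(v2, -1, P) % P
    # Step 3 (Alice): search for i by repeated multiplication.
    base = pow(m, x, P) * pow(z, -1, P) % P
    i, value = None, 1
    for candidate in range(k + 1):
        if value == quotient:
            i = candidate
            break
        value = value * base % P
    if i is None:
        i = secrets.randbelow(k + 1)          # no match: random conjecture
    # Step 4 (Alice): commit to i using a random r.
    r = secrets.token_bytes(16)
    commitment = _commit(r, i)
    # Steps 5-6: Bob reveals a; Alice checks v1, v2 before revealing r.
    if v1 != pow(m, i, P) * pow(G, a, P) % P:
        return False
    if v2 != pow(z, i, P) * pow(y, a, P) % P:
        return False
    # Step 7 (Bob): accept only if the commitment opens to his own s.
    return commitment == _commit(r, s)
```

With `k` smaller than the subgroup order, the powers of $m^x z^{-1}$ are distinct whenever the signature is invalid, so Alice always recovers `s` in that case; for a valid signature the quotient is 1 and she can only guess.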

  • Group key

    Group key

    In cryptography, a group key is a cryptographic key that is shared between a group of users. Typically, group keys are distributed by sending them to individual users, either physically or encrypted individually for each user using that user's pre-distributed private key. A common use of group keys is to allow a group of users to decrypt a broadcast message that is intended for that entire group of users, and no one else. For example, in the Second World War, group keys (known as "iodoforms", a term invented by a classically educated non-chemist, and nothing to do with the chemical of the same name) were sent to groups of agents by the Special Operations Executive. These group keys allowed all the agents in a particular group to receive a single coded message. In present-day applications, group keys are commonly used in conditional access systems, where the key is the common key used to decrypt the broadcast signal, and the group in question is the group of all paying subscribers. In this case, the group key is typically distributed to the subscribers' receivers using a combination of a physically distributed secure cryptoprocessor in the form of a smartcard and encrypted over-the-air messages.

  • Statistical learning theory

    Statistical learning theory

    Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training set is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be $1/R$, such that $I = V/R$. Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture. 
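The Ohm's law regression mentioned above can be sketched as a least-squares fit of current against voltage. This is an illustrative sketch only: the function name, the choice of a line through the origin, and the 50-ohm sample data are assumptions made for the example.

```python
def fit_through_origin(voltages, currents):
    """Least-squares slope c for the model I = c * V (no intercept term)."""
    numerator = sum(v * i for v, i in zip(voltages, currents))
    denominator = sum(v * v for v in voltages)
    return numerator / denominator


# Hypothetical noise-free measurements across a 50-ohm resistor.
volts = [1.0, 2.0, 3.0, 4.0, 5.0]
amps = [v / 50.0 for v in volts]

slope = fit_through_origin(volts, amps)   # estimates 1/R
resistance = 1.0 / slope                  # estimates R
```

With voltage as the input and current as the output, the fitted slope estimates the reciprocal of the resistance, so the resistance is recovered by inverting it.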
After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take $X$ to be the vector space of all possible inputs, and $Y$ to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space $Z = X \times Y$, i.e. there exists some unknown $p(z) = p(\mathbf{x}, y)$. The training set is made up of $n$ samples from this probability distribution, and is notated $S = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\} = \{\mathbf{z}_1, \dots, \mathbf{z}_n\}$. Every $\mathbf{x}_i$ is an input vector from the training data, and $y_i$ is the output that corresponds to it. In this formalism, the inference problem consists of finding a function $f : X \to Y$ such that $f(\mathbf{x}) \sim y$. Let $\mathcal{H}$ be a space of functions $f : X \to Y$ called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let $V(f(\mathbf{x}), y)$ be the loss function, a metric for the difference between the predicted value $f(\mathbf{x})$ and the actual value $y$. 
The expected risk is defined to be $I[f] = \int_{X \times Y} V(f(\mathbf{x}), y)\, p(\mathbf{x}, y)\, d\mathbf{x}\, dy$. The target function, the best possible function $f$ that can be chosen, is given by the $f$ that satisfies $f = \operatorname{argmin}_{h \in \mathcal{H}} I[h]$. Because the probability distribution $p(\mathbf{x}, y)$ is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk: $I_S[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(\mathbf{x}_i), y_i)$. A learning algorithm that chooses the function $f_S$ that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function $f_S$ that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: $V(f(\mathbf{x}), y) = (y - f(\mathbf{x}))^2$. The absolute value loss (also known as the L1-norm) is also sometimes used: $V(f(\mathbf{x}), y) = |y - f(\mathbf{x})|$. === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification. 
It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with $Y = \{-1, 1\}$, this is: $V(f(\mathbf{x}), y) = \theta(-y f(\mathbf{x}))$, where $\theta$ is the Heaviside step function. == Regularization == In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability for the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space $\mathcal{H}$. A common example would be restricting $\mathcal{H}$ to linear functions: this can be seen as a reduction to the standard problem of linear regression. $\mathcal{H}$ could also be restricted to polynomials of degree $p$, exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions is limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization. 
This consists of minimizing $\frac{1}{n} \sum_{i=1}^{n} V(f(\mathbf{x}_i), y_i) + \gamma \|f\|_{\mathcal{H}}^{2}$, where $\gamma$ is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier $f : \mathcal{X} \to \{0, 1\}$. We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk, which gives a sub-Gaussian tail: $\mathbb{P}(|\hat{R}(f) - R(f)| \geq \epsilon) \leq 2e^{-2n\epsilon^{2}}$. But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class: $\mathbb{P}\big(\sup_{f \in \mathcal{F}} |\hat{R}(f) - R(f)| \geq \epsilon\big) \leq 2 S(\mathcal{F}, n) e^{-n\epsilon^{2}/8} \approx n^{d} e^{-n\epsilon^{2}/8}$, where $S(\mathcal{F}, n)$ is the shattering number and $n$ is the number of samples in the dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla
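The single-classifier Hoeffding bound stated above can be checked numerically. The sketch below is an illustration under assumptions: it models a fixed classifier as making independent errors with a chosen true risk, and the function names, seed, and sample sizes are hypothetical.

```python
import math
import random


def hoeffding_bound(n: int, eps: float) -> float:
    """Right-hand side of the bound: 2 * exp(-2 * n * eps^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)


def deviation_frequency(true_risk: float, n: int, eps: float,
                        trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|empirical risk - true risk| >= eps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each of the n samples is misclassified with probability true_risk.
        errors = sum(rng.random() < true_risk for _ in range(n))
        # Compare counts rather than ratios; the small tolerance guards
        # against floating-point rounding at the boundary.
        if abs(errors - n * true_risk) >= n * eps - 1e-9:
            hits += 1
    return hits / trials


observed = deviation_frequency(true_risk=0.3, n=100, eps=0.1, trials=2000)
bound = hoeffding_bound(n=100, eps=0.1)
```

The observed frequency should sit below the bound, usually by a wide margin, since Hoeffding's inequality holds for any distribution of bounded losses and is therefore not tight for a particular one.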

  • Pamphlet war

    Pamphlet war

    A pamphlet war is a protracted argument or discussion through printed media, especially between the time the printing press became common, and when state intervention like copyright laws made such public discourse more difficult. The purpose was to defend or attack a certain perspective or idea. Pamphlet wars have occurred multiple times throughout history, as both social and political platforms. Pamphlet wars became viable platforms for this protracted discussion with the advent and spread of the printing press. Cheap printing presses and increased literacy made the late 17th century a key stepping stone for the development of pamphlet wars, a period of prolific use of this type of debate. Over 2200 pamphlets were published between 1600 and 1715 alone. Pamphlet wars are generally credited for powering many key social changes of the era, including the Reformation and the Revolution Controversy, the English philosophical debate set off by the French Revolution. == History of the pamphlet in England == Throughout Europe in the 16th century, printed tracts were used to argue religious doctrine and foment support for religious causes. In England, Henry VIII used print literature to justify his break from the Catholic Church. During the subsequent reigns of Edward and Mary, print polemics escalated into propaganda warfare, as print media gained enormous potential to sway common opinion. By the 1560s, print was widely used to convey news. In 1562, the first pamphlets appeared, which discussed the English forces sent to aid the Protestant French Huguenots. In 1569, pamphlets reported the revolt of the Northern Earls and the subsequent Rebellion of the same year. In the 1580s, pamphlets began to replace broadsheet ballads as the means to convey information to the general public. 
Over the next century, the pamphlet became the principal means of garnering support for a cause or an idea, and was particularly influential during the English Civil Wars (1642-1651) and the Glorious Revolution of 1688. Through the ensuing decades, the pamphlet lost some popularity due to the emergence of newspapers and journals, but continued to be an important medium of public debate, as illustrated by the Revolution Controversy a full century later in the 1790s. == Pamphlet printing == Coming from a Latin word, "pamphlet" literally means "small book." In the early days of printing, the format of the book or pamphlet depended on the size of the paper used and the number of times it was folded. If a sheet was folded only once, it was called a folio. If it was folded twice, it was known as a quarto. An octavo was a sheet folded three times. A pamphlet was usually 1-12 sheets of paper folded in quarto, or 8-96 pages. It was sold for one or two pennies apiece. The printing of a pamphlet involved many people: the author, the printer, suppliers, print-makers, compositor, correctors, pressmen, binders, and distributors. Once the pamphleteer had written the pamphlet, it was sent to the printing house to be corrected, set into type, and printed. The papers were then given to the printer's warehouse-keeper, who bundled the copies and sent them to the bookseller, who was probably the one financing the printing. He was responsible for binding the pamphlets, usually by sewing them, and then sold them wholesale to individual bookselling vendors. The booksellers then sold them from a stall in the marketplace. == Pamphlet subjects == Pamphlets began as the means of conveyance for religious debates, and therefore religious topics were one of the main subjects they dealt with. The definition of a pamphlet came to mean a short work dealing with social, political, or religious issues. 
Typical topics included the Civil War, Church of England doctrines, Acts of Parliament, the Popish Plot (see below), the Stuart Era, and Cromwell propaganda. In addition, pamphlets were also used for romantic fiction, autobiography, scurrilous personal abuse, and social criticism. They contained much of the propaganda of the 17th century in the midst of the religious and political turmoil. They were also used for debates between the Puritans and the Anglicans. During the Glorious Revolution, pamphlets were political weapons. == Authors == There were many authors of pamphlets. Some of the more popular authors include Daniel Defoe, Thomas Hobbes, Jonathan Swift, John Milton, and Samuel Pepys. Also among them are Thomas Nashe, Joseph Addison, Richard Steele, and Matthew Prior. In 1591–1592, Robert Greene released a series of pamphlets which later inspired many other authors including Thomas Middleton and Thomas Dekker. == Critics == Pamphlets, along with their vast popularity, received criticism. There were many in the time period who believed that pamphlets were full of foolishness. They thought the pamphlets were not good enough literature and that they would turn people from "good" writing. They believed that pamphlets would be the end of the great volumes of literature and that great writing would be forgotten. == News reporting == Pamphlets made a great difference in the way news was reported to the general public. With the publication of pamphlets, it was no longer difficult for people to hear of events taking place far away. The closer the occurrence was to London, the easier and faster people heard of it. For example, the Battle of Edgehill took place on 23 October 1642. The first pamphlet reporting the incident was printed on 25 October, 24 hours after some of the orders reported had been given. While not entirely accurate, and hurriedly made, the pamphlet nonetheless was able to tell the general public what had happened in the battle. 
A more accurate, specific, and readable account was available in a pamphlet printed on 26 October, and the "authorized" version was available only five days after the battle took place. == Marprelate pamphlets == In 1588, a series of pamphlets marked a turning point for the Puritans, dividing them from other Protestants in the country. The authors wrote under the pseudonym of Martin Marprelate and his two sons of the same name. The true identities of the authors were never discovered. The pamphlets aimed to provoke authorities to take action against censorship. The series was among the first to ask questions directly of its readers. == Early pamphlet wars == === Elizabethan pamphlet wars === As a means of forming or swaying public opinion, pamphlets like these had a part in influencing society, even as the content was itself influenced by society. During the 16th century and continuing for a short while in the early 17th century in England there was a rise in the use of pamphlet wars to discuss a myriad of issues spanning from the civil war, to religious freedoms and the roles of women in society. The Queen herself participated in these discussions, making sure that she was widely read and understood by her people in order to gain favour and establish herself as the monarch despite being a woman. Examples of her use of this medium appear in To the Troops at Tilbury written in 1588, On Mary's Execution written in 1586, and many more. Another famous writer of this period to take advantage of the pamphlet was Emilia Lanier, famous for her arguments about the role of women. Countering a common idea promoted by many literary works and by the general attitude towards women, Lanier's work "Eve's Apology in Defence of Women" refuted the belief that Eve is responsible for the fall of man. Taking a very uncommon and unpopular stance, Lanier accomplishes her defence by structuring it as an apology, making it one of the earliest subversive feminist texts. 
Similarly, Francis Bacon wrote his Essays to promote his idea of morality and other complicated social issues. For example, his work "Of Love" examines the various understandings of the concept of love, particularly as it was perceived during the Elizabethan era. === Eikon Series === From 1649 until 1651, some five pamphlets were published in a debate about the execution of King Charles I of England (1600-1649). Prior to his execution, King Charles wrote the first pamphlet in the discussion, Eikon Basilike (from the Greek "eikon" for image and "basileus" for king). The subtitle of this work - Portraiture of His Sacred Majesty in His Solitudes and Sufferings - indicates that Charles sought to portray himself as a martyr to the cause of regal prerogative. In the following months, several response pamphlets were published (collectively known as the "Eikon" series), including Eikon Alethine, Eikon e Pistes, Eikonoklastes, and Eikon Aklastos, alternately attacking or defending the king, his regicide, and his self-portrait in Eikon Basilike. == Popish Plot and Elizabeth Cellier == In the 1680s, after being acquitted of involvement in the "Meal-Tub Plot", Elizabeth Cellier wrote Malice Defeated, which, along with The Matchless Picaro, sparked a pamphlet war surrounding debate of the ascension of a Catholic king to the thro

  • Cryptographic Module Testing Laboratory

    Cryptographic Module Testing Laboratory

    Cryptographic Module Testing Laboratory (CMTL) is an information technology (IT) computer security testing laboratory that is accredited to conduct cryptographic module evaluations for conformance to the FIPS 140-2 U.S. Government standard. The National Institute of Standards and Technology (NIST) National Voluntary Laboratory Accreditation Program (NVLAP) accredits CMTLs to meet Cryptographic Module Validation Program (CMVP) standards and procedures. This has been replaced by FIPS 140-2 and the Cryptographic Module Validation Program (CMVP). == CMTL requirements == These laboratories must meet the following requirements: NIST Handbook 150, NVLAP Procedures and General Requirements; NIST Handbook 150-17, Information Technology Security Testing - Cryptographic Module Testing; and the NVLAP Specific Operations Checklist for Cryptographic Module Testing. == FIPS 140-2 in relation to the Common Criteria == A CMTL can also be a Common Criteria (CC) Testing Laboratory (CCTL). The CC and FIPS 140-2 differ in abstractness and in the focus of evaluation. FIPS 140-2 testing is against a defined cryptographic module and provides a suite of conformance tests to four FIPS 140 security levels. FIPS 140-2 describes the requirements for cryptographic modules and includes such areas as physical security, key management, self tests, roles and services, etc. The standard was initially developed in 1994 - prior to the development of the CC. The CC is an evaluation against a Protection Profile (PP), or security target (ST). Typically, a PP covers a broad range of products. A CC evaluation does not supersede or replace a validation to either FIPS 140-1, FIPS 140-2 or FIPS 140-3. The four security levels in FIPS 140-1 and FIPS 140-2 do not map directly to specific CC EALs or to CC functional requirements. A CC certificate cannot be a substitute for a FIPS 140-1 or FIPS 140-2 certificate. 
If the operational environment is a modifiable operational environment, the operating system requirements of the Common Criteria are applicable at FIPS Security Levels 2 and above. FIPS 140-1 required evaluated operating systems that referenced the Trusted Computer System Evaluation Criteria (TCSEC) classes C2, B1 and B2. However, TCSEC is no longer in use and has been replaced by the Common Criteria. Consequently, FIPS 140-2 now references the Common Criteria. FIPS 140-2 or FIPS 140-3 validation efforts can be in some parts reused in Common Criteria evaluations, specifically in areas related to entropy source and cryptographic algorithms.

  • Myrinet

    Myrinet

    Myrinet, ANSI/VITA 26-1998, is a high-speed local area networking system designed by the company Myricom to be used as an interconnect between multiple machines to form computer clusters. == Description == Myrinet was promoted as having lower protocol overhead than standards such as Ethernet, and therefore better throughput, less interference, and lower latency while using the host CPU. Although it can be used as a traditional networking system, Myrinet is often used directly by programs that "know" about it, thereby bypassing a call into the operating system. Earlier versions of Myrinet used a variety of media and connectors: Generation 2 used copper media with DC-37 (Myrinet-LAN, M2L- controllers and switches) or microribbon (Myrinet-SAN, M2M-) connectors. Generation 3 used copper media with HSSDC (Myrinet-Serial, M3S-) or microribbon (Myrinet-SAN, M3M-) connectors, or fiber with LC-connectors (Myrinet-Fiber, M3F-). The later versions of Myrinet physically consist of two fibre optic cables, upstream and downstream, connected to the host computers with a single connector. Machines are connected via low-overhead routers and switches, as opposed to connecting one machine directly to another. Myrinet includes a number of fault-tolerance features, mostly backed by the switches. These include flow control, error control, and "heartbeat" monitoring on every link. The "fourth-generation" Myrinet, called Myri-10G, supported a 10 Gbit/s data rate and can use 10 Gigabit Ethernet on PHY, the physical layer (cables, connectors, distances, signaling). Myri-10G started shipping at the end of 2005. Myrinet was approved in 1998 by the American National Standards Institute for use on the VMEbus as ANSI/VITA 26-1998. One of the earliest publications on Myrinet is a 1995 IEEE article. === Performance === Myrinet is a lightweight protocol with little overhead that allows it to operate with throughput close to the basic signaling speed of the physical layer. 
For supercomputing, the low latency of Myrinet is even more important than its throughput performance, since, according to Amdahl's law, a high-performance parallel system tends to be bottlenecked by its slowest sequential process, which in all but the most embarrassingly parallel supercomputer workloads is often the latency of message transmission across the network. === Deployment === According to Myricom, 141 (28.2%) of the June 2005 TOP500 supercomputers used Myrinet technology. In the November 2005 TOP500, the number of supercomputers using Myrinet was down to 101 computers, or 20.2%; in November 2006, 79 (15.8%); and by November 2007, 18 (3.6%), a long way behind gigabit Ethernet at 54% and InfiniBand at 24.2%. In the June 2014 TOP500 list, the number of supercomputers using Myrinet interconnect was 1 (0.2%). In November 2013, the assets of Myricom (including the Myrinet technology) were acquired by CSP Inc. In 2016, it was reported that Google had also offered to buy the company.
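The Amdahl's law argument in the Performance section above can be made concrete with a short calculation. This is a sketch under assumptions: the function name and the 95% parallel fraction are hypothetical, and treating network latency as part of the sequential fraction is a simplification for illustration.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N)."""
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


# Hypothetical workload in which 5% of the run time is sequential
# (for example, time spent waiting on message latency).
for n in (1, 16, 256, 4096):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

However many processors are added, the speedup in this example cannot exceed 1 / 0.05 = 20, which is why shrinking the sequential share, including message latency, matters more than raw throughput.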

  • Pixel shift

    Pixel shift

    Pixel shift is a method in digital cameras for producing a super-resolution image. The method works by taking several images, after each such capture moving ("shifting") the sensor to a new position. In digital colour cameras that employ pixel shift, this avoids a major limitation inherent in using a Bayer pattern for obtaining colour, and instead produces an image with increased colour resolution and, assuming a static subject or additional computational steps, an image free of colour moiré. Taking this idea further, sub-pixel shifting may increase the resolution of the final image beyond that suggested by the specified resolution of the image sensor. Additionally, assuming that the various individual captures are taken at the same sensitivity, the final combined image will have less image noise than a single capture. This can be thought of as an averaging effect (for instance, in a pixel shift image composed of four individual frames with a classic Bayer pattern, every pixel in the final colour image is based on two measurements of the green channel). == List of cameras implementing pixel shift == All of the following cameras are fabricated with one imaging sensor, thus any kind of pixel shift requires a movement of the whole sensor. === Canon === Canon R5: contains a 45 Mpixel sensor. The High-Resolution Mode shifts the sensor by one pixel to obtain a sequence of nine images that are merged into a 400 Mpixel image. === Fujifilm === Fujifilm GFX50S II: contains a 51 Mpixel sensor. The Pixel Shift Multi-Shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of 16 images that are subsequently merged into a 200 Mpixel image. Fujifilm GFX100, Fujifilm GFX100 II: contains a 102 Mpixel sensor. A sequence of 16 pixel-shifted images is merged into a 400 Mpixel image. Fujifilm GFX100S, Fujifilm GFX100S II: contains a 102 Mpixel sensor. 
A sequence of 16 pixel-shifted images is merged into a 400 Mpixel image. Fujifilm GFX100IR: contains a 102 Mpixel sensor. A sequence of 16 pixel-shifted images is merged into a 400 Mpixel image. Fujifilm X-H2: contains a 40 Mpixel sensor. A sequence of 20 shifted images is merged into a 160 Mpixel image. Fujifilm X-T5: contains a 40 Mpixel sensor. A sequence of 20 shifted images is merged into a 160 Mpixel image. === Nikon === Nikon Z8: contains a 47.5 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of up to 32 images that can be merged in Nikon's NX studio software. Nikon Zf: contains a 24 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of up to 32 images that can be merged in Nikon's NX studio software. === Olympus === Olympus OM-D E-M1 Mark II: contains a 20.4 Mpixel sensor. The High Res shot mode produces a 50 Mpixel image. Olympus OM-D E-M5 Mark II: contains a 16 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of 8 images that are subsequently merged into a 40 Mpixel image. Olympus OM-D E-M5 Mark III: contains a 20.4 Mpixel sensor. The High Res shot mode shifts the imaging sensor by 0.5-pixel movements to obtain a sequence of 8 images that are subsequently merged into a 50 Mpixel image. Olympus OM-D E-M1X: contains a 20.4 Mpixel sensor. The camera sports two pixel shift modes: (a) the 80Mp Tripod mode produces an 80 Mpixel image, (b) the Handheld High Res shot mode produces a 50 Mpixel image. Olympus PEN-F: contains a 20.4 Mpixel sensor. The High Res Shot mode takes multiple images, continually shifting the position of the sensor in sub-pixel increments. Combining these images results in either a 50 Mpixel JPEG or an 80 Mpixel Raw file. ==== OM System ==== OM System OM-1: contains a 20 Mpixel sensor. The High Res Shot mode takes multiple images, and it can be used handheld or on a tripod. 
Handheld, it will internally produce 50 Mpixel files, and 80 Mpixel files when mounted on a tripod. OM System OM-5: contains a 20 Mpixel sensor. The High Res Shot mode takes multiple images, and it can be used handheld or on a tripod. Handheld, it will internally produce 50 Mpixel files, and 80 Mpixel files when mounted on a tripod. === Panasonic === Panasonic Lumix DC-G9: contains a 20.3 Mpixel sensor. The High Resolution Mode takes a sequence of 8 shots in quick succession between which the sensor is shifted by 0.5 pixel for each image. These are subsequently merged into an 80 Mpixel image. Panasonic Lumix DC-S1: contains a 24.2 Mpixel sensor. The High Resolution Mode takes a sequence of shots in quick succession between which the sensor is shifted by a small amount. These are subsequently merged into a 96 Mpixel image. Panasonic Lumix DC-S1R: contains a 47.3 Mpixel sensor. The High Resolution Mode shifts the imaging sensor by small increments to obtain a sequence of 8 images that are subsequently merged into a 187 Mpixel image. Panasonic Lumix DC-S1H. Panasonic Lumix DC-S5. === Pentax === Pentax K-70: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'all color data in each pixel to deliver super-high-resolution images'. Pentax KP: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'high-resolution images with more accurate colours and much finer details'. Pentax K-3 II: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'super-high-resolution images with far more truthful color reproduction and much finer details'. Pentax K-3 III: contains a 25.7 Mpixel sensor. 
The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'a cancelling out of the Bayer pattern and removal of the need for sharpness-sapping demosaicing'. Pentax K-1: contains a 36.4 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'improved detail and colour resolution'. Pentax K-1 II: contains a 36.4 Mpixel sensor. The camera sports two pixel shift modes: (a) a series of 4 tripod-stabilised images shifted by 1 pixel each are subsequently combined into a 47.3 Mpixel image, (b) a series of images taken in handheld mode are combined into a 47.3 Mpixel image that is, within limits, able to cope even with moving subjects. === Sony === Sony a6600: contains a 24.3 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into an image sporting 'all color data in each pixel to deliver super-high-resolution images'. Sony α7R III: contains a 42.4 Mpixel sensor. The pixel shift mode takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into a 42.4 Mpixel image with improved tonal resolution. Sony α7R IV: contains a 61 Mpixel sensor. The camera has two pixel shift modes: (a) the first takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into a 61 Mpixel image with improved tonal resolution, (b) the other takes a sequence of 16 shots between which the sensor is shifted by 0.5 pixel. These are subsequently merged into a 240 Mpixel image with both enhanced detail and improved tonal resolution. Sony α1: contains a 50 Mpixel sensor. The camera has two pixel shift modes: (a) the first takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. 
These are subsequently merged into a 50 Mpixel image with improved tonal resolution, (b) the other takes a sequence of 16 shots between which the sensor is shifted by 0.5 pixel. These are subsequently merged into a 200 Mpixel image with both enhanced detail and improved tonal resolution. === Hasselblad === Hasselblad H3DII: the model H3DII-39 sports a 39 Mpixel sensor, the model H3DII-50 a 50 Mpixel sensor. Both enable a pixel shift mode which takes a sequence of 4 shots between which the sensor is shifted by 1 pixel. These are subsequently merged into a single image. Hasselblad H4D series: the model H4D-200MS contains a 50 Mpixel sensor. The sensor sports 3 different pixel shift modes which take (a) a sequence of 6 shots taken at slight offsets, (b) a sequence of 4 shots between which the sensor is shifted by 1 pixel, (c) a sequence of 4 shots between which the sensor is shifted by 0.5 pixels. Images obtained by all three modes are subsequently merged into 200 Mpixel images. Hasselblad H5D series: both models H5D-50c MS and H5D-200c MS contain a 50 Mpixel sensor. This sensor sports 2 different pixel shift modes which take (a) a sequence of 6 shots with full and half pixel moveme
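The four-frame, one-pixel shift described at the start of this entry can be sketched in Python. The sketch is idealised and rests on assumptions: an RGGB Bayer layout, a perfectly static scene, exact one-pixel offsets, no noise, and no border effects. The names `BAYER`, `SHIFTS`, `capture`, and `merge` are hypothetical and do not describe any camera's actual processing.

```python
# Assumed RGGB Bayer layout: the filter colour depends only on the
# row and column parity of the photosite.
BAYER = (("R", "G"), ("G", "B"))

# Four sensor offsets (rows, columns), each one pixel from the previous one.
SHIFTS = ((0, 0), (0, 1), (1, 1), (1, 0))


def capture(scene, dy, dx):
    """One raw frame: each scene pixel records only the channel of the
    photosite that sits over it for this sensor offset."""
    frame = []
    for r, row in enumerate(scene):
        out = []
        for c, rgb in enumerate(row):
            channel = BAYER[(r + dy) % 2][(c + dx) % 2]
            out.append((channel, rgb["RGB".index(channel)]))
        frame.append(out)
    return frame


def merge(frames):
    """Combine the frames into full RGB pixels, averaging repeated samples."""
    height, width = len(frames[0]), len(frames[0][0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            samples = {"R": [], "G": [], "B": []}
            for frame in frames:
                channel, value = frame[r][c]
                samples[channel].append(value)
            row.append(tuple(sum(samples[ch]) / len(samples[ch]) for ch in "RGB"))
        result.append(row)
    return result


# Hypothetical 2x2 scene of (R, G, B) values.
scene = [[(10, 20, 30), (40, 50, 60)],
         [(70, 80, 90), (15, 25, 35)]]
frames = [capture(scene, dy, dx) for dy, dx in SHIFTS]
merged = merge(frames)
```

Across the four offsets every scene pixel is sampled once through red, once through blue, and twice through green, so no demosaicing is needed and the two green samples are averaged.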

  • Media intelligence

    Media intelligence uses data mining and data science to analyze public, social and editorial media content. It refers to marketing systems that synthesize billions of online conversations into relevant information. This allows organizations to measure and manage content performance, understand trends, and drive communications and business strategy. Media intelligence can include software as a service using big data terminology. This includes questions about messaging efficiency, share of voice, audience geographical distribution, message amplification, influencer strategy, journalist outreach, creative resonance, and competitor performance in all these areas. Media intelligence differs from business intelligence in that it uses and analyzes data outside company firewalls. Examples of that data are user-generated content on social media sites, blogs, comment fields, and wikis. It may also include other public data sources like press releases, news, blogs, legal filings, reviews and job postings. Media intelligence may also include competitive intelligence, wherein information that is gathered from publicly available sources such as social media, press releases, and news announcements is used to better understand the strategies and tactics being deployed by competing businesses. Media intelligence is enhanced by means of emerging technologies like ambient intelligence, machine learning, semantic tagging, natural language processing, sentiment analysis and machine translation. == Technologies used == Different media intelligence platforms use different technologies for monitoring, curating content, engaging with content, data analysis and measurement of communications and marketing campaign success. 
    These technology providers may obtain content by scraping it directly from websites or by connecting to the APIs that social media or other content platforms provide for third-party developers to build their own applications and services that access data. Technology companies may also get data from a data reseller. Some social media monitoring and analytics companies make calls to data providers each time an end user develops a query. Others archive and index social media posts to provide end users with on-demand access to historical data and to enable methodologies and technologies that leverage network and relational data. Additional monitoring companies use crawlers and spidering technology to find keyword references, known as semantic analysis or natural language processing. A basic implementation involves curating data from social media on a large scale and analyzing the results to make sense of it.
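
As a rough illustration of the keyword-reference step described above, the following sketch counts how many posts mention each brand and turns the counts into a share of voice. The function names `count_mentions` and `share_of_voice`, the brand names, and the sample posts are all hypothetical; real platforms would work on far larger data and would likely use more robust language processing than whole-word matching.

```python
import re


def count_mentions(posts, brands):
    """Count the posts that mention each brand (case-insensitive, whole words)."""
    counts = {brand: 0 for brand in brands}
    for post in posts:
        # Split the post into lowercase alphanumeric tokens.
        words = set(re.findall(r"[a-z0-9]+", post.lower()))
        for brand in brands:
            if brand.lower() in words:
                counts[brand] += 1
    return counts


def share_of_voice(counts):
    """Convert raw mention counts into each brand's fraction of all mentions."""
    total = sum(counts.values())
    if total == 0:
        return {brand: 0.0 for brand in counts}
    return {brand: n / total for brand, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample_posts = [
        "Acme rocks",
        "I prefer Globex over acme",
        "acme sale",
        "nothing here",
    ]
    mentions = count_mentions(sample_posts, ["acme", "globex"])
    print(mentions)
    print(share_of_voice(mentions))
```

Counting posts rather than individual word occurrences keeps one repetitive post from dominating the result; that is a design choice of this sketch, not something the text prescribes.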

  • Trusted Computing

    Trusted Computing (TC) is a technology developed and promoted by the Trusted Computing Group. The term is taken from the field of trusted systems and has a specialized meaning that is distinct from the field of confidential computing. With Trusted Computing, the computer will consistently behave in expected ways, and those behaviors will be enforced by computer hardware and software. Enforcing this behavior is achieved by loading the hardware with a unique encryption key that is inaccessible to the rest of the system and the owner. TC is controversial as the hardware is not only secured for its owner, but also against its owner, leading opponents of the technology like free software activist Richard Stallman to deride it as "treacherous computing", and certain scholarly articles to use scare quotes when referring to the technology. Trusted Computing proponents such as International Data Corporation, the Enterprise Strategy Group and Endpoint Technologies Associates state that the technology will make computers safer, less prone to viruses and malware, and thus more reliable from an end-user perspective. They also state that Trusted Computing will allow computers and servers to offer improved computer security over that which is currently available. Opponents often state that this technology will be used primarily to enforce digital rights management policies (restrictions imposed on the owner) and not to increase computer security. Chip manufacturers Intel and AMD, hardware manufacturers such as HP and Dell, and operating system providers such as Microsoft include Trusted Computing in their products if enabled. The U.S. Army requires that every new PC it purchases comes with a Trusted Platform Module (TPM). As of July 3, 2007, so does virtually the entire United States Department of Defense. 
    == Key concepts == Trusted Computing encompasses six key technology concepts, all of which are required for a fully Trusted system, that is, a system compliant with the TCG specifications: endorsement key; secure input and output; memory curtaining / protected execution; sealed storage; remote attestation; Trusted Third Party (TTP). === Endorsement key === The endorsement key is a 2048-bit RSA public and private key pair that is created randomly on the chip at manufacture time and cannot be changed. The private key never leaves the chip, while the public key is used for attestation and for encryption of sensitive data sent to the chip, as occurs during the TPM_TakeOwnership command. This key is used to allow the execution of secure transactions: every Trusted Platform Module (TPM) is required to be able to sign a random number (in order to allow the owner to show that they have a genuine trusted computer), using a particular protocol created by the Trusted Computing Group (the direct anonymous attestation protocol) in order to ensure its compliance with the TCG standard and to prove its identity; this makes it impossible for a software TPM emulator with an untrusted endorsement key (for example, a self-generated one) to start a secure transaction with a trusted entity. The TPM should be designed to make the extraction of this key by hardware analysis hard, but tamper resistance is not a strong requirement. === Memory curtaining === Memory curtaining extends common memory protection techniques to provide full isolation of sensitive areas of memory, for example locations containing cryptographic keys. Even the operating system does not have full access to curtained memory. The exact implementation details are vendor specific. === Sealed storage === Sealed storage protects private information by binding it to platform configuration information including the software and hardware being used. 
    This means the data can be released only to a particular combination of software and hardware. Sealed storage can be used for DRM enforcement. For example, users who keep a song on their computer that has not been licensed for listening will not be able to play it. Currently, a user can locate the song, listen to it, and send it to someone else, play it in the software of their choice, or back it up (and in some cases, use circumvention software to decrypt it). Alternatively, the user may use software to modify the operating system's DRM routines to have it leak the song data once, say, a temporary license was acquired. Using sealed storage, the song is securely encrypted using a key bound to the trusted platform module so that only the unmodified and untampered music player on his or her computer can play it. In this DRM architecture, this might also prevent people from listening to the song after buying a new computer, or upgrading parts of their current one, except after explicit permission of the vendor of the song. === Remote attestation === Remote attestation allows changes to the user's computer to be detected by authorized parties. For example, software companies can identify unauthorized changes to software, including users modifying their software to circumvent commercial digital rights restrictions. It works by having the hardware generate a certificate stating what software is currently running. The computer can then present this certificate to a remote party to show that unaltered software is currently executing. Numerous remote attestation schemes have been proposed for various computer architectures, including Intel, RISC-V, and ARM. Remote attestation is usually combined with public-key encryption so that the information sent can only be read by the programs that requested the attestation, and not by an eavesdropper. 
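
The two ideas above, a key bound to the platform configuration and a certificate stating what software is running, can be sketched in a few lines. This is a toy model under loud assumptions, not the TCG protocol: the function names (`measure`, `sealing_key`, `quote`, `verify_quote`) are hypothetical, and an HMAC under a shared secret stands in for the RSA signature a real TPM would produce with a key that never leaves the chip.

```python
import hashlib
import hmac


def measure(components):
    """Fold a list of software components (bytes) into one measurement.

    Each step hashes the running digest together with the hash of the next
    component, so the result depends on every component and on their order.
    """
    digest = b"\x00" * 32
    for component in components:
        component_hash = hashlib.sha256(component).digest()
        digest = hashlib.sha256(digest + component_hash).digest()
    return digest


def sealing_key(chip_secret, measurement):
    """Derive a key bound to both the chip secret and the measured software."""
    return hmac.new(chip_secret, b"seal" + measurement, hashlib.sha256).digest()


def quote(chip_secret, measurement, nonce):
    """Produce a toy attestation tag over the measurement and a fresh nonce.

    ASSUMPTION: HMAC is a stand-in for the asymmetric signature of a real TPM.
    """
    return hmac.new(chip_secret, b"quote" + measurement + nonce,
                    hashlib.sha256).digest()


def verify_quote(chip_secret, expected_measurement, nonce, tag):
    """Check that the tag attests to the expected software for this nonce."""
    expected = quote(chip_secret, expected_measurement, nonce)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    secret = b"k" * 32  # hypothetical chip secret
    good = measure([b"bootloader", b"player-v1"])
    patched = measure([b"bootloader", b"player-patched"])
    # A modified player yields a different key, so sealed data stays locked.
    print(sealing_key(secret, good) != sealing_key(secret, patched))
    nonce = b"n" * 16  # a real verifier would use a fresh random nonce
    print(verify_quote(secret, good, nonce, quote(secret, patched, nonce)))
```

In this model, changing any measured component changes both the sealing key and the attestation tag, which is the property the song example relies on. The nonce is included so that an old tag cannot simply be replayed to the verifier.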
    To take the song example again, the user's music player software could send the song to other machines, but only if they could attest that they were running an authorized copy of the music player software. Combined with the other technologies, this provides a more restricted path for the music: encrypted I/O prevents the user from recording it as it is transmitted to the audio subsystem, memory locking prevents it from being dumped to regular disk files as it is being worked on, sealed storage curtails unauthorized access to it when saved to the hard drive, and remote attestation prevents unauthorized software from accessing the song even when it is used on other computers. To preserve the privacy of attestation responders, Direct Anonymous Attestation has been proposed as a solution, which uses a group signature scheme to prevent revealing the identity of individual signers. Proof of space (PoS) has been proposed for malware detection, by determining whether the L1 cache of a processor is empty (e.g., has enough space to evaluate the PoSpace routine without cache misses) or contains a routine that resisted being evicted. === Trusted third party === == Known applications == The Microsoft products Windows Vista, Windows 7, Windows 8 and Windows RT make use of a Trusted Platform Module to facilitate BitLocker Drive Encryption. Other known applications with runtime encryption and the use of secure enclaves include the Signal messenger and the e-prescription service ("E-Rezept") by the German government. == Possible applications == === Digital rights management === Trusted Computing would allow companies to create a digital rights management (DRM) system which would be very hard to circumvent, though not impossible. An example is downloading a music file. Sealed storage could be used to prevent the user from opening the file with an unauthorized player or computer. 
    Remote attestation could be used to authorize play only by music players that enforce the record company's rules. The music would be played from curtained memory, which would prevent the user from making an unrestricted copy of the file while it is playing, and secure I/O would prevent capturing what is being sent to the sound system. Circumventing such a system would require either manipulation of the computer's hardware, capturing the analogue (and thus degraded) signal using a recording device or a microphone, or breaking the security of the system. New business models for use of software (services) over the Internet may be boosted by the technology. By strengthening the DRM system, one could base a business model on renting programs for specific time periods or on "pay as you go" models. For instance, one could download a music file which could only be played a certain number of times before it becomes unusable, or the music file could be used only within a certain time period. === Preventing cheating in online games === Trusted Computing could be used to combat cheating in online games. Some players modify their game copy in order to gain unfair advantages in the game; remote attestation, secure I/O and memory curtaining could be used to determine that all players connected to a server were running an unmodified copy of the software. === Verification of remote computation for grid computing === Trusted Computing could be used to guarantee participants in a grid computing sys
