Solomonoff's theory of inductive inference proves that, under its common sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration. In addition to the choice of data, other assumptions are that, to avoid the post-hoc fallacy, the programming language must be chosen prior to the data and that the environment being observed is generated by an unknown algorithm. This is also called a theory of induction. Due to its basis in the dynamical (state-space model) character of Algorithmic Information Theory, it encompasses statistical as well as dynamical information criteria for model selection. It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory. Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources). It is only "incomputable" in the benign sense that no scientific consensus is able to prove that the best current scientific theory is the best of all possible theories. However, Solomonoff's theory does provide an objective criterion for deciding among the current scientific theories explaining a given set of observations. Solomonoff's induction naturally formalizes Occam's razor by assigning larger prior credences to theories that require a shorter algorithmic description. == Origin == === Philosophical === The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action. === Principle === Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism. To understand, recall that Bayesianism derives the posterior probability P [ T | D ] {\displaystyle \mathbb {P} [T|D]} of a theory T {\displaystyle T} given data D {\displaystyle D} by applying Bayes rule, which yields P [ T | D ] = P [ D | T ] P [ T ] P [ D | T ] P [ T ] + ∑ A ≠ T P [ D | A ] P [ A ] {\displaystyle \mathbb {P} [T|D]={\frac {\mathbb {P} [D|T]\mathbb {P} [T]}{\mathbb {P} [D|T]\mathbb {P} [T]+\sum _{A\neq T}\mathbb {P} [D|A]\mathbb {P} [A]}}} where theories A {\displaystyle A} are alternatives to theory T {\displaystyle T} . For this equation to make sense, the quantities P [ D | T ] {\displaystyle \mathbb {P} [D|T]} and P [ D | A ] {\displaystyle \mathbb {P} [D|A]} must be well-defined for all theories T {\displaystyle T} and A {\displaystyle A} . In other words, any theory must define a probability distribution over observable data D {\displaystyle D} . Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable. Interestingly, the set of computable probability distributions is a subset of the set of all programs, which is countable. Similarly, the sets of observable data considered by Solomonoff were finite. Without loss of generality, we can thus consider that any observable data is a finite bit string. As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions. Solomonoff's induction then allows to make probabilistic predictions of future data F {\displaystyle F} , by simply obeying the laws of probability. Namely, we have P [ F | D ] = E T [ P [ F | T , D ] ] = ∑ T P [ F | T , D ] P [ T | D ] {\displaystyle \mathbb {P} [F|D]=\mathbb {E} _{T}[\mathbb {P} [F|T,D]]=\sum _{T}\mathbb {P} [F|T,D]\mathbb {P} [T|D]} . This quantity can be interpreted as the average predictions P [ F | T , D ] {\displaystyle \mathbb {P} [F|T,D]} of all theories T {\displaystyle T} given past data D {\displaystyle D} , weighted by their posterior credences P [ T | D ] {\displaystyle \mathbb {P} [T|D]} . === Mathematical === The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability) thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ {\displaystyle \epsilon } > 0, there is some length l such that the probability of all programs longer than l is at most ϵ {\displaystyle \epsilon } . This does not, however, preclude very long programs from having very high probability. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion. == Mathematical guarantees == === Solomonoff's completeness === The remarkable property of Solomonoff's induction is its completeness. In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process. The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process. === Solomonoff's uncomputability === Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction. This fact can be regarded as an instance of the no free lunch theorem. == Modern applications == === Artificial intelligence === Though Solomonoff's inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to the predictions of inductive inference (their mathematical limit is Solomonoff's inductive inference). Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function may be required consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.
Viola–Jones object detection framework
The Viola–Jones object detection framework is a machine learning object detection framework proposed in 2001 by Paul Viola and Michael Jones. It was motivated primarily by the problem of face detection, although it can be adapted to the detection of other object classes. In short, it consists of a sequence of classifiers. Each classifier is a single perceptron with several binary masks (Haar features). To detect faces in an image, a sliding window is computed over the image. For each image, the classifiers are applied. If at any point, a classifier outputs "no face detected", then the window is considered to contain no face. Otherwise, if all classifiers output "face detected", then the window is considered to contain a face. The algorithm is efficient for its time, able to detect faces in 384 by 288 pixel images at 15 frames per second on a conventional 700 MHz Intel Pentium III. It is also robust, achieving high precision and recall. While it has lower accuracy than more modern methods such as convolutional neural network, its efficiency and compact size (only around 50k parameters, compared to millions of parameters for typical CNN like DeepFace) means it is still used in cases with limited computational power. For example, in the original paper, they reported that this face detector could run on the Compaq iPAQ at 2 fps (this device has a low power StrongARM without floating point hardware). == Problem description == Face detection is a binary classification problem combined with a localization problem: given a picture, decide whether it contains faces, and construct bounding boxes for the faces. To make the task more manageable, the Viola–Jones algorithm only detects full view (no occlusion), frontal (no head-turning), upright (no rotation), well-lit, full-sized (occupying most of the frame) faces in fixed-resolution images. The restrictions are not as severe as they appear, as one can normalize the picture to bring it closer to the requirements for Viola-Jones. any image can be scaled to a fixed resolution for a general picture with a face of unknown size and orientation, one can perform blob detection to discover potential faces, then scale and rotate them into the upright, full-sized position. the brightness of the image can be corrected by white balancing. the bounding boxes can be found by sliding a window across the entire picture, and marking down every window that contains a face. This would generally detect the same face multiple times, for which duplication removal methods, such as non-maximal suppression, can be used. The "frontal" requirement is non-negotiable, as there is no simple transformation on the image that can turn a face from a side view to a frontal view. However, one can train multiple Viola-Jones classifiers, one for each angle: one for frontal view, one for 3/4 view, one for profile view, a few more for the angles in-between them. Then one can at run time execute all these classifiers in parallel to detect faces at different view angles. The "full-view" requirement is also non-negotiable, and cannot be simply dealt with by training more Viola-Jones classifiers, since there are too many possible ways to occlude a face. == Components of the framework == A full presentation of the algorithm is in. Consider an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution ( M , N ) {\displaystyle (M,N)} . Our task is to make a binary decision: whether it is a photo of a standardized face (frontal, well-lit, etc) or not. Viola–Jones is essentially a boosted feature learning algorithm, trained by running a modified AdaBoost algorithm on Haar feature classifiers to find a sequence of classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . Haar feature classifiers are crude, but allows very fast computation, and the modified AdaBoost constructs a strong classifier out of many weak ones. At run time, a given image I {\displaystyle I} is tested on f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". For this reason, the Viola-Jones classifier is also called "Haar cascade classifier". === Haar feature classifiers === Consider a perceptron f w , b {\displaystyle f_{w,b}} defined by two variables w ( x , y ) , b {\displaystyle w(x,y),b} . It takes in an image I ( x , y ) {\displaystyle I(x,y)} of fixed resolution, and returns f w , b ( I ) = { 1 , if ∑ x , y w ( x , y ) I ( x , y ) + b > 0 0 , else {\displaystyle f_{w,b}(I)={\begin{cases}1,\quad {\text{if }}\sum _{x,y}w(x,y)I(x,y)+b>0\\0,\quad {\text{else}}\end{cases}}} A Haar feature classifier is a perceptron f w , b {\displaystyle f_{w,b}} with a very special kind of w {\displaystyle w} that makes it extremely cheap to calculate. Namely, if we write out the matrix w ( x , y ) {\displaystyle w(x,y)} , we find that it takes only three possible values { + 1 , − 1 , 0 } {\displaystyle \{+1,-1,0\}} , and if we color the matrix with white on + 1 {\displaystyle +1} , black on − 1 {\displaystyle -1} , and transparent on 0 {\displaystyle 0} , the matrix is in one of the 5 possible patterns shown on the right. Each pattern must also be symmetric to x-reflection and y-reflection (ignoring the color change), so for example, for the horizontal white-black feature, the two rectangles must be of the same width. For the vertical white-black-white feature, the white rectangles must be of the same height, but there is no restriction on the black rectangle's height. ==== Rationale for Haar features ==== The Haar features used in the Viola-Jones algorithm are a subset of the more general Haar basis functions, which have been used previously in the realm of image-based object detection. While crude compared to alternatives such as steerable filters, Haar features are sufficiently complex to match features of typical human faces. For example: The eye region is darker than the upper-cheeks. The nose bridge region is brighter than the eyes. Composition of properties forming matchable facial features: Location and size: eyes, mouth, bridge of nose Value: oriented gradients of pixel intensities Further, the design of Haar features allows for efficient computation of f w , b ( I ) {\displaystyle f_{w,b}(I)} using only constant number of additions and subtractions, regardless of the size of the rectangular features, using the summed-area table. === Learning and using a Viola–Jones classifier === Choose a resolution ( M , N ) {\displaystyle (M,N)} for the images to be classified. In the original paper, they recommended ( M , N ) = ( 24 , 24 ) {\displaystyle (M,N)=(24,24)} . ==== Learning ==== Collect a training set, with some containing faces, and others not containing faces. Perform a certain modified AdaBoost training on the set of all Haar feature classifiers of dimension ( M , N ) {\displaystyle (M,N)} , until a desired level of precision and recall is reached. The modified AdaBoost algorithm would output a sequence of Haar feature classifiers f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} . The details of the modified AdaBoost algorithm is detailed below. ==== Using ==== To use a Viola-Jones classifier with f 1 , f 2 , . . . , f k {\displaystyle f_{1},f_{2},...,f_{k}} on an image I {\displaystyle I} , compute f 1 ( I ) , f 2 ( I ) , . . . f k ( I ) {\displaystyle f_{1}(I),f_{2}(I),...f_{k}(I)} sequentially. If at any point, f i ( I ) = 0 {\displaystyle f_{i}(I)=0} , the algorithm immediately returns "no face detected". If all classifiers return 1, then the algorithm returns "face detected". === Learning algorithm === The speed with which features may be evaluated does not adequately compensate for their number, however. For example, in a standard 24x24 pixel sub-window, there are a total of M = 162336 possible features, and it would be prohibitively expensive to evaluate them all when testing an image. Thus, the object detection framework employs a variant of the learning algorithm AdaBoost to both select the best features and to train classifiers that use them. This algorithm constructs a "strong" classifier as a linear combination of weighted simple “weak” classifiers. h ( x ) = sgn ( ∑ j = 1 M α j h j ( x ) ) {\displaystyle h(\mathbf {x} )=\operatorname {sgn} \left(\sum _{j=1}^{M}\alpha _{j}h_{j}(\mathbf {x} )\right)} Each weak classifier is a threshold function based on the feature f j {\displaystyle f_{j}} . h j ( x ) = { − s j if f j < θ j s j otherwise {\displaystyle h_{j}(\mathbf {x} )={\begin{cases}-s_{j}&{\text{if }}f_{j}<\theta _{j}\\s_{j}&{\text{otherwise}}\end{cases}}} The threshold value θ j {\displaystyle \theta _{j}} and the polarity s j ∈ ± 1 {\displaystyle s_{j}\in \pm 1} are determined in the training, as well as the coefficients α j {\displaystyle \alpha _{j}} . Here a simplified version of the lea
SIPRNet
The Secret Internet Protocol Router Network (SIPRNet) is "a system of interconnected computer networks used by the U.S. Department of Defense and the U.S. Department of State to transmit classified information (up to and including information classified SECRET) by packet switching over the 'completely secure' environment". It also provides services such as hypertext document access and electronic mail. SIPRNet is a component of the Defense Information Systems Network. Other components handle communications with other security needs, such as the NIPRNet, which is used for nonsecure communications, and the Joint Worldwide Intelligence Communications System (JWICS), which is used for Top Secret communications. == Access == According to the U.S. Department of State Web Development Handbook, domain structure and naming conventions are the same as for the open internet, except for the addition of a second-level domain, like, e.g., "sgov" between state and gov: openforum.state.sgov.gov. Files originating from SIPRNet are marked by a header tag "SIPDIS" (SIPrnet DIStribution). A corresponding second-level domain smil.mil exists for DoD users. Access is also available to a "...small pool of trusted allies, including Australia, Canada, the United Kingdom and New Zealand...". This group (including the US) is known as the Five Eyes. SIPRNet was one of the networks accessed by Chelsea Manning, convicted of leaking the video used in WikiLeaks' "Collateral Murder" release as well as the source of the US diplomatic cables published by WikiLeaks in November 2010. == Alternate names == SIPRNet and NIPRNet are referred to colloquially as SIPPERnet and NIPPERnet (or simply sipper and nipper), respectively.
Data marketplace
Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in
Point-to-point encryption
Point-to-point encryption (P2PE) is a standard established by the PCI Security Standards Council. Payment solutions that offer similar encryption but do not meet the P2PE standard are referred to as end-to-end encryption (E2EE) solutions. The objective of P2PE and E2EE is to provide a payment security solution that instantaneously converts confidential payment card (credit and debit card) data and information into indecipherable code at the time the card is swiped, in order to prevent hacking and fraud. It is designed to maximize the security of payment card transactions in an increasingly complex regulatory environment. == The standard == The P2PE Standard defines the requirements that a "solution" must meet in order to be accepted as a PCI-validated P2PE solution. A "solution" is a complete set of hardware, software, gateway, decryption, device handling, etc. Only "solutions" can be validated; individual pieces of hardware such as card readers cannot be validated. It is also a common mistake to refer to P2PE validated solutions as "certified"; there is no such certification. The determination of whether or not a solution meets the P2PE standard is the responsibility of a P2PE Qualified Security Assessor (P2PE-QSA). P2PE-QSA companies are independent third-party companies who employ assessors that have met the PCI Security Standards Council's requirements for education and experience, and have passed the requisite exam. The PCI Security Standards Council does not validate solutions. == How it works == As a payment card is swiped through a card reading device, referred to as a point of interaction (POI) device, at the merchant location or point of sale, the device immediately encrypts the card information. A device that is part of a PCI-validated P2PE solution uses an algorithmic calculation to encrypt the confidential payment card data. From the POI, the encrypted, indecipherable codes are sent to the payment gateway or processor for decryption. The keys for encryption and decryption are never available to the merchant, making card data entirely invisible to the retailer. Once the encrypted codes are within the secure data zone of the payment processor, the codes are decrypted to the original card numbers and then passed to the issuing bank for authorization. The bank either approves or rejects the transaction, depending upon the card holder's payment account status. The merchant is then notified if the payment is accepted or rejected to complete the process along with a token that the merchant can store. This token is a unique number reference to the original transaction that the merchant can use should they ever be needed to perform research or refund the customer without ever knowing the customer's card information (tokenization). There are also Qualified Integrator and Reseller (QIR) Companies, which are businesses authorized to "implement, configure, and/or support validated" PA-DSS Payment Applications, and perform qualified installations. == Solution providers == According to the PCI Security Standards Council:The P2PE solution provider is a third-party entity (for example, a processor, acquirer, or payment gateway) that has overall responsibility for the design and implementation of a specific P2PE solution, and manages P2PE solutions for its merchant customers. The solution provider has overall responsibility for ensuring that all P2PE requirements are met, including any P2PE requirements performed by third-party organizations on behalf of the solution provider (for example, certification authorities and key-injection facilities). == Benefits == === Customer benefits === P2PE significantly reduces the risk of payment card fraud by instantaneously encrypting confidential cardholder data at the moment a payment card is swiped or "dipped" if it is a chip card at the card reading device (payment terminal) or POI. === Merchant benefits === P2PE significantly facilitates merchant responsibilities: With a P2PE validated solution, merchants save significant time and money as PCI requirements may be greatly reduced. Payment Card Industry Data Security Standard (PCI DSS). For organizations who use a P2PE validated solution provider, the PCI Self Assessment Questionnaire is reduced from 12 sections to 4 sections and the controls are reduced from 329 questions to just 35. In the event of fraud, the P2PE Solution Provider, not the merchant, is held accountable for data loss and resulting fines that may be assessed by the card brands (American Express, Visa, MasterCard, Discover, and JCB). The PCI Security Standards Council does not assess penalties on Solution Providers or Merchants. The payment process with P2PE is quicker than other transaction processes, thus creating simpler and faster customer–merchant transactions. == Point-to-point encryption versus end-to-end encryption == === Point-to-point === A point-to-point connection directly links system 1 (the point of payment card acceptance) to system 2 (the point of payment processing). A true P2PE solution is determined with three main factors: The solution uses a hardware-to-hardware encryption and decryption process along with a POI device that has SRED (Secure Reading and Exchange of Data) listed as a function. The solution has been validated to the PCI P2PE Standard which includes specific POI device requirements such as strict controls regarding shipping, receiving, tamper-evident packaging, and installation. A solution includes merchant education in the form of a P2PE Instruction Manual, which guides the merchant on POI device use, storage, return for repairs, and regular PCI reporting. === End-to-end === End-to-end encryption as the name suggests has the advantage over P2PE that card details are not unencrypted between the two endpoints. If the endpoints are a PCI PED validated PIN pad and a POS acquirer, there is no opportunity for the card details to be intercepted. It is obviously important that the endpoints (the PED and gateway) are provided by PCI accredited organisations. == PCI point-to-point encryption requirements == The requirements include: Secure encryption of payment card data at the point of interaction (POI), P2PE validated application(s) at the point of interaction, Secure management of encryption and decryption devices, Management of the decryption environment and all decrypted account data, Use of secure encryption methodologies and cryptographic key operations, including key generation, distribution, loading/injection, administration, and usage.
Amazon Kinesis
Amazon Kinesis is a family of services provided by Amazon Web Services (AWS) for processing and analyzing real-time streaming data at a large scale. Launched in November 2013, it offers developers the ability to build applications that can consume and process data from multiple sources simultaneously. Kinesis supports multiple use cases, including real-time analytics, log and event data collection, and real-time processing of data generated by IoT devices. == History == Amazon Kinesis was launched by Amazon Web Services (AWS) in November 2013 as a managed service for processing and analyzing real-time streaming data at a large scale. The service was introduced to address the growing need for businesses to process and analyze data as it was generated, rather than in batches, allowing for real-time insights and decision-making. Since its launch, the Amazon Kinesis family of services has expanded to include four main components: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams. Each of these components serves a specific purpose in the processing and analysis of real-time streaming data. In August 2015, AWS announced the availability of Kinesis Data Firehose, a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch. A year later in August 2016, AWS launched Kinesis Data Analytics, enabling customers to analyze streaming data in real time using standard SQL queries. AWS introduced Kinesis Video Streams, a fully managed service for securely capturing, processing, and storing video streams for analytics and machine learning applications, was introduced by AWS in November 2017. == Components == Amazon Kinesis is composed of four main services: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams. === Kinesis Data Streams === Kinesis Data Streams is a scalable and durable real-time data streaming service that captures and processes gigabytes of data per second from multiple sources. It enables the storage and processing of data in real time, making it useful for applications that require immediate insights, such as monitoring and alerting. === Kinesis Data Firehose === Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, Amazon Elasticsearch, and AWS-partner data stores. With Data Firehose, users can configure and scale data delivery without manual intervention. === Kinesis Data Analytics === Kinesis Data Analytics enables the analysis of streaming data in real time using standard SQL or Apache Flink. === Kinesis Video Streams === Kinesis Video Streams is a fully managed service for securely capturing, processing, and storing video streams for analytics and machine learning. It supports multiple video codecs and streaming protocols, making it suitable for various use cases, such as security and surveillance, video-enabled IoT devices, and live event broadcasting. == Integration == Amazon Kinesis can be easily integrated with other AWS services, such as AWS Lambda, Amazon S3, Amazon Redshift, and Amazon OpenSearch. This integration enables developers to build end-to-end streaming data processing applications, taking advantage of the extensive AWS ecosystem. == Use cases == Some common use cases for Amazon Kinesis include: Real-time analytics: Analyzing streaming data in real time to provide immediate insights and make data-driven decisions. Log and event data collection: Collecting, processing, and analyzing log and event data generated by applications, infrastructure, and devices. IoT data processing: Processing and analyzing large volumes of data generated by IoT devices in real time. Machine learning: Ingesting and processing video streams for machine learning applications, such as object recognition, facial recognition, and sentiment analysis. == Pricing == Amazon Kinesis follows a pay-as-you-go pricing model, with costs depending on the chosen service, data volume, and processing power required. AWS provides a free tier for Kinesis Data Streams and Kinesis Data Firehose, allowing users to get started with the services at no cost.
Embedded analytics
Embedded analytics enables organisations to integrate analytics capabilities into their own, often software as a service, applications, portals, or websites. This differs from embedded software and web analytics (also commonly known as product analytics). This integration typically provides contextual insights, quickly, easily and conveniently accessible since these insights should be present on the web page right next to the other, operational, parts of the host application. Insights are provided through interactive data visualisations, such as charts, diagrams, filters, gauges, maps and tables often in combination as dashboards embedded within the system. This setup enables easier, in-depth data analysis without the need to switch and log in between multiple applications. Embedded analytics is also known as customer facing analytics. Embedded analytics is the integration of analytic capabilities into a host, typically browser-based, business-to-business, software as a service, application. These analytic capabilities would typically be relevant and contextual to the use-case of the host application. == History == The term "embedded analytics" was first used by Howard Dresner: consultant, author, former Gartner analyst and inventor of the term "business intelligence" said Howard Dresner while he was working for Hyperion Solutions, a company that Oracle bought in 2007. Oracle started then to use the term "embedded analytics" at their press release for Oracle Rapid Planning on 2009 . == Considerations with embedded analytics == When evaluating embedding analytics, consideration would normally be given to integration at various levels, these would likely include: security integration, data integration, application logic integration, business rules integration, and user experience integration. This is in contrast to traditional BI, which expects users to leave their workflow applications to look at data insights in a separate set of tools. This immediacy makes embedded analytics much more intuitive and likely to be valued by users. A December 2016 report from Nucleus Research found that using BI tools, which require toggling between applications, can take up as much as 1–2 hours of an employee's time each week, whereas embedded analytics eliminate the need to toggle between apps.