AI Generator Reader

AI Generator Reader — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Sentence embedding

    Sentence embedding

    In natural language processing, a sentence embedding is a representation of a sentence as a vector of numbers which encodes meaningful semantic information. State of the art embeddings are based on the learned hidden layer representation of dedicated sentence transformer models. BERT pioneered an approach involving the use of a dedicated [CLS] token prepended to the beginning of each sentence inputted into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks. In practice however, BERT's sentence embedding with the [CLS] token achieves poor performance, often worse than simply averaging non-contextual word embeddings. SBERT later achieved superior sentence embedding performance by fine tuning BERT's [CLS] token embeddings through the usage of a siamese neural network architecture on the SNLI dataset. Other approaches are loosely based on the idea of distributional semantics applied to sentences. Skip-Thought trains an encoder-decoder structure for the task of neighboring sentences predictions; this has been shown to achieve worse performance than approaches such as InferSent or SBERT. An alternative direction is to aggregate word embeddings, such as those returned by Word2vec, into sentence embeddings. The most straightforward approach is to simply compute the average of word vectors, known as continuous bag-of-words (CBOW). However, more elaborate solutions based on word vector quantization have also been proposed. One such approach is the vector of locally aggregated word embeddings (VLAWE), which demonstrated performance improvements in downstream text classification tasks. == Applications == In recent years, sentence embedding has seen a growing level of interest due to its applications in natural language queryable knowledge bases through the usage of vector indexing for semantic search. LangChain for instance utilizes sentence transformers for purposes of indexing documents. In particular, an indexing is generated by generating embeddings for chunks of documents and storing (document chunk, embedding) tuples. Then given a query in natural language, the embedding for the query can be generated. A top k similarity search algorithm is then used between the query embedding and the document chunk embeddings to retrieve the most relevant document chunks as context information for question answering tasks. This approach is also known formally as retrieval-augmented generation. Though not as predominant as BERTScore, sentence embeddings are commonly used for sentence similarity evaluation which sees common use for the task of optimizing a Large language model's generation parameters is often performed via comparing candidate sentences against reference sentences. By using the cosine-similarity of the sentence embeddings of candidate and reference sentences as the evaluation function, a grid-search algorithm can be utilized to automate hyperparameter optimization. == Evaluation == A way of testing sentence encodings is to apply them on Sentences Involving Compositional Knowledge (SICK) corpus for both entailment (SICK-E) and relatedness (SICK-R). In the best results are obtained using a BiLSTM network trained on the Stanford Natural Language Inference (SNLI) Corpus. The Pearson correlation coefficient for SICK-R is 0.885 and the result for SICK-E is 86.3. A slight improvement over previous scores is presented in: SICK-R: 0.888 and SICK-E: 87.8 using a concatenation of bidirectional Gated recurrent unit.

    Read more →
  • VK (service)

    VK (service)

    VK (short for its original name VKontakte; Russian: ВКонтакте, lit. 'InContact') is a Russian online social media and social networking service based in Saint Petersburg. VK is available in multiple languages but it is predominantly used by Russian speakers. VK users can message each other publicly or privately, edit messages, create groups, public pages, and events; share and tag images, audio, and video; and play browser-based games. As of August 2018, VK had at least 500 million accounts. As of November 2022, it was the sixth most popular website in Russia. The network was also popular in Ukraine until it was banned by the Verkhovna Rada in 2017. According to Semrush, in 2024, VK was the 30th most visited website in the world; as YouTube is subject to blocking in Russia, VK Video overtook Google's top position in monthly web traffic for the first time in December 2024, as part of the major substitution to domestic business. == History == VKontakte was conceived in 2006 when Pavel Durov, creator of the popular student forum spbgu.ru, met his former classmate Vyacheslav Mirilashvili in St. Petersburg after graduating from the Faculty of Philology at St Petersburg State University. Vyacheslav showed Durov the increasingly popular Facebook, after which the friends decided to create a new Russian social network. Lev Leviev, an Israeli classmate of Vyacheslav Mirilashivili, became the third co-founder. Vyacheslav Mirilashvili borrowed the money from his billionaire father and became the largest shareholder. Lev Leviev took over operational management, and Durov became CEO. Pavel Durov convinced his older brother Nikolai, a multiple winner of international math and programming competitions, to develop the site. Durov launched VKontakte for beta testing in September 2006. The following month, the domain name Vkontakte.ru was registered. The new project was incorporated on 19 January 2007 as a Russian private limited company. In February 2007 the site reached a user base of over 100,000 and was recognized as the second largest company in Russia's nascent social network market. In the same month, the site was subjected to a severe DDoS attack, which briefly put it offline. The user base reached 1 million in July 2007, and 10 million in April 2008. In December 2008 VK overtook rival Odnoklassniki as Russia's most popular social networking service. == Website == Similar to many social networks, the platform's fundamental features revolve around private messaging, sharing photos, posting status updates, and exchanging links with friends. VK also provides tools for administering online communities and managing celebrity pages. The site allows its users to upload, search and stream media content, such as videos and music. VK features an advanced search engine, that allows complex queries for finding friends, as well as a real-time news search. VK updated its features and design in April 2016. === Features === Messaging. VK Private Messages can be exchanged between groups of 2 to 500 people. An email address can also be specified as the recipient. Each message may contain up to 10 attachments: Photos, Videos, Audio Files, Maps (an embedded map with a manually placed marker), and Documents. News. VK users can post on their profile walls, each post may contain up to 10 attachments – media files, maps, and documents (see above). User mentions and hashtags are supported. In the case of multiple photo attachments, the previews are automatically scaled and arranged in a magazine-style layout. The news feed can be switched between all news (default) and most interesting modes. The site features a news-recommendation engine, global real-time search, and individual search for posts and comments on specific users' walls. Communities. VK features three types of communities. Groups are better suited for decentralized communities (discussion boards, wiki-style articles, editable by all members, etc.). Public pages is a news feed-orientated broadcasting tool for celebrities and businesses. The two types are largely interchangeable, the main difference being in the default settings. The third type of community is called Events, which are used for appropriately organizing concerts and events in an appropriate way. Like buttons. VK like buttons for posts, comments, media, and external sites operate differently from Facebook. Liked content doesn't get automatically pushed to the user's wall, but is saved in the private Favorites section instead. The user has to press a second 'share with friends' button to share an item on their wall or send it via private message to a friend. Privacy. Users can control the availability of their content within the network and on the Internet. Blanket and granular privacy settings are available for pages and individual content. Synchronization with other social networks. Any news published on the VK wall will appear on Facebook or Twitter. Certain news may not be published by clicking on the logo next to the "Send" button. Editing a post in VK does not change the post in Facebook or Twitter and vice versa. However, removing the news in VK will remove it from other social networks. SMS service. Russian users can receive and reply to a private message or leave a comment for community news using SMS. Music. Users have access to the audio files uploaded by other users. In addition, users can upload the audio files themselves, create playlists and share audios with others by attaching to messages and wall posts. The uploaded audio files cannot violate copyright laws. === Popularity === As of May 2017, according to Alexa Internet ranking, VK is one of the most visited websites in some Eurasian countries. It is: 4th most visited in Russia; 3rd most visited in Belarus; 6th most visited in Kazakhstan; 8th most visited in Kyrgyzstan and Moldova; 12th most visited in Latvia. It was the fourth most viewed site in Ukraine until, in May 2017, the Ukrainian government banned the use of VK in Ukraine. According to a study for May 2018 conducted by Factum Group Ukraine VK remained the fourth most viewed site in Ukraine, but Facebook was twice as much visited. For 2019, VK appeared as the most visited social network in Ukraine according to Alexa. According to the Internet Association of Ukraine the share of Ukrainian Internet users who visit VK daily had fallen from 54% to 10% from September 2016 to September 2019. They also claimed in November 2019 that Facebook was the most popular social network. VK was expected to gain most of the users lost by Facebook and Instagram after they were blocked in Russia in 2022, according to a Calltouch poll. == Ownership == Initially, founder and CEO Pavel Durov owned 20% of shares (although he had majority voting power through proxy votes), and a trio of Russian-Israeli investors Yitzchak Mirilashvili, his father Mikhael Mirilashvili, and Lev Leviev owned 60%, 10%, and 10% respectively. In 2007, Digital Sky Technologies, an investment company managed by Yuri Milner, acquired a total of 24.99% of the shares from shareholders, investing $16.3 million. In preparation for the IPO in September 2010, DST separated international and Russian assets: the former formed the DST Global fund, while the latter, including VKontakte and rival social network Odnoklassniki, were merged into Mail.ru Group. Mail.ru Group used part of the money to acquire 7.5% of the social network for $112.5 million at a valuation of the entire project of 1.5 billion dollars. After exercising a 7.5% option in July 2011 for $111.7 million, Mail.ru Group accumulated a 39.99% stake in VKontakte. The head of Mail.ru Group, Dmitry Grishin, voiced the company's intention to gain 100% control over VKontakte. MRG was discussing with shareholders to buy out shares from the valuation of the entire company in $2-3 billion. In the summer of 2011, Mirilashvili and Leviev were ready to accept in payment owned by Mail.ru Group shares of Facebook, Groupon, and Zynga, but the deal failed due to Durov's unwillingness to sell a stake on MRG terms. Later, the co-founders considered VKontakte's IPO as an alternative. In March 2012, Durov "accidentally" became plugged into the negotiations where Mirilashvili and Leviev discussed selling their stakes directly to Mail.ru Group's main investor, Alisher Usmanov. On the same day, Durov deleted the pages of the first co-investors, stopped contacting them, and soon announced that VKontakte would postpone its IPO indefinitely. On 29 May 2012, Mail.ru Group announced its decision to yield control of the company to Durov by offering him the voting rights on its shares. Combined with Durov's personal 12% stake, this gave him 52% of the votes. In April 2013, the Mirilashvili family sold its 40% share in VK to United Capital Partners for $1.12 billion, while Lev Leviev sold his 8% share in the same deal, giving United Capital Partners 48% ownership. In January 2014, VK's founder Pavel Durov sold his 12% stake in the company to I

    Read more →
  • Ciphertext expansion

    Ciphertext expansion

    In cryptography, the term ciphertext expansion refers to the length increase of a message when it is encrypted. Many modern cryptosystems cause some degree of expansion during the encryption process, for instance when the resulting ciphertext must include a message-unique Initialization Vector (IV). Probabilistic encryption schemes cause ciphertext expansion, as the set of possible ciphertexts is necessarily greater than the set of input plaintexts. Certain schemes, such as Cocks Identity Based Encryption, or the Goldwasser-Micali cryptosystem result in ciphertexts hundreds or thousands of times longer than the plaintext. Ciphertext expansion may be offset or increased by other processes which compress or expand the message, e.g., data compression or error correction coding. == Reasons why Ciphertext expansion can occur == === Probabilistic Encryption === Probabilistic encryption schemes, such as the Goldwasser-Micali cryptosystem, necessarily produce ciphertexts that are longer than the original plaintexts. This is because the set of possible ciphertexts must be larger than the set of plaintexts to achieve semantic security. === Initialization Vectors (IVs) === Many block cipher modes of operation, like Cipher Block Chaining (CBC), require the use of an Initialization Vector (IV) that is unique for each message. The IV is typically appended to the ciphertext, resulting in expansion. === Redundancy and Error Correction === Some cryptographic schemes intentionally introduce redundancy or error correction codes into the ciphertext to protect against tampering or transmission errors. This added data increases the ciphertext size. === Specific Cryptosystems === Certain cryptographic schemes, such as Cocks Identity-Based Encryption, can produce ciphertexts that are hundreds or thousands of times longer than the original plaintext. This extreme expansion is a design choice to achieve the desired security properties. Ciphertext expansion can be offset or increased by other processes that compress or expand the message, such as data compression or error correction coding. The overall impact on message size depends on the relative strengths of these competing effects.

    Read more →
  • SCinet

    SCinet

    SCinet is the high-performance network built annually by volunteers in support of SC (formerly Supercomputing, the International Conference for High Performance Computing, Networking, Storage and Analysis). SCinet is the primary network for the yearly conference and is used by attendees and exhibitors to demonstrate and test high-performance computing and networking applications. == International Community == SCinet is also a hub for the international networking community. It provides a platform to share the latest research, technologies, and demonstrations for networks, network technology providers, and even software developers who are in charge of supporting HPC communities at their own institutions or organizations. == Volunteers == Nearly 200 volunteers from educational institutions, high performance computing sites, equipment vendors, research and education networks, government agencies and telecommunications carriers collaborate via technology and in-person to design, build and operate SCinet. While many of these credentialed individuals have volunteered at SCinet for years, first timers join the team each year. They include international students and participants in the National Science Foundation-funded Women in IT Networking at SC (WINS) program. The 2017 SCinet team included women and men from high performance computing institutions in the U.S. and throughout the world. == History == Originated in 1991 as an initiative within the SC conference to provide networking to attendees, SCinet has grown to become the "World's Fastest Network" during the duration of the conference. For 29 years, SCinet has provided SC attendees and the high performance computing (HPC) community with the innovative network platform necessary to internationally interconnect, transport, and display HPC research during SC. Historically, SCinet has been used as a platform to test networking technology and applications which have found their way into common use. == Research and development == In the past years, SCinet deployed conference wide networking technologies such as ATM, FDDI, HiPPi before they were deployed commercially.

    Read more →
  • Semantic space

    Semantic space

    Semantic spaces in the natural language domain aim to create representations of natural language that are capable of capturing meaning. The original motivation for semantic spaces stems from two core challenges of natural language: Vocabulary mismatch (the fact that the same meaning can be expressed in many ways) and ambiguity of natural language (the fact that the same term can have several meanings). The application of semantic spaces in natural language processing (NLP) aims at overcoming limitations of rule-based or model-based approaches operating on the keyword level. The main drawback with these approaches is their brittleness, and the large manual effort required to create either rule-based NLP systems or training corpora for model learning. Rule-based and machine learning based models are fixed on the keyword level and break down if the vocabulary differs from that defined in the rules or from the training material used for the statistical models. Research in semantic spaces dates back more than 20 years. In 1996, two papers were published that raised a lot of attention around the general idea of creating semantic spaces: latent semantic analysis and Hyperspace Analogue to Language. However, their adoption was limited by the large computational effort required to construct and use those semantic spaces. A breakthrough with regard to the accuracy of modelling associative relations between words (e.g. "spider-web", "lighter-cigarette", as opposed to synonymous relations such as "whale-dolphin", "astronaut-driver") was achieved by explicit semantic analysis (ESA) in 2007. ESA was a novel (non-machine learning) based approach that represented words in the form of vectors with 100,000 dimensions (where each dimension represents an Article in Wikipedia). However practical applications of the approach are limited due to the large number of required dimensions in the vectors. More recently, advances in neural network techniques in combination with other new approaches (tensors) led to a host of new recent developments: Word2vec from Google, GloVe from Stanford University, and fastText from Facebook AI Research (FAIR) labs.

    Read more →
  • Netsukuku

    Netsukuku

    Netsukuku is an experimental peer-to-peer routing system, developed by the FreakNet MediaLab in 2005, created to build up a distributed network, anonymous and censorship-free, fully independent but not necessarily separated from the Internet, without the support of any server, Internet service provider and no central authority. Netsukuku is designed to handle up to 2128 nodes without any servers or central systems, with minimal CPU and memory resources. This mesh network can be built using existing network infrastructure components such as Wi-Fi. The project has been in slow development since 2005, never abandoning a beta state. It has also never been tested on large scale. == Operation == As of December 2011, the latest theoretical work on Netsukuku could be found in the author's master thesis Scalable Mesh Networks and the Address Space Balancing problem. The following description takes into account only the basic concepts of the theory. Netsukuku uses a custom routing protocol called QSPN (Quantum Shortest Path Netsukuku) that strives to be efficient and not taxing on the computational capabilities of each node. The current version of the protocol is QSPNv2. It adopts a hierarchical structure. 256 nodes are grouped inside a gnode (group node), 256 gnodes are grouped in a single ggnode (group of group nodes), 256 ggnodes are grouped in a single gggnode, and so on. This offers a set of advantages main documentation. The protocol relies on the fact that the nodes are not mobile and that the network structure does not change quickly, as several minutes may be required before a change in the network is propagated. However, a node that joins the network is immediately able to communicate using the routes of its neighbors. When a node joins the mesh network, Netsukuku automatically adapts and all other nodes come to know the fastest and most efficient routes to communicate with the newcomer. Each node has no more privileges or restrictions than the other nodes. The domain name system (DNS) is replaced by a decentralised and distributed system called ANDNA (Abnormal Netsukuku Domain Name Anarchy). The ANDNA database is included in the Netsukuku system, so each node includes such database that occupies at most 355 kilobytes of memory. Simplifying, ANDNA works as follows: to resolve a symbolic name the host applies a function Hash on its behalf. The Hash function returns an address that the host contacts asking for the resolution generated by the hash. The contacted node receives a request, searches in its ANDNA database for the address associated with the name and returns it to the applicant host. Recording works in a similar way: for example, let's suppose that the node X wants to register the address FreakNet.andna; X calculates the hash name and obtains the address 11.22.33.44 associated with node Y. The node X contacts Y asking to register 11.22.33.44 as its own. Y stores the request in its database and any request for resolution of 11.22.33.44 hash, will answer with the X's address. The protocol is a little more complex than this, as the system provides a public/private key to authenticate the hosts and prevent unauthorized changes to the ANDNA database. Furthermore, the protocol provides redundancy in the database to make the protocol resistant to failure and also provides for the migration of the database if the network topology changes. The protocol does not provide for the possibility of revoking a symbolic name; after a certain period of inactivity (currently 3 days) it is simply deleted from the database. The protocol also prevents a single host from recording an excessive number of symbolic names (at present 256 names) in order to prevent spammers from storing a high number of terms to perform cybersquatting.

    Read more →
  • Control-flow diagram

    Control-flow diagram

    A control-flow diagram (CFD) is a diagram to describe the control flow of a business process, process or review. Control-flow diagrams were developed in the 1950s, and are widely used in multiple engineering disciplines. They are one of the classic business process modeling methodologies, along with flow charts, drakon-charts, data flow diagrams, functional flow block diagram, Gantt charts, PERT diagrams, and IDEF. == Overview == A control-flow diagram can consist of a subdivision to show sequential steps, with if-then-else conditions, repetition, and/or case conditions. Suitably annotated geometrical figures are used to represent operations, data, or equipment, and arrows are used to indicate the sequential flow from one to another. There are several types of control-flow diagrams, for example: Change-control-flow diagram, used in project management Configuration-decision control-flow diagram, used in configuration management Process-control-flow diagram, used in process management Quality-control-flow diagram, used in quality control. In software and systems development, control-flow diagrams can be used in control-flow analysis, data-flow analysis, algorithm analysis, and simulation. Control and data are most applicable for real time and data-driven systems. These flow analyses transform logic and data requirements text into graphic flows which are easier to analyze than the text. PERT, state transition, and transaction diagrams are examples of control-flow diagrams. == Types of control-flow diagrams == === Process-control-flow diagram === A flow diagram can be developed for the process [control system] for each critical activity. Process control is normally a closed cycle in which a sensor. The application determines if the sensor information is within the predetermined (or calculated) data parameters and constraints. The results of this comparison, which controls the critical component. This [feedback] may control the component electronically or may indicate the need for a manual action. This closed-cycle process has many checks and balances to ensure that it stays safe. It may be fully computer controlled and automated, or it may be a hybrid in which only the sensor is automated and the action requires manual intervention. Further, some process control systems may use prior generations of hardware and software, while others are state of the art. === Performance-seeking control-flow diagram === The figure presents an example of a performance-seeking control-flow diagram of the algorithm. The control law consists of estimation, modeling, and optimization processes. In the Kalman filter estimator, the inputs, outputs, and residuals were recorded. At the compact propulsion-system-modeling stage, all the estimated inlet and engine parameters were recorded. In addition to temperatures, pressures, and control positions, such estimated parameters as stall margins, thrust, and drag components were recorded. In the optimization phase, the operating-condition constraints, optimal solution, and linear-programming health-status condition codes were recorded. Finally, the actual commands that were sent to the engine through the DEEC were recorded.

    Read more →
  • Ciphertext expansion

    Ciphertext expansion

    In cryptography, the term ciphertext expansion refers to the length increase of a message when it is encrypted. Many modern cryptosystems cause some degree of expansion during the encryption process, for instance when the resulting ciphertext must include a message-unique Initialization Vector (IV). Probabilistic encryption schemes cause ciphertext expansion, as the set of possible ciphertexts is necessarily greater than the set of input plaintexts. Certain schemes, such as Cocks Identity Based Encryption, or the Goldwasser-Micali cryptosystem result in ciphertexts hundreds or thousands of times longer than the plaintext. Ciphertext expansion may be offset or increased by other processes which compress or expand the message, e.g., data compression or error correction coding. == Reasons why Ciphertext expansion can occur == === Probabilistic Encryption === Probabilistic encryption schemes, such as the Goldwasser-Micali cryptosystem, necessarily produce ciphertexts that are longer than the original plaintexts. This is because the set of possible ciphertexts must be larger than the set of plaintexts to achieve semantic security. === Initialization Vectors (IVs) === Many block cipher modes of operation, like Cipher Block Chaining (CBC), require the use of an Initialization Vector (IV) that is unique for each message. The IV is typically appended to the ciphertext, resulting in expansion. === Redundancy and Error Correction === Some cryptographic schemes intentionally introduce redundancy or error correction codes into the ciphertext to protect against tampering or transmission errors. This added data increases the ciphertext size. === Specific Cryptosystems === Certain cryptographic schemes, such as Cocks Identity-Based Encryption, can produce ciphertexts that are hundreds or thousands of times longer than the original plaintext. This extreme expansion is a design choice to achieve the desired security properties. Ciphertext expansion can be offset or increased by other processes that compress or expand the message, such as data compression or error correction coding. The overall impact on message size depends on the relative strengths of these competing effects.

    Read more →
  • Coupled pattern learner

    Coupled pattern learner

    Coupled Pattern Learner (CPL) is a machine learning algorithm which couples the semi-supervised learning of categories and relations to forestall the problem of semantic drift associated with boot-strap learning methods. == Coupled Pattern Learner == Semi-supervised learning approaches using a small number of labeled examples with many unlabeled examples are usually unreliable as they produce an internally consistent, but incorrect set of extractions. CPL solves this problem by simultaneously learning classifiers for many different categories and relations in the presence of an ontology defining constraints that couple the training of these classifiers. It was introduced by Andrew Carlson, Justin Betteridge, Estevam R. Hruschka Jr. and Tom M. Mitchell in 2009. == CPL overview == CPL is an approach to semi-supervised learning that yields more accurate results by coupling the training of many information extractors. Basic idea behind CPL is that semi-supervised training of a single type of extractor such as ‘coach’ is much more difficult than simultaneously training many extractors that cover a variety of inter-related entity and relation types. Using prior knowledge about the relationships between these different entities and relations CPL makes unlabeled data as a useful constraint during training. For e.g., ‘coach(x)’ implies ‘person(x)’ and ‘not sport(x)’. == CPL description == === Coupling of predicates === CPL primarily relies on the notion of coupling the learning of multiple functions so as to constrain the semi-supervised learning problem. CPL constrains the learned function in two ways. Sharing among same-arity predicates according to logical relations Relation argument type-checking === Sharing among same-arity predicates === Each predicate P in the ontology has a list of other same-arity predicates with which P is mutually exclusive. If A is mutually exclusive with predicate B, A’s positive instances and patterns become negative instances and negative patterns for B. For example, if ‘city’, having an instance ‘Boston’ and a pattern ‘mayor of arg1’, is mutually exclusive with ‘scientist’, then ‘Boston’ and ‘mayor of arg1’ will become a negative instance and a negative pattern respectively for ‘scientist.’ Further, Some categories are declared to be a subset of another category. For e.g., ‘athlete’ is a subset of ‘person’. === Relation argument type-checking === This is a type checking information used to couple the learning of relations and categories. For example, the arguments of the ‘ceoOf’ relation are declared to be of the categories ‘person’ and ‘company’. CPL does not promote a pair of noun phrases as an instance of a relation unless the two noun phrases are classified as belonging to the correct argument types. === Algorithm description === Following is a quick summary of the CPL algorithm. Input: An ontology O, and a text corpus C Output: Trusted instances/patterns for each predicate for i=1,2,...,∞ do foreach predicate p in O do EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances; FILTER candidates that violate coupling; RANK candidate instances/patterns; PROMOTE top candidates; end end ==== Inputs ==== A large corpus of Part-Of-Speech tagged sentences and an initial ontology with predefined categories, relations, mutually exclusive relationships between same-arity predicates, subset relationships between some categories, seed instances for all predicates, and seed patterns for the categories. ==== Candidate extraction ==== CPL finds new candidate instances by using newly promoted patterns to extract the noun phrases that co-occur with those patterns in the text corpus. CPL extracts, Category Instances Category Patterns Relation Instances Relation Patterns ==== Candidate filtering ==== Candidate instances and patterns are filtered to maintain high precision, and to avoid extremely specific patterns. An instance is only considered for assessment if it co-occurs with at least two promoted patterns in the text corpus, and if its co-occurrence count with all promoted patterns is at least three times greater than its co-occurrence count with negative patterns. ==== Candidate ranking ==== CPL ranks candidate instances using the number of promoted patterns that they co-occur with so that candidates that occur with more patterns are ranked higher. Patterns are ranked using an estimate of the precision of each pattern. ==== Candidate promotion ==== CPL ranks the candidates according to their assessment scores and promotes at most 100 instances and 5 patterns for each predicate. Instances and patterns are only promoted if they co-occur with at least two promoted patterns or instances, respectively. == Meta-Bootstrap Learner == Meta-Bootstrap Learner (MBL) was also proposed by the authors of CPL. Meta-Bootstrap learner couples the training of multiple extraction techniques with a multi-view constraint, which requires the extractors to agree. It makes addition of coupling constraints on top of existing extraction algorithms, while treating them as black boxes, feasible. MBL assumes that the errors made by different extraction techniques are independent. Following is a quick summary of MBL. Input: An ontology O, a set of extractors ε Output: Trusted instances for each predicate for i=1,2,...,∞ do foreach predicate p in O do foreach extractor e in ε do Extract new candidates for p using e with recently promoted instances; end FILTER candidates that violate mutual-exclusion or type-checking constraints; PROMOTE candidates that were extracted by all extractors; end end Subordinate algorithms used with MBL do not promote any instance on their own, they report the evidence about each candidate to MBL and MBL is responsible for promoting instances. == Applications == In their paper authors have presented results showing the potential of CPL to contribute new facts to existing repository of semantic knowledge, Freebase

    Read more →
  • Influencer

    Influencer

    An influencer is an individual who has the capacity to shape the attitudes, behavior, or decisions of others through authority, knowledge, position, or the nature of the relationship with the audience. The term is used in various fields such as media, business, politics, religion, and communication, referring to influencers such as social media influencers, podcasters, public speakers, religious influencers, writers, and newsletter writers etc who have dedicated followings in various areas. One writer defines influencers as "a range of third parties who exercise influence over the organization and its potential customers." Another writer defines an influencer as a "third party who significantly shapes the customer's purchasing decision but may never be accountable for it." According to another writer, influencers are "well-connected, create an impact, have active minds, and are trendsetters". Just because a person has many followers does not necessarily mean they have much influence over those people. In contemporary usage, the term frequently refers to a social media influencer, (also known as an online influencer or simply influencer) a person who builds a grassroots online presence through engaging content such as photos, videos, and updates. This is done by using direct audience interaction to establish authenticity, expertise, and appeal, and by standing apart from traditional celebrities by growing their platform through social media rather than pre-existing fame. The modern referent of the term is commonly a paid role in which a business entity pays for the social media influence-for-hire activity to promote its products and services, known as influencer marketing. A 1% increase in spending on influencer marketing can lead to a 0.5% increase in audience engagement. As such, an influencer effectively acts as a modern salesperson or a marketer. Types of influencers include fashion influencer, travel influencer, and virtual influencer, and they involve content creators and streamers. Some influencers are associated primarily with specific social media apps such as TikTok, Instagram, or Pinterest; many influencers are also considered internet celebrities. As of 2023, Instagram is the social media platform businesses spend the most advertising money towards marketing with influencers. However, influencers can have an impact on any social media network. == History == === Origins === The word influencer in its general sense of a person or thing that exerts influence, is attested in historical sources at least since the 17th century. The Oxford English Dictionary (OED) gives 1664 as the earliest example of usage and cites a sentence from Henry More's A Modest Enquiry into the Mystery of Iniquity: "The head and influencer of the whole Church". The origins of online influencing can be traced back to the emergence of digital blogs and platforms in the early 2000s. Nevertheless, recent studies demonstrate that Instagram, an application with more than one billion users, harbors the majority of the influencer demographic. These individuals are sometimes referred to as "Instagrammers" or "Instafamous". A crucial aspect of influencing is their association with sponsors. The 2015 debut of Vamp, a company that links influencers with sponsorships, transformed the landscape of influencing. There is much debate about whether social media influencers can be considered celebrities, as their path to fame is often less traditional and arguably easier. Melody Nouri addressed the differences between the two types in her article "The Power of Influence: Traditional Celebrities vs Social Media Influencer". Nouri asserts that social media platforms have a greater negative impact on young, impressionable audiences in comparison with traditional media such as magazines, billboards, advertisements, and tabloids featuring celebrities. Online, it is thought to be simpler to manipulate an image and lifestyle in such a way that viewers are more susceptible to believing it. One theory considers the former American First Lady Eleanor Roosevelt (1884–1962) to be the "original media influencer." While she achieved celebrity in her role as First Lady, she built a global personal brand as a wise, informative, trustworthy American woman. Her voice was her own, unrestricted by political advisors and powerful men, and with it, Roosevelt exerted unprecedented social and cultural influence in radio, print, public speaking, film, and television until she died. In one notable example, it may have been Roosevelt's television support of John F. Kennedy which nudged his "hairline victory" during the 1960 Presidential campaign. In another example, David Ogilvy paid Roosevelt more than a quarter of a million dollars in today's currency to make a TV commercial for Good Luck margarine (1959), in which Roosevelt also managed to mention world hunger. As a content creator, she wrote My Day, a popular daily newspaper column that ran nationwide for twenty-six years. Like a social media post, My Day covered all aspects of her life, and in it Roosevelt often recommended movies, books, and products that she admired. Roosevelt also had a hand in designing all three of her public affairs television shows. Unlike contemporary influencers, she was less motivated by a pay-to-play situation than by a desire to educate and inspire; but she did use her influence to benefit the entertainment industry careers of her children, and she welcomed the revenue that her influence bought, most of which was donated to charity. === 2000s === The early 2000s showed corporate endeavors to leverage the internet for influence, with some companies participating in forums for promotions or providing bloggers with complimentary products in return for favorable reviews. A few of these practices were viewed as unethical for taking advantage of the labor of young individuals without providing remuneration. In 2004, The Blogstar Network was established by Ted Murphy of MindComet. Bloggers were encouraged to join an email list and receive remunerated offers from corporations in exchange for creating specific posts. For instance, bloggers were compensated for writing reviews of fast-food meals on their blogs. Blogstar is widely regarded as the first influencer marketing network. Murphy succeeded Blogstar with PayPerPost, which was introduced in 2006. This platform compensated significant posters on prominent forums and social media platforms for every post made about a corporate product. Payment rates were determined by the influencer's status. Though very popular, PayPerPost, received a great deal of criticism as these influencers were not required to disclose their involvement with PayPerPost as traditional journalism would have. With the success of PayPerPost, the public became aware that there was a drive for corporate interests to influence what some people were posting to these sites. The platform also incentivized other firms to establish comparable programs. Despite concerns, marketing networks with influencers continued to grow throughout the 2000s and into the 2010s. The influencer marketing industry was worth as much as $8 billion in 2019, according to estimates from Business Insider Intelligence, which are based on Mediakix data. Evan Asano, the Former CEO and founder of the agency Mediakix, previously spoke with Business Insider and said he believed influencer marketing on Instagram would continue to grow despite likes being hidden. === 2010s === By the 2010s, the term "influencer" described digital content creators with a large following, distinctive brand persona, and a patterned relationship with commercial sponsors. By this period, influencer marketing had become a widely researched field globally, with systematic reviews drawing on hundreds of studies that documented the growing role of authenticity, audience engagement, and parasocial relationships in shaping how consumers responded to influencer content across different markets. During this period, influencer culture also developed through distinct channels outside Western markets. In South Korea, the global spread of Korean pop culture, also called K-Pop, through platforms such as YouTube, Facebook, and Twitter gave rise to what scholars have called 'Hallyu 2.0' or the 'New Korean Wave', where fans throughout Southeast Asia, North America, Latin America, and Europe shared, subtitled, and redistributed Korean music and film content on a large scale. This helped Korean entertainers to build substantial followings internationally. Consumers often mistakenly view celebrities as reliable, leading to trust and confidence in the products being promoted. A 2001 study from Rutgers University discovered that individuals were using "internet forums as influential sources of consumer information." The study proposes that consumers preferred internet forums and social media when making purchasing decisions over conventional advertising and print sources. An in

    Read more →
  • Control-flow diagram

    Control-flow diagram

    A control-flow diagram (CFD) is a diagram to describe the control flow of a business process, process or review. Control-flow diagrams were developed in the 1950s, and are widely used in multiple engineering disciplines. They are one of the classic business process modeling methodologies, along with flow charts, drakon-charts, data flow diagrams, functional flow block diagram, Gantt charts, PERT diagrams, and IDEF. == Overview == A control-flow diagram can consist of a subdivision to show sequential steps, with if-then-else conditions, repetition, and/or case conditions. Suitably annotated geometrical figures are used to represent operations, data, or equipment, and arrows are used to indicate the sequential flow from one to another. There are several types of control-flow diagrams, for example: Change-control-flow diagram, used in project management Configuration-decision control-flow diagram, used in configuration management Process-control-flow diagram, used in process management Quality-control-flow diagram, used in quality control. In software and systems development, control-flow diagrams can be used in control-flow analysis, data-flow analysis, algorithm analysis, and simulation. Control and data are most applicable for real time and data-driven systems. These flow analyses transform logic and data requirements text into graphic flows which are easier to analyze than the text. PERT, state transition, and transaction diagrams are examples of control-flow diagrams. == Types of control-flow diagrams == === Process-control-flow diagram === A flow diagram can be developed for the process [control system] for each critical activity. Process control is normally a closed cycle in which a sensor. The application determines if the sensor information is within the predetermined (or calculated) data parameters and constraints. The results of this comparison, which controls the critical component. This [feedback] may control the component electronically or may indicate the need for a manual action. This closed-cycle process has many checks and balances to ensure that it stays safe. It may be fully computer controlled and automated, or it may be a hybrid in which only the sensor is automated and the action requires manual intervention. Further, some process control systems may use prior generations of hardware and software, while others are state of the art. === Performance-seeking control-flow diagram === The figure presents an example of a performance-seeking control-flow diagram of the algorithm. The control law consists of estimation, modeling, and optimization processes. In the Kalman filter estimator, the inputs, outputs, and residuals were recorded. At the compact propulsion-system-modeling stage, all the estimated inlet and engine parameters were recorded. In addition to temperatures, pressures, and control positions, such estimated parameters as stall margins, thrust, and drag components were recorded. In the optimization phase, the operating-condition constraints, optimal solution, and linear-programming health-status condition codes were recorded. Finally, the actual commands that were sent to the engine through the DEEC were recorded.

    Read more →
  • InfiniBand

    InfiniBand

    InfiniBand (IB) is a computer networking standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. It is designed to be scalable and uses a switched fabric network topology. Between 2014 and June 2016, it was the most commonly used interconnect in the TOP500 list of supercomputers. Mellanox (acquired by Nvidia) manufactures InfiniBand host bus adapters and network switches, which are used by large computer system and database vendors in their product lines. As a computer cluster interconnect, IB competes with Ethernet, Fibre Channel, and Intel Omni-Path. The technology is promoted by the InfiniBand Trade Association. == History == InfiniBand originated in 1999 from the merger of two competing designs: Future I/O and Next Generation I/O (NGIO). NGIO was led by Intel, with a specification released in 1998, and joined by Sun Microsystems and Dell. Future I/O was backed by Compaq, IBM, and Hewlett-Packard. This led to the formation of the InfiniBand Trade Association (IBTA), which included both sets of hardware vendors as well as software vendors such as Microsoft. At the time it was thought some of the more powerful computers were approaching the interconnect bottleneck of the PCI bus, in spite of upgrades like PCI-X. Version 1.0 of the InfiniBand Architecture Specification was released in 2000. Initially the IBTA vision for IB was simultaneously a replacement for PCI in I/O, Ethernet in the machine room, cluster interconnect and Fibre Channel. IBTA also envisaged decomposing server hardware on an IB fabric. Mellanox had been founded in 1999 to develop NGIO technology, but by 2001 shipped an InfiniBand product line called InfiniBridge at 10 Gbit/second speeds. Following the burst of the dot-com bubble there was hesitation in the industry to invest in such a far-reaching technology jump. By 2002, Intel announced that instead of shipping IB integrated circuits ("chips"), it would focus on developing PCI Express, and Microsoft discontinued IB development in favor of extending Ethernet. Sun Microsystems and Hitachi continued to support IB. In 2003, the System X supercomputer built at Virginia Tech used InfiniBand in what was estimated to be the third largest computer in the world at the time. The OpenIB Alliance (later renamed OpenFabrics Alliance) was founded in 2004 to develop an open set of software for the Linux kernel. By February, 2005, the support was accepted into the 2.6.11 Linux kernel. In November 2005 storage devices finally were released using InfiniBand from vendors such as Engenio. Cisco, desiring to keep technology superior to Ethernet off the market, adopted a "buy to kill" strategy. Cisco successfully killed InfiniBand switching companies such as Topspin via acquisition. Of the top 500 supercomputers in 2009, Gigabit Ethernet was the internal interconnect technology in 259 installations, compared with 181 using InfiniBand. In 2010, market leaders Mellanox and Voltaire merged, leaving just one other IB vendor, QLogic, primarily a Fibre Channel vendor. At the 2011 International Supercomputing Conference, links running at about 56 gigabits per second (known as FDR, see below), were announced and demonstrated by connecting booths in the trade show. In 2012, Intel acquired QLogic's InfiniBand technology, leaving only one independent supplier. By 2014, InfiniBand was the most popular internal connection technology for supercomputers, although within two years, 10 Gigabit Ethernet started displacing it. In 2016, it was reported that Oracle Corporation (an investor in Mellanox) might engineer its own InfiniBand hardware. In 2019 Nvidia acquired Mellanox, the last independent supplier of InfiniBand products. == Specification == Specifications are published by the InfiniBand trade association. === Performance === Original names for speeds were single-data rate (SDR), double-data rate (DDR) and quad-data rate (QDR) as given below. Subsequently, other three-letter initialisms were added for even higher data rates. Notes Each link is duplex. Links can be aggregated: most systems use a 4 link/lane connector (QSFP). HDR often makes use of 2x links (aka HDR100, 100 Gb link using 2 lanes of HDR, while still using a QSFP connector). NDR introduced OSFP connectors which host one or two links at 2x (NDR200) or 4x (NDR400). They are not logically configured as a single 8x link, even when connecting switches together with an OSFP cable. InfiniBand provides remote direct memory access (RDMA) capabilities for low CPU overhead. === Topology === InfiniBand uses a switched fabric topology, as opposed to early shared medium Ethernet. All transmissions begin or end at a channel adapter. Each processor contains a host channel adapter (HCA) and each peripheral has a target channel adapter (TCA). These adapters can also exchange information for security or quality of service (QoS). === Messages === InfiniBand transmits data in packets of up to 4 KB that are taken together to form a message. A message can be: a remote direct memory access read or write a channel send or receive a transaction-based operation (that can be reversed) a multicast transmission an atomic operation === Physical interconnection === In addition to a board form factor connection, it can use both active and passive copper (up to 10 meters) and optical fiber cable (up to 10 km). QSFP connectors are used. The InfiniBand Association also specified the CXP connector system for speeds up to 120 Gbit/s over copper, active optical cables, and optical transceivers using parallel multi-mode fiber cables with 24-fiber MPO connectors. === Software interfaces === Mellanox operating system support is available for Solaris, FreeBSD, Red Hat Enterprise Linux, SUSE Linux Enterprise Server (SLES), Windows, HP-UX, VMware ESX, and AIX. InfiniBand has no specific standard application programming interface (API). The standard only lists a set of verbs such as ibv_open_device or ibv_post_send, which are abstract representations of functions or methods that must exist. The syntax of these functions is left to the vendors. Sometimes for reference this is called the verbs API. The de facto standard software is developed by OpenFabrics Alliance and called the Open Fabrics Enterprise Distribution (OFED). It is released under two licenses GPL2 or BSD license for Linux and FreeBSD, and as Mellanox OFED for Windows (product names: WinOF / WinOF-2; attributed as host controller driver for matching specific ConnectX 3 to 5 devices) under a choice of BSD license for Windows. It has been adopted by most of the InfiniBand vendors, for Linux, FreeBSD, and Microsoft Windows. IBM refers to a software library called libibverbs, for its AIX operating system, as well as "AIX InfiniBand verbs". The Linux kernel support was integrated in 2005 into the kernel version 2.6.11. === Ethernet over InfiniBand === Ethernet over InfiniBand, abbreviated to EoIB, is an Ethernet implementation over the InfiniBand protocol and connector technology. EoIB enables multiple Ethernet bandwidths varying on the InfiniBand (IB) version. Ethernet's implementation of the Internet Protocol Suite, usually referred to as TCP/IP, is different in some details compared to the direct InfiniBand protocol in IP over IB (IPoIB).

    Read more →
  • Automation

    Automation

    Automation describes a wide range of technologies that reduce human intervention in processes, mainly by predetermining decision criteria, subprocess relationships, and related actions, as well as embodying those predeterminations in machines. Automation has been achieved by various means including mechanical, hydraulic, pneumatic, electrical, electronic devices, and computers, usually in combination. Complicated systems, such as modern factories, airplanes, and ships typically use combinations of all of these techniques. The benefits of automation includes labor savings, reducing waste, savings in electricity costs, savings in material costs, and improvements to quality, accuracy, and precision. Automation includes the use of various equipment and control systems such as machinery, processes in factories, boilers, and heat-treating ovens, switching on telephone networks, steering, stabilization of ships, aircraft and other applications and vehicles with reduced human intervention. Examples range from a household thermostat controlling a boiler to a large industrial control system with tens of thousands of input measurements and output control signals. In the simplest type of an automatic control loop, a controller compares a measured value of a process with a desired set value and processes the resulting error signal to change some input to the process, in such a way that the process stays at its set point despite disturbances. This closed-loop control is an application of negative feedback to a system. The mathematical basis of control theory began in the 18th century and advanced rapidly in the 20th. The term automation, inspired by the earlier word automatic (coming from automaton), was not widely used before 1947, when Ford established an automation department. It was during this time that the industry was rapidly adopting feedback controllers, Technological advancements introduced in the 1930s revolutionized various industries significantly. The World Bank's World Development Report of 2019 shows evidence that the new industries and jobs in the technology sector outweigh the economic effects of workers being displaced by automation. Job losses and downward mobility blamed on automation have been cited as one of many factors in the resurgence of nationalist, protectionist and populist politics in the US, UK and France, among other countries since the 2010s. == History == === Early history === It was a preoccupation of the Greeks and Arabs (in the period between about 300 BC and about 1200 AD) to keep an accurate track of time. In Ptolemaic Egypt, about 270 BC, Ctesibius described a float regulator for a water clock, a device not unlike the ball and cock in a modern flush toilet. This was the earliest feedback-controlled mechanism. The appearance of the mechanical clock in the 14th century made the water clock and its feedback control system obsolete. The Persian Banū Mūsā brothers, in their Book of Ingenious Devices (850 AD), described a number of automatic controls. Two-step level controls for fluids, a form of discontinuous variable structure controls, were developed by the Banu Musa brothers. They also described a feedback controller. The design of feedback control systems up through the Industrial Revolution was by trial-and-error, together with a great deal of engineering intuition. It was not until the mid-19th century that the stability of feedback control systems was analyzed using mathematics, the formal language of automatic control theory. The centrifugal governor was invented by Christiaan Huygens in the seventeenth century, and used to adjust the gap between millstones. === Industrial Revolution in Western Europe === The introduction of prime movers, or self-driven machines advanced grain mills, furnaces, boilers, and the steam engine created a new requirement for automatic control systems including temperature regulators (invented in 1624; see Cornelius Drebbel), pressure regulators (1681), float regulators (1700) and speed control devices. Another control mechanism was used to tent the sails of windmills. It was patented by Edmund Lee in 1745. Also in 1745, Jacques de Vaucanson invented the first automated loom. Around 1800, Joseph Marie Jacquard created a punch-card system to program looms. In 1771 Richard Arkwright invented the first fully automated spinning mill driven by water power, known at the time as the water frame. An automatic flour mill was developed by Oliver Evans in 1785, making it the first completely automated industrial process. A centrifugal governor was used by Mr. Bunce of England in 1784 as part of a model steam crane. The centrifugal governor was adopted by James Watt for use on a steam engine in 1788 after Watt's partner Boulton saw one at a flour mill Boulton & Watt were building. The governor could not actually hold a set speed; the engine would assume a new constant speed in response to load changes. The governor was able to handle smaller variations such as those caused by fluctuating heat load to the boiler. Also, there was a tendency for oscillation whenever there was a speed change. As a consequence, engines equipped with this governor were not suitable for operations requiring constant speed, such as cotton spinning. Several improvements to the governor, plus improvements to valve cut-off timing on the steam engine, made the engine suitable for most industrial uses before the end of the 19th century. Advances in the steam engine stayed well ahead of science, both thermodynamics and control theory. The governor received relatively little scientific attention until James Clerk Maxwell published a paper that established the beginning of a theoretical basis for understanding control theory. === 20th century === Relay logic was introduced with factory electrification, which underwent rapid adaptation from 1900 through the 1920s. Central electric power stations were also undergoing rapid growth and the operation of new high-pressure boilers, steam turbines and electrical substations created a great demand for instruments and controls. Central control rooms became common in the 1920s, but as late as the early 1930s, most process controls were on-off. Operators typically monitored charts drawn by recorders that plotted data from instruments. To make corrections, operators manually opened or closed valves or turned switches on or off. Control rooms also used color-coded lights to send signals to workers in the plant to manually make certain changes. The development of the electronic amplifier during the 1920s, which was important for long-distance telephony, required a higher signal-to-noise ratio, which was solved by negative feedback noise cancellation. This and other telephony applications contributed to the control theory. In the 1940s and 1950s, German mathematician Irmgard Flügge-Lotz developed the theory of discontinuous automatic controls, which found military applications during the Second World War to fire control systems and aircraft navigation systems. Controllers, which were able to make calculated changes in response to deviations from a set point rather than on-off control, began being introduced in the 1930s. Controllers allowed manufacturing to continue showing productivity gains to offset the declining influence of factory electrification. Factory productivity was greatly increased by electrification in the 1920s. U.S. manufacturing productivity growth fell from 5.2%/yr 1919–29 to 2.76%/yr 1929–41. Alexander Field notes that spending on non-medical instruments increased significantly from 1929 to 1933 and remained strong thereafter. The First and Second World Wars saw major advancements in the field of mass communication and signal processing. Other key advances in automatic controls include differential equations, stability theory and system theory (1938), frequency domain analysis (1940), ship control (1950), and stochastic analysis (1941). Starting in 1958, various systems based on solid-state digital logic modules for hard-wired programmed logic controllers (the predecessors of programmable logic controllers [PLC]) emerged to replace electro-mechanical relay logic in industrial control systems for process control and automation, including early Telefunken/AEG Logistat, Siemens Simatic, Philips/Mullard/Valvo Norbit, BBC Sigmatronic, ACEC Logacec, Akkord Estacord, Krone Mibakron, Bistat, Datapac, Norlog, SSR, or Procontic systems. In 1959 Texaco's Port Arthur Refinery became the first chemical plant to use digital control. Conversion of factories to digital control began to spread rapidly in the 1970s as the price of computer hardware fell. === Significant applications === The automatic telephone switchboard was introduced in 1892 along with dial telephones. By 1929, 31.9% of the Bell system was automatic. Automatic telephone switching originally used vacuum tube amplifiers and electro-mechanical switches, which consumed a large amount of electricity. Call volume eve

    Read more →
  • Cryptovirology

    Cryptovirology

    Cryptovirology refers to the study of cryptography use in malware, such as ransomware and asymmetric backdoors. Traditionally, cryptography and its applications are defensive in nature, and provide privacy, authentication, and security to users. Cryptovirology employs a twist on cryptography, showing that it can also be used offensively. It can be used to mount extortion based attacks that cause loss of access to information, loss of confidentiality, and information leakage, tasks which cryptography typically prevents. The field was born with the observation that public-key cryptography can be used to break the symmetry between what an antivirus analyst sees regarding malware and what the attacker sees. The antivirus analyst sees a public key contained in the malware, whereas the attacker sees the public key contained in the malware as well as the corresponding private key (outside the malware) since the attacker created the key pair for the attack. The public key allows the malware to perform trapdoor one-way operations on the victim's computer that only the attacker can undo. == Overview == The field encompasses covert malware attacks in which the attacker securely steals private information such as symmetric keys, private keys, PRNG state, and the victim's data. Examples of such covert attacks are asymmetric backdoors. An asymmetric backdoor is a backdoor (e.g., in a cryptosystem) that can be used only by the attacker, even after it is found. This contrasts with the traditional backdoor that is symmetric, i.e., anyone that finds it can use it. Kleptography, a subfield of cryptovirology, is the study of asymmetric backdoors in key generation algorithms, digital signature algorithms, key exchanges, pseudorandom number generators, encryption algorithms, and other cryptographic algorithms. The NIST Dual EC DRBG random bit generator has an asymmetric backdoor in it. The EC-DRBG algorithm utilizes the discrete-log kleptogram from kleptography, which by definition makes the EC-DRBG a cryptotrojan. Like ransomware, the EC-DRBG cryptotrojan contains and uses the attacker's public key to attack the host system. The cryptographer Ari Juels indicated that NSA effectively orchestrated a kleptographic attack on users of the Dual EC DRBG pseudorandom number generation algorithm and that, although security professionals and developers have been testing and implementing kleptographic attacks since 1996, "you would be hard-pressed to find one in actual use until now." Due to public outcry about this cryptovirology attack, NIST rescinded the EC-DRBG algorithm from the NIST SP 800-90 standard. Covert information leakage attacks carried out by cryptoviruses, cryptotrojans, and cryptoworms that, by definition, contain and use the public key of the attacker is a major theme in cryptovirology. In "deniable password snatching," a cryptovirus installs a cryptotrojan that asymmetrically encrypts host data and covertly broadcasts it. This makes it available to everyone, noticeable by no one (except the attacker), and only decipherable by the attacker. An attacker caught installing the cryptotrojan claims to be a virus victim. An attacker observed receiving the covert asymmetric broadcast is one of the thousands, if not millions of receivers, and exhibits no identifying information whatsoever. The cryptovirology attack achieves "end-to-end deniability." It is a covert asymmetric broadcast of the victim's data. Cryptovirology also encompasses the use of private information retrieval (PIR) to allow cryptoviruses to search for and steal host data without revealing the data searched for even when the cryptotrojan is under constant surveillance. By definition, such a cryptovirus carries within its own coding sequence the query of the attacker and the necessary PIR logic to apply the query to host systems. == History == The first cryptovirology attack and discussion of the concept was by Adam L. Young and Moti Yung, at the time called "cryptoviral extortion" and it was presented at the 1996 IEEE Security & Privacy conference. In this attack, a cryptovirus, cryptoworm, or cryptotrojan contains the public key of the attacker and hybrid encrypts the victim's files. The malware prompts the user to send the asymmetric ciphertext to the attacker who will decipher it and return the symmetric decryption key it contains for a fee. The victim needs the symmetric key to decrypt the encrypted files if there is no way to recover the original files (e.g., from backups). The 1996 IEEE paper predicted that cryptoviral extortion attackers would one day demand e-money, long before Bitcoin even existed. Many years later, the media relabeled cryptoviral extortion as ransomware. In 2016, cryptovirology attacks on healthcare providers reached epidemic levels, prompting the U.S. Department of Health and Human Services to issue a Fact Sheet on Ransomware and HIPAA. The fact sheet states that when electronic protected health information is encrypted by ransomware, a breach has occurred, and the attack therefore constitutes a disclosure that is not permitted under HIPAA, the rationale being that an adversary has taken control of the information. Sensitive data might never leave the victim organization, but the break-in may have allowed data to be sent out undetected. California enacted a law that defines the introduction of ransomware into a computer system with the intent of extortion as being against the law. == Examples == === Tremor virus === While viruses in the wild have used cryptography in the past, the only purpose of such usage of cryptography was to avoid detection by antivirus software. For example, the tremor virus used polymorphism as a defensive technique in an attempt to avoid detection by anti-virus software. Though cryptography does assist in such cases to enhance the longevity of a virus, the capabilities of cryptography are not used in the payload. The One-half virus was amongst the first viruses known to have encrypted affected files. === Tro_Ransom.A virus === An example of a virus that informs the owner of the infected machine to pay a ransom is the virus nicknamed Tro_Ransom.A. This virus asks the owner of the infected machine to send $10.99 to a given account through Western Union. Virus.Win32.Gpcode.ag is a classic cryptovirus. This virus partially uses a version of 660-bit RSA and encrypts files with many different extensions. It instructs the owner of the machine to email a given mail ID if the owner desires the decryptor. If contacted by email, the user will be asked to pay a certain amount as ransom in return for the decryptor. === CAPI === It has been demonstrated that using just 8 different calls to Microsoft's Cryptographic API (CAPI), a cryptovirus can satisfy all its encryption needs. == Other uses of cryptography-enabled malware == Apart from cryptoviral extortion, there are other potential uses of cryptoviruses, such as deniable password snatching, cryptocounters, private information retrieval, and in secure communication between different instances of a distributed cryptovirus.

    Read more →
  • Data monetization

    Data monetization

    Data monetization, a form of monetization, may refer to the act of generating measurable economic benefits from available data sources (analytics). Less commonly, it may also refer to the act of monetizing data services. In the case of analytics, typically, these benefits accrue as revenue or expense savings, but may also include market share or corporate market value gains. Data monetization leverages data generated through business operations, available exogenous data or content, as well as data associated with individual actors such as that collected via electronic devices and sensors participating in the internet of things. For example, the ubiquity of the internet of things is generating location data and other data from sensors and mobile devices at an ever-increasing rate. When this data is collated against traditional databases, the value and utility of both sources of data increases, leading to tremendous potential to mine data for social good, research and discovery, and achievement of business objectives. Closely associated with data monetization are the emerging data as a service models for transactions involving data by the data item. There are three ethical and regulatory vectors involved in data monetization due to the sometimes conflicting interests of actors involved in the digital supply chain. The individual data creator who generates files and records through his own efforts or owns a device such as a sensor or a mobile phone that generates data has a claim to ownership of data. The business entity that generates data in the course of its operations, such as its transactions with financial institutions or risk factors discovered through feedback from customers also has a claim on data captured through their systems and platforms. However, the person that contributed the data may also have a legitimate claim on the data. Internet platforms and service providers, such as Google or Facebook that require a user to forgo some ownership interest in their data in exchange for use of the platform also have a legitimate claim on the data. Thus the practice of data monetization, although common since 2000, is now getting increasing attention from regulators. The European Union and the United States Congress have begun to address these issues. For instance, in the financial services industry, regulations involving data are included in the Gramm–Leach–Bliley Act and Dodd-Frank. Some individual creators of data are shifting to using personal data vaults and implementing vendor relationship management concepts as a reflection of an increasing resistance to their data being federated or aggregated and resold without compensation. Groups such as the Personal Data Ecosystem Consortium, Patient privacy rights, and others are also challenging corporate cooptation of data without compensation. Financial services companies are a relatively good example of an industry focused on generating revenue by leveraging data. Credit card issuers and retail banks use customer transaction data to improve targeting of cross-sell offers. Partners are increasingly promoting merchant based reward programs which leverage a bank’s data and provide discounts to customers at the same time. == Types of data monetization == Internal data monetization - An organization's data is used internally, resulting in economic benefit. This is commonly the case in organizations using analytics to uncover insights, resulting in improved profit, cost savings or the avoidance of risk. Internal data monetization is currently the most common form of monetization, requiring far fewer security, intellectual property, and legal precautions when compared to other types. The potential economic gains from this type of data monetization are limited by the organization's internal structure and situation. External data monetization - A person or organization makes data they possess available on a for-fee basis to external parties, or as a broker for same. This type of monetization is less common and requires various methods to distribute the data to potential buyers and consumers. However, the economic gain that results from collecting data, packaging and distributing it, can be quite large. == Steps == Identification of available data sources – this includes data currently available for monetization as well as other external data sources that may enhance the value of what’s currently available. Connect, aggregate, attribute, validate, authenticate, and exchange data - this allows data to be converted directly into actionable or revenue generating insight or services. Set terms and prices and facilitate data trading - methods for data vetting, storage, and access. For example, many global corporations have locked and siloed data storage infrastructures, which hinders efficient access to data and cooperative and real-time exchange. Perform Research and analytics – draw predictive insights from existing data as a basis for using data for to reduce risk, enhance product development or performance, or improve customer experience or business outcomes. Action and leveraging – the last phase of monetizing data includes determining alternative or improved data centric products, ideas, or services. Examples may include real-time actionable triggered notifications or enhanced channels such as web or mobile response mechanisms. == Pricing variables and factors == A fee for use of a platform to connect buyers and sellers use of a platform to configure, organize, and otherwise process data included in a data trade connecting or including a device or sensor into a data supply chain connecting and credentialing a creator of a data source and a data buyer – often through a federated identity connecting a data source to other data sources to be included in a data supply chain use of an internet service or other transmission services for uploading and downloading data – sometimes, for an individual, through a personal cloud use of encrypted keys to achieve secure data transfer use of a search algorithm specifically designed to tag data sources that contain data points of value to the data buyer linking a data creator or generator to a data collection protocol or form server actions – such as a notification – triggered by an update to a data item or data source included in a data supply chain A price or exchange or other trade value assigned by a data creator or generator to a data item or a data source offered by a data buyer to a data creator assigned by a data buyer for a data item or a data source formatted according to criteria set by a data buyer An incremental fee assigned by a data buyer for a data item or a data set scaled to the reputation of the data creator == Benefits == Improved decision-making that leads to real time crowd sourced research, improved profits, decreased costs, reduced risk and improved compliance More impactful decisions (e.g., make real-time decisions) More timely (lower latency) decisions (e.g., a vendor making purchase recommendations while the customer is still on the phone or in the store, a customer connecting with multiple vendors to discover the best price, triggered notifications when thresholds are reached for data values) More granular decisions (e.g., localized pricing decisions at an individual or device or sensor level versus larger aggregates). Targeted Marketing (e.g., Vendors with access to big data can make targeted advertisements to specific customers within a set data pool decreasing costs for the advertiser and reaching most interested customers) == Frameworks == There are a wide variety of industries, firms and business models related to data monetization. The following frameworks have been offered to help understand the types of business models that are used: Roger Ehrenberg of IA Ventures, a venture capital firm that invests in this sector, has defined three basic types of data product firms: Contributory databases. The magic of these businesses is that a customer provides their own data in exchange for receiving a more robust set of aggregated data back that provides insight into the broader marketplace, or provides a vehicle for expressing a view. Give a little, get a lot back in return – a pretty compelling value proposition, and one that frequently results in a payment from the data contributor in exchange for receiving enriched, aggregated data. Once these contributory databases are developed and customers become reliant on their insights, they become extremely valuable and persistent data assets. Data processing platforms. These businesses create barriers through a combination of complex data architectures, proprietary algorithms, and rich analytics to help customers consume data in whatever form they please. Often these businesses have special relationships with key data providers, that when combined with other data and processed as a whole create valuable differentiation and competitive barriers. Bloomberg is an example of a powerful

    Read more →