AI Email Management

AI Email Management — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Vulnerabilities Equities Process

    Vulnerabilities Equities Process

    The Vulnerabilities Equities Process (VEP) is a process used by the U.S. federal government to determine, on a case-by-case basis, how it should treat zero-day computer security vulnerabilities: whether to disclose them to the public to help improve general computer security, or to keep them secret for offensive use against the government's adversaries. The VEP was first developed in 2008–2009 but only became public in 2016, when the government released a redacted version of the VEP in response to a FOIA request by the Electronic Frontier Foundation. Following public pressure for greater transparency in the wake of the Shadow Brokers affair, the U.S. government published a fuller description of the VEP in November 2017. == Participants == According to the VEP plan published in 2017, the Equities Review Board (ERB) is the primary forum for interagency deliberation and determinations concerning the VEP. The ERB meets monthly but may be convened sooner if an immediate need arises. The ERB consists of representatives from the following agencies: the Office of Management and Budget; the Office of the Director of National Intelligence (including the Intelligence Community-Security Coordination Center); the United States Department of the Treasury; the United States Department of State; the United States Department of Justice (including the Federal Bureau of Investigation and the National Cyber Investigative Joint Task Force); the Department of Homeland Security (including the National Cybersecurity and Communications Integration Center and the United States Secret Service); the United States Department of Energy; the United States Department of Defense (including the National Security Agency, with its Information Assurance and Signals Intelligence elements, United States Cyber Command, and the DoD Cyber Crime Center); the United States Department of Commerce; and the Central Intelligence Agency. The National Security Agency serves as the executive secretariat for the VEP. 
== Process == According to the November 2017 version of the VEP, the process is as follows: === Submission and notification === When an agency finds a vulnerability, it will notify the VEP secretariat as soon as possible. The notification will include a description of the vulnerability and the vulnerable products or systems, together with the agency's recommendation to either disseminate or restrict the vulnerability information. The secretariat will then notify all participants of the submission within one business day, requesting them to respond if they have a relevant interest. === Equity and discussions === An agency expressing an interest must indicate within five business days whether it concurs with the original recommendation to disseminate or restrict. If it does not, it will hold discussions with the submitting agency and the VEP secretariat within seven business days to attempt to reach consensus. If no consensus is reached, the participants will suggest options for the Equities Review Board. === Determination to disseminate or restrict === Decisions whether to disclose or restrict a vulnerability should be made quickly, in full consultation with all concerned agencies, and in the overall best interest of the U.S. government's competing mission interests. As far as possible, determinations should be based on rational, objective methodologies, taking into account factors such as prevalence, reliance, and severity. If the review board members cannot reach consensus, they will vote on a preliminary determination. If an agency with an equity disputes that decision, it may contest the preliminary determination by providing notice to the VEP secretariat. If no agency contests a preliminary determination, it will be treated as a final decision. === Handling and follow-on actions === If vulnerability information is released, this will be done as quickly as possible, preferably within seven business days. 
Disclosure of vulnerabilities will be conducted according to guidelines agreed on by all members. The submitting agency is presumed to be most knowledgeable about the vulnerability and, as such, will be responsible for disseminating vulnerability information to the vendor. The submitting agency may elect to delegate dissemination responsibility to another agency on its behalf. The releasing agency will promptly provide a copy of the disclosed information to the VEP secretariat for record keeping. Additionally, the releasing agency is expected to follow up so the ERB can determine whether the vendor's action meets government requirements. If the vendor chooses not to address a vulnerability, or is not acting with urgency consistent with the risk of the vulnerability, the releasing agency will notify the secretariat, and the government may take other mitigation steps. == Criticism == The VEP process has been criticized for a number of deficiencies, including restriction by non-disclosure agreements, lack of risk ratings, special treatment for the NSA, and less than whole-hearted commitment to disclosure as the default option. == UK equivalent == British intelligence agencies—GCHQ in particular—follow a similar approach, also known as the Equities Process, to determine whether to disclose or retain security vulnerabilities. The Investigatory Powers Act 2016 was amended in 2022 to bring oversight of the operation of the process within the remit of the Investigatory Powers Commissioner. Details of the process were made public in 2018.

  • Social trading

    Social trading

    Social trading is a form of investing that allows investors to observe the trading behavior of their peers and expert traders. The primary objective is to follow their investment strategies using copy trading or mirror trading. Social trading requires little or no knowledge about financial markets. == History == One of the first social trading platforms was Collective2, which began offering social trading functionality to retail traders as early as 2003 (preceding ZuluTrade by four years). In 2010, social trading started to achieve a greater degree of mainstream appeal with eToro, followed by Wikifolio in 2012. Europe-based NAGA, listed on the Frankfurt Stock Exchange since 2017, claims more than EUR 27 billion was traded on its platform in the second half of 2019. Other contemporary social trading platforms and tech providers include Trading Motion, Brokeree Solutions, iSystems, and FX Junction. === Research === MIT computer scientist and researcher Yaniv Altshuler described social trading networks as complex adaptive systems, and in his 2014 research on eToro's OpenBook, wrote that "Having the inherent ability to share ideas and information between each others, OpenBook's users are given a new source of information they can use in order to enhance their trading performance. As the users are not playing against each other but rather – against the market, this situation becomes a non zero-sum game, hence incentivizing the users to share as much information as possible." His paper concludes that "social trading provides much better opportunities for profiting compared with individual trading," but that users make "excellent but sometimes not optimal decisions in selecting experts when they can see others' choices." A 2015 World Economic Forum report described social trading networks as disruptors, which "have emerged to provide low-cost, sophisticated alternatives to traditional wealth managers. 
These solutions cater to a broader customer base and empower customers to have more control of their wealth management," and "pose a tangible threat to the traditional practices of the wealth management industry". Economist Nouriel Roubini's thinktank predicted in 2016 that "newer forms of investment, such as socially responsible investments and social trading will bring some of the largest industry growth in the coming years." A 2017 St. John's University study found that 'leader' traders, or those with followers, are more susceptible to the disposition effect than investors that are not being followed by any other traders, with the authors suggesting the observation may be explained by "leaders feeling responsible towards their followers and an urge to not let them down, by fear of losing followers when admitting a bad investment decision and signaling confidence in their initial investment choice, or by an attempt of newly appointed leaders to manage their self-image." Social trading may potentially also change how much risk investors take. A recent experimental study argues that merely providing information on the success of others may lead to a significant increase in risk taking. This increase in risk taking may even be larger when subjects are provided with the option to directly copy others. == Characteristics == Social trading is an alternative way of analyzing financial data by looking at what other traders are doing and comparing and copying their techniques and strategies. Prior to the advent of social trading, investors and traders were relying on fundamental or technical analysis to form their investment decisions. Using social trading investors and traders could integrate into their investment decision-process social indicators from trading data-feeds of other traders. Social trading platforms or networks can be considered a subcategory of social networking services. 
Social trading allows traders to trade online with the help of others and, some have claimed, shortens the learning curve from novice to experienced trader. Traders can interact with others, watch others take trades, then duplicate their trades and learn what prompted the top performer to take a trade in the first place. By copying trades, traders can learn which strategies work and which do not. Social trading is used for speculation; in a moral context, speculative practices are viewed negatively, and on that view individuals should instead maintain a long-term horizon and avoid short-term speculation. Social media has permeated the trading world such that two broad categories of trading have evolved. Traditional trading: in a single (or non-social) trade, trader A places a normal trade by himself or herself; this can be manual or automated. Social trading, of which there are two main types: in a copy trade, trader A places exactly the same trade as one single trade of trader B; in a mirror trade, trader A automatically executes every one of trader B's trades, i.e., trader A follows trader B's trading activities exactly. Other variations offered on some platforms allow users to copy another trader's portfolio (copy portfolio), and follow a trader's dividends (copy dividends), where whenever a followed trader withdraws money from his or her account, a proportional amount of money will be withdrawn from the balance of their follower, in real time. === Key features === Information flow: Unencumbered access to information is important in financial markets, which makes the free exchange of information of interest to small-scale as well as individual investors. Cooperative trading: Social trading offers traders the opportunity to work together in trading teams which can trade the markets collaboratively, whether by pooling funds, dividing research or sharing information. 
Monetization: As with social networks in the broader sense, monetization strategies are not always clear. As with social networks in general, it is possible, however, that the long-term worth of such websites may come from the variety and depth of data about their users which their active communities are likely to generate. Transparency: Social trading platforms reveal traders' performance stats, open and past positions, and market sentiment, giving members complete information to assess the credibility of the contributors they follow on the platform.

  • Conjugate coding

    Conjugate coding

    Conjugate coding is a cryptographic tool introduced by Stephen Wiesner in the late 1960s. It is one of the two applications Wiesner described for quantum coding, along with a method for creating fraud-proof banking notes. The application on which the concept was based was a method of transmitting multiple messages in such a way that reading one destroys the others. This is called quantum multiplexing, and it uses photons polarized in conjugate bases as "qubits" to pass information. Conjugate coding is also a simple extension of a random number generator. At the behest of Charles Bennett, Wiesner published the manuscript explaining the basic idea of conjugate coding with a number of examples, but it was not embraced because it was significantly ahead of its time. Because its publication had been rejected, the idea was developed in the world of public-key cryptography in the 1980s as oblivious transfer, first by Michael Rabin and then by Shimon Even. It is used in the field of quantum computing. The initial concept of quantum cryptography developed by Bennett and Gilles Brassard was also based on this concept.
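    The property that reading in the wrong basis destroys the encoded value can be illustrated with a toy classical simulation. This is only a sketch under simplifying assumptions, not Wiesner's construction: a photon is modelled as a plain (basis, bit) pair, and the names `prepare` and `measure` are hypothetical.

```python
import random

RECTILINEAR, DIAGONAL = "+", "x"  # labels for two conjugate polarization bases


def prepare(bit, basis):
    """Model a photon as the (basis, bit) pair it was prepared in."""
    return (basis, bit)


def measure(photon, basis, rng=random):
    """Return (outcome, photon_after_measurement).

    Measuring in the preparation basis recovers the bit unchanged.
    Measuring in the conjugate basis gives a uniformly random outcome and
    leaves the photon in the measurement basis, so the original bit is lost.
    """
    prepared_basis, bit = photon
    if basis == prepared_basis:
        return bit, photon
    outcome = rng.randrange(2)
    return outcome, (basis, outcome)


if __name__ == "__main__":
    photon = prepare(1, DIAGONAL)
    print(measure(photon, DIAGONAL))     # same basis: the bit is read back
    print(measure(photon, RECTILINEAR))  # conjugate basis: random outcome
```

    In this model, a receiver who reads a photon in one basis can no longer recover what was encoded in the other, which is the behaviour the multiplexing idea relies on.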

  • Cipher

    Cipher

    In cryptography, a cipher (or cypher) is an algorithm for performing encryption or decryption—a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. To encipher or encode is to convert information into cipher or code. In common parlance, "cipher" is synonymous with "code", as they are both a set of steps that encrypt a message; however, the concepts are distinct in cryptography, especially classical cryptography. Codes generally substitute different length strings of characters in the output, while ciphers generally substitute the same number of characters as are input. A code maps one meaning with another. Words and phrases can be coded as letters or numbers. Codes typically have direct meaning from input to key. Codes primarily function to save time. Ciphers are algorithmic. The given input must follow the cipher's process to be solved. Ciphers are commonly used to encrypt written information. Codes operated by substituting according to a large codebook which linked a random string of characters or numbers to a word or phrase. For example, "UQJHSE" could be the code for "Proceed to the following coordinates.". When using a cipher the original information is known as plaintext, and the encrypted form as ciphertext. The ciphertext message contains all the information of the plaintext message, but is not in a format readable by a human or computer without the proper mechanism to decrypt it. The operation of a cipher usually depends on a piece of auxiliary information, called a key (or, in traditional NSA parlance, a cryptovariable). The encrypting procedure is varied depending on the key, which changes the detailed operation of the algorithm. A key must be selected before using a cipher to encrypt a message, with some exceptions such as ROT13 and Atbash. 
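    The role of the key can be seen in a minimal sketch of a shift cipher, where the key is the number of positions each letter moves. The function names here are illustrative assumptions, and these classical ciphers offer no real security. ROT13 appears as the same algorithm with the shift fixed at 13, and Atbash as a fixed alphabet reversal, so neither needs a key to be chosen.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_encrypt(plaintext, key):
    """Shift each letter A-Z forward by `key` positions; other characters pass through."""
    out = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)
    return "".join(out)


def shift_decrypt(ciphertext, key):
    """Undo shift_encrypt by shifting backwards with the same key."""
    return shift_encrypt(ciphertext, -key)


def rot13(text):
    """Shift cipher with the key fixed at 13; applying it twice restores the text."""
    return shift_encrypt(text, 13)


def atbash(text):
    """Reverse the alphabet (A<->Z, B<->Y, ...); there is no key to select."""
    return text.upper().translate(str.maketrans(ALPHABET, ALPHABET[::-1]))


if __name__ == "__main__":
    ciphertext = shift_encrypt("GOOD DOG", 3)
    print(ciphertext, "->", shift_decrypt(ciphertext, 3))
```

    Changing the key changes the ciphertext while the procedure stays the same, which is the sense in which the key varies the detailed operation of the algorithm.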
Most modern ciphers can be categorized in several ways: By whether they work on blocks of symbols usually of a fixed size (block ciphers), or on a continuous stream of symbols (stream ciphers). By whether the same key is used for both encryption and decryption (symmetric key algorithms), or if a different key is used for each (asymmetric key algorithms). If the algorithm is symmetric, the key must be known to the recipient and sender and to no one else. If the algorithm is an asymmetric one, the enciphering key is different from, but closely related to, the deciphering key. If one key cannot be deduced from the other, the asymmetric key algorithm has the public/private key property and one of the keys may be made public without loss of confidentiality. == Etymology == Originating from the Sanskrit word for zero शून्य (śuṇya), via the Arabic word صفر (ṣifr), the word "cipher" spread to Europe as part of the Arabic numeral system during the Middle Ages. The Roman numeral system lacked the concept of zero, and this limited advances in mathematics. In this transition, the word was adopted into Medieval Latin as cifra, and then into Middle French as cifre. This eventually led to the English word cipher (also spelt cypher). One theory for how the term came to refer to encoding is that the concept of zero was confusing to Europeans, and so the term came to refer to a message or communication that was not easily understood. The term cipher was later also used to refer to any Arabic digit, or to calculation using them, so encoding text in the form of Arabic numerals is literally converting the text to "ciphers". == Versus codes == In casual contexts, "code" and "cipher" can typically be used interchangeably; however, the technical usages of the words refer to different concepts. Codes contain meaning; words and phrases are assigned to numbers or symbols, creating a shorter message. 
An example of this is the commercial telegraph code which was used to shorten long telegraph messages which resulted from entering into commercial contracts using exchanges of telegrams. Another example is given by whole word ciphers, which allow the user to replace an entire word with a symbol or character, much like the way written Japanese utilizes Kanji (meaning Chinese characters in Japanese) characters to supplement the native Japanese characters representing syllables. An example using English language with Kanji could be to replace "The quick brown fox jumps over the lazy dog" by "The quick brown 狐 jumps 上 the lazy 犬". Stenographers sometimes use specific symbols to abbreviate whole words. Ciphers, on the other hand, work at a lower level: the level of individual letters, small groups of letters, or, in modern schemes, individual bits and blocks of bits. Some systems used both codes and ciphers in one system, using superencipherment to increase the security. In some cases the terms codes and ciphers are used synonymously with substitution and transposition, respectively. Historically, cryptography was split into a dichotomy of codes and ciphers, while coding had its own terminology analogous to that of ciphers: "encoding, codetext, decoding" and so on. However, codes have a variety of drawbacks, including susceptibility to cryptanalysis and the difficulty of managing a cumbersome codebook. Because of this, codes have fallen into disuse in modern cryptography, and ciphers are the dominant technique. == Types == There are a variety of different types of encryption. Algorithms used earlier in the history of cryptography are substantially different from modern methods, and modern ciphers can be classified according to how they operate and whether they use one or two keys. === Historical === The Caesar Cipher is one of the earliest known cryptographic systems. 
Julius Caesar used a cipher that shifts the letters in the alphabet in place by three and wrapping the remaining letters to the front to write to Marcus Tullius Cicero in approximately 50 BC. Historical pen and paper ciphers used in the past are sometimes known as classical ciphers. They include simple substitution ciphers (such as ROT13) and transposition ciphers (such as a Rail Fence Cipher). For example, "GOOD DOG" can be encrypted as "PLLX XLP" where "L" substitutes for "O", "P" for "G", and "X" for "D" in the message. Transposition of the letters "GOOD DOG" can result in "DGOGDOO". These simple ciphers and examples are easy to crack, even without plaintext-ciphertext pairs. In the 1640s, the Parliamentarian commander, Edward Montagu, 2nd Earl of Manchester, developed ciphers to send coded messages to his allies during the English Civil War. The English theologian John Wilkins published a book in 1641 titled "Mercury, or The Secret and Swift Messenger" and described a musical cipher wherein letters of the alphabet were substituted for music notes. This species of melodic cipher was depicted in greater detail by author Abraham Rees in his book Cyclopædia (1778). Simple ciphers were replaced by polyalphabetic substitution ciphers (such as the Vigenère) which changed the substitution alphabet for every letter. For example, "GOOD DOG" can be encrypted as "PLSX TWF" where "L", "S", and "W" substitute for "O". With even a small amount of known or estimated plaintext, simple polyalphabetic substitution ciphers and letter transposition ciphers designed for pen and paper encryption are easy to crack. It is possible to create a secure pen and paper cipher based on a one-time pad, but these have other disadvantages. During the early twentieth century, electro-mechanical machines were invented to do encryption and decryption using transposition, polyalphabetic substitution, and a kind of "additive" substitution. 
In rotor machines, several rotor disks provided polyalphabetic substitution, while plugboards provided another substitution. Keys were easily changed by changing the rotor disks and the plugboard wires. Although these encryption methods were more complex than previous schemes and required machines to encrypt and decrypt, other machines such as the British Bombe were invented to crack these encryption methods. === Modern === Modern encryption methods can be divided by two criteria: by type of key used, and by type of input data. By type of key used, ciphers are divided into: symmetric key algorithms (private-key cryptography), where the same key is used for encryption and decryption, and asymmetric key algorithms (public-key cryptography), where two different keys are used for encryption and decryption. In a symmetric key algorithm (e.g., DES and AES), the sender and receiver must have a shared key set up in advance and kept secret from all other parties; the sender uses this key for encryption, and the receiver uses the same key for decryption. The design of AES (Advanced Encryption Standard) was beneficial because it aimed to overcome the flaws in the design of DES (Data Encryption Standard). AES's designers claim that the common means of modern cipher cryptanalytic attacks are ineffective against AES due to its design structure. Ciphers can be distinguished into two types by the type o

  • OpenIO

    OpenIO

    OpenIO offered object storage for a wide range of high-performance applications. OpenIO was founded in 2015 by Laurent Denel (CEO), Jean-François Smigielski (CTO) and five other co-founders; it leveraged open source software, developed since 2006, based on a grid technology that enabled dynamic behaviour and supported heterogeneous hardware. In October 2017 OpenIO completed a $5 million funding round. In July 2020 OpenIO was acquired by OVH and withdrawn from the market to become the core technology of OVHcloud's object storage offering. == Software == OpenIO is a software-defined object store that supports S3 and can be deployed on-premises, cloud-hosted or at the edge, on any hardware mix. It has been designed from the beginning for performance and cost-efficiency at any scale, and it has been optimized for Big Data, HPC and AI. OpenIO stores objects in a flat structure within a massively distributed directory with indirections, which allows the data query path to be independent of the number of nodes and keeps performance unaffected by growth in capacity. Servers are organized as a massively distributed grid of nodes, where each node takes part in directory and storage services, which ensures that there is no single point of failure and that new nodes are automatically discovered and immediately available without the need to rebalance data. The software is built on top of a technology that ensures optimal data placement based on real-time metrics and allows the addition or removal of storage devices with automatic performance and load impact optimization. For data protection, OpenIO has synchronous and asynchronous replication with multiple copies, and an erasure coding implementation based on Reed-Solomon that can be deployed in one data center or in geo-distributed or stretched clusters. The software has a feature that catches all events that occur in the cluster and can pass them up in the stack or to applications running on OpenIO nodes. 
This enables event-driven computing directly within the storage infrastructure. The open source code is available on GitHub and is licensed under AGPL3 for server code and LGPL3 for client code. == Performance == OpenIO claimed in 2019 to have reached 1.372 Tbit/s write speed (171 GB/s) on a cluster of 350 physical machines. The benchmark scenario, conducted under production conditions with standard hardware (commodity servers with 7200 rpm HDDs), consisted of backing up a 38 PB Hadoop datalake via the DistCp command. This level of performance marked, according to analysts, the arrival of a new generation of object storage technologies oriented toward high performance and hyper-scalability.

  • Brain Imaging Data Structure

    Brain Imaging Data Structure

    The Brain Imaging Data Structure (BIDS) is a standard for organizing, annotating, and describing data collected during neuroimaging experiments. It is based on a formalized file and directory structure and metadata files (based on JSON and TSV) with controlled vocabulary. This standard has been adopted by a multitude of labs around the world as well as databases such as OpenNeuro, SchizConnect, Developing Human Connectome Project, and FCP-INDI, and is seeing uptake in an increasing number of studies. While originally specified for MRI data, BIDS has been extended to several other imaging modalities such as MEG, EEG, and intracranial EEG (see also BIDS Extension Proposals). == History == The project is a community-driven effort. BIDS, originally OBIDS (Open Brain Imaging Data Structure), was initiated during an INCF-sponsored data sharing working group meeting (January 2015) at Stanford University. It was subsequently spearheaded and maintained by Chris Gorgolewski. Since October 2019, the project has been headed by a Steering Group and maintained by a separate team of maintainers, the Maintainers Group, according to a governance document that was approved by the BIDS community in a vote. BIDS has advanced under the direction and effort of contributors, the community of researchers that appreciate the value of standardizing neuroimaging data to facilitate sharing and analysis. == BIDS Extension Proposals == BIDS can be extended in a backwards-compatible way and is evolving over time. This is accomplished through BIDS Extension Proposals (BEPs), which are community-driven processes following agreed-upon guidelines. A full list of finalized BEPs and BEPs in progress can be found on the BIDS website.
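    To give a feel for what a formalized file and directory structure means in practice, the sketch below assembles a BIDS-style relative path from a few key-value "entities". It is a simplified illustration, not a validator: the helper name `bids_path` is hypothetical, only four entities are handled, and the BIDS specification remains the authority on which entities, suffixes and orderings are allowed.

```python
from pathlib import PurePosixPath


def bids_path(subject, datatype, suffix, extension, session=None, task=None, run=None):
    """Build a relative path such as sub-01/func/sub-01_task-rest_bold.nii.gz.

    Entities are joined with underscores in a fixed order (sub, ses, task, run),
    followed by the suffix and the file extension.
    """
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    if task is not None:
        entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    filename = "_".join(entities + [suffix]) + extension

    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        folder = folder / f"ses-{session}"
    return folder / datatype / filename


if __name__ == "__main__":
    print(bids_path("01", "func", "bold", ".nii.gz", task="rest"))
    # A JSON sidecar holding metadata shares the same name stem:
    print(bids_path("01", "func", "bold", ".json", task="rest"))
```

    Because the file name itself carries the subject, session and task, tools can locate data and its metadata without a separate index.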

  • Payment tokenization

    Payment tokenization

    Payment tokenization is a data security process that replaces sensitive payment information, such as credit card numbers, with a unique identifier or "token." This token can be used in place of actual data during transactions but has no exploitable value if breached, thereby reducing the risk of data theft and fraud. == Overview == Payment tokenization is generally categorized into two types: security tokens and payment tokens. Security tokens, also known as post-authorization tokens, are used to replace sensitive information like Primary Account Numbers (PANs), such as credit card numbers, either after a payment is authorized or for storing data securely (data-at-rest), such as in merchant databases. These models have been in use since the mid-2000s, following the introduction of the Payment Card Industry Data Security Standard in 2004, which established standards for safeguarding cardholder data. The Payment Card Industry Security Standards Council's 2011 Tokenization Guidelines and the proposed American National Standards Institute X9 standards emphasize using tokens primarily to secure sensitive information, not as replacements for payment credentials processed over financial networks. Traditionally, merchants stored PANs to support backend operations such as settlements, reconciliations, chargebacks, loyalty programs, and customer service. However, with the adoption of security tokenization, merchants can substitute PANs with tokens in their systems. This not only reduces their exposure to fraud but also helps minimize the scope and cost of PCI-DSS compliance, offering a more secure and efficient way to manage cardholder data. == Applications == Payment tokenization is widely used. Mobile wallets such as Apple Pay, Google Pay, and Samsung Pay use tokenization to safely store card data on devices. E-commerce platforms rely on it to securely retain customer payment details for recurring purchases. 
At the physical point of sale, EMV-enabled systems use tokenization to protect card information during in-store transactions. Also, subscription billing services implement tokenization to manage and safeguard payment credentials for ongoing charges.
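    The substitution of a PAN with a token can be sketched as a small in-memory "vault". This is an illustration under stated assumptions only: the class name `TokenVault` and the `tok_` prefix are hypothetical, and a real deployment would keep the mapping in a hardened, access-controlled service rather than a Python dictionary.

```python
import secrets


class TokenVault:
    """Toy token vault: maps random tokens to PANs and back."""

    def __init__(self):
        self._pan_by_token = {}
        self._token_by_pan = {}

    def tokenize(self, pan):
        """Return the token for a PAN, creating a new random one on first use."""
        if pan in self._token_by_pan:
            return self._token_by_pan[pan]
        token = "tok_" + secrets.token_hex(8)
        while token in self._pan_by_token:  # regenerate on the rare collision
            token = "tok_" + secrets.token_hex(8)
        self._pan_by_token[token] = pan
        self._token_by_pan[pan] = token
        return token

    def detokenize(self, token):
        """Look up the original PAN; only the vault can perform this step."""
        return self._pan_by_token[token]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("4111111111111111")  # well-known test card number
    print(token)                    # merchant systems store only this value
    print(vault.detokenize(token))  # recoverable only through the vault
```

    The token is drawn at random rather than derived from the PAN, so a stolen token reveals nothing about the card number without access to the vault.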

  • Short Weather Cipher

    Short Weather Cipher

    The Short Weather Cipher (German: Wetterkurzschlüssel, abbreviated WKS), also known as the weather short signal book, was a cipher, presented as a codebook, that was used by the radio telegraphists aboard U-boats of the German Navy (Kriegsmarine) during World War II. It was used to condense weather reports into a short 7-letter message, which was enciphered using the naval Enigma and transmitted by radiomen to intercept stations on shore, where it was deciphered by Enigma and the 7-letter weather report was reconstructed. == History == Different versions of the cipher were in operation at various times during World War II. The first issue carried the codename Weimar. It was replaced by the edition Eisenach on 20 January 1942. On 10 March 1943, the third edition of the weather key, bearing the codename Naumburg, entered into force. On 9 May 1941, during Operation Primrose, the operation to occupy Åndalsnes and create a diversion south of Trondheim in Norway as part of the Norwegian Campaign, an intact Naval Enigma (M3) cipher machine, a copy of the "Weimar" version of the short weather cipher and a copy of the short signal book (German: Kurzsignalbuch or Kurzsignale for short) were recovered from the submarine U-110, which was captured in the North Atlantic east of Cape Farewell, Greenland. This enabled the cryptanalysts in Bletchley Park to break the encryption of the M3 and to decipher the German submarine radio messages. The Short Weather Cipher was critical in the cryptanalysis of the Naval Enigma M4 and yielded excellent cribs. On 30 October 1942, a copy of the Wetterkurzschlüssel, the short weather cipher, and of the short signal book, the Kurzsignale, were recovered in a daring raid on the U-boat U-559, when three Royal Navy sailors, Lieutenant Anthony Fasson, Able Seaman Colin Grazier and NAAFI canteen assistant Tommy Brown, boarded the abandoned submarine and recovered the documents after a 90-minute search. 
They reached the Government Code and Cypher School at Bletchley Park after a three-week delay, on 24 November 1942. The documents, which cost the lives of Fasson and Grazier, proved to be particularly important in breaking the Naval Enigma M4. The version of the short weather cipher recovered was the Eisenach version. Unlike the first version, Weimar, the Eisenach did not list the 26 rotor positions that were indicated by a letter, to be used in enciphering weather reports. Thus, Hut 8 cryptanalysts thought that all four rotors were used to encipher weather reports. Testing on the Bombes began to surface weather kisses (identical messages in two cryptosystems). On 13 December 1942, a crib obtained using the Short Weather Cipher gave a key with the Naval Enigma M4 rotatable Umkehrwalze (reversing roller or reflector) in the neutral position, making it equivalent to a standard Enigma and thus making B-Dienst messages potentially breakable on existing bombes. Hut 8 learned that the 4-letter indicators for regular U-boat messages were the same as the 3-letter indicators for weather messages on the same day, except for one extra letter. This meant that once the key was found for a weather message on any day, the fourth rotor had to be tested in only 26 positions to find the full 4-letter key. By the end of the day on Sunday 13 December, Rodger Winn of the Submarine Tracking Room at Bletchley Park knew that the Shark Enigma cipher was broken. When the third edition of the short signal book was introduced on 10 March 1943, Hut 8 was immediately deprived of cribs. However, by 19 March, cribs were again being used by Hut 8 personnel, using the method of employing short signal sighting reports. These were reports made by U-boats upon making contact, encoded using the Kurzsignalheft code book. Hut 8 managed to solve Shark for 90 out of 112 days before the end of June. Kurzsignalheft short sighting reports also used M4 in M3 mode. 
By the end of June, four-rotor bombes had entered service at Bletchley Park, and by August they had been introduced by the US Navy. From September onwards, Shark was generally solved within 24 hours. == Operation == The U-boat encoded weather reports using the Short Weather Cipher before enciphering them on the Naval Enigma. The shore patrol of the Kriegsmarine deciphered the message and decoded it, then forwarded it to a central meteorological station, which rebroadcast the data as ship synoptics after enciphering it with additive tables using a cipher that Hut 8 personnel called Germet 3. The short weather cipher coded weather reports using a polyphonic single-letter code with X missing. For temperature, the code ran:

A = +28°, B = +27°, C = +26°, D = +25°, ..., W = +6°, Y = +5°, Z = +4°
A = +3°, B = +2°, C = +1°, D = 0°, E = −1°, F = −2°, ..., Z = −21°

In a similar way, water temperature, atmospheric pressure, humidity, wind direction, wind velocity, visibility, degree of cloudiness, geographic latitude, and geographic longitude had to be coded in a prescribed order, so that the weather report consisted of a single short word. Based on approximate knowledge of the position of the submarine, the Kriegsmarine telegraphist who received the message could translate the letter "S", which according to the above table could mean 10 °C or −15 °C, back to the correct temperature.
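The two-cycle temperature code described above can be sketched in a few lines of Python. This is only an illustration, not the procedure printed in the codebook: it assumes the elided table entries follow the same one-degree-per-letter pattern as the listed ones, and the function names and the `expected` argument (standing in for the receiver's rough idea of the local temperature) are hypothetical.

```python
# Code alphabet as described in the text: A-Z with X omitted (25 letters).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWYZ"


def encode_temperature(temp_c):
    """Return the single code letter for a whole-degree temperature.

    Assumption: the table runs in one-degree steps from +28 down to -21.
    """
    if not -21 <= temp_c <= 28:
        raise ValueError("temperature outside the range of the table")
    return ALPHABET[(28 - temp_c) % len(ALPHABET)]


def candidate_temperatures(letter):
    """Return both temperatures a letter can stand for, as (warm, cold)."""
    i = ALPHABET.index(letter.upper())
    return (28 - i, 3 - i)


def decode_temperature(letter, expected):
    """Pick the candidate closest to the temperature the receiver expects."""
    return min(candidate_temperatures(letter), key=lambda t: abs(t - expected))


print(candidate_temperatures("S"))           # (10, -15), the two readings named in the text
print(decode_temperature("S", expected=-8))  # -15
print(encode_temperature(10))                # S
```

Because every letter maps to two temperatures 25 degrees apart, even a rough expectation is enough to resolve the ambiguity, which is what lets a single letter cover a 50-degree range.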
Similarly, the direction and the type of swell were also coded with only a single letter:

-----------------------------------------------------
Direction from which | Type of swell
the swell comes      | low | middle high | high |
-----------------------------------------------------
N                    |  a  |      i      |  q   |
NE                   |  b  |      j      |  r   |
E                    |  c  |      k      |  s   |
SE                   |  d  |      l      |  t   |
S                    |  e  |      m      |  u   |
SW                   |  f  |      n      |  v   |
W                    |  g  |      o      |  w   |
NW                   |  h  |      p      |  x   |
No swelling          |     |             |      | y
Intermittent         |     |             |      | z

As an example of the cipher, a weather report for 68° North latitude, 20° West longitude (north of Iceland) with atmospheric pressure 972 millibars, temperature minus 5 °C, wind northwest Force 6 (on the Beaufort scale), 3/10 cirrus cloud cover, visibility 5 nautical miles, would be coded as MZNFPED. == Publications ==
Bauer, Arthur O. (1997), Funkpeilung als alliierte Waffe gegen deutsche U-Boote 1939–1945 [Direction finding as Allied weapon against German submarines from 1939 to 1945] (in German), Diemen, NL: Selbstverlag, ISBN 978-3-00-002142-8
Bauer, Friedrich L. (2007), Decrypted Secrets. Methods and Maxims of Cryptology (4th, rev. and extended ed.), Berlin Heidelberg New York: Springer, ISBN 978-3-540-24502-5
Pfeiffer, Paul N. (October 1998), "Breaking the German Weather Ciphers in the Mediterranean Detachment, 849th Signal Intelligence Service", Cryptologia, 22 (4): 354–369, doi:10.1080/0161-119891886975, ISSN 0161-1194
Ulbricht, Heinz (2005), Die Chiffriermaschine Enigma – Trügerische Sicherheit. Ein Beitrag zur Geschichte der Nachrichtendienste [The Enigma cipher machine – Deceptive security. A contribution to the history of the intelligence services], Dissertation, Fachbereich Mathematik und Informatik, Technische Universität Braunschweig (in German)

    Read more →
  • GPT-4Chan

    GPT-4Chan

    Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. The model is a large language model, which means it can generate text based on some input, by fine-tuning GPT-J with a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for occasionally hosting hateful and extremist content. The model learned to mimic the style and tone of /pol/ users, producing text that is often intentionally offensive to groups (racist, sexist, homophobic, etc.) and nihilistic. Kilcher deployed the model on the /pol/ board itself, where it interacted with other users without revealing its identity. He also made the model publicly available on Hugging Face, a platform for sharing and using AI models, until it was removed from the platform. The project sparked criticism and debate in the AI community. Some people questioned the ethics, legality, and social impact of creating and distributing such a model. Some of the issues raised by the GPT-4chan controversy include the potential harm of spreading hate speech, the responsibility of AI developers and platforms, the need for regulation and oversight of AI models, and the role of open source and transparency in AI research. == Development == The development of GPT-4chan began in May 2022, when Kilcher announced his project on his YouTube channel. Notably, at the time before ChatGPT, he explained that he wanted to create a large language model that could generate realistic and coherent text in the style of /pol/, one of the most notorious online communities. He indicated that he was inspired by the success of GPT-3, a powerful AI model created by OpenAI, and GPT-J, an open-source model, with GPT-3 comparable performance, released by EleutherAI, a group of independent AI researchers. 
Kilcher decided to use GPT-J as the base model for his project, and fine-tune it with a large dataset of /pol/ posts. The Raiders of the Lost Kek dataset contained over 100 million posts from /pol/, spanning from June 2016 to November 2019. Kilcher then proceeded to fine-tune the GPT-J model on the 4chan data. He also showed some examples of the model’s outputs, which ranged from political opinions, conspiracy theories, jokes, insults, and threats, to more creative and bizarre texts, such as poems, stories, songs, and code. He said that he was impressed by the model’s ability to generate fluent and diverse text, and that he was curious to see how it would interact with real /pol/ users. == Release == In June 2022, Kilcher deployed his model on the /pol/ board itself, using a bot that he programmed to post and reply to threads. He did not reveal the model’s identity, and he let it run autonomously, without any human supervision or intervention. He wanted to conduct a natural experiment, and to observe the model’s behavior and impact in a real-world setting. He also wanted to test the model’s robustness, and to see how it would handle the challenges and dynamics of /pol/, such as trolling, flaming, baiting, and moderation. At the same time, Kilcher made his model publicly available on Hugging Face, a platform for sharing and using AI models. He said that he wanted to share his work with the AI community and the public, and that he hoped his model would inspire and enable others to create and explore new applications and possibilities with large language models. He also said that he wanted to spark a discussion and a debate about the ethical and social implications of his project, and that he welcomed feedback and criticism from anyone.
He provided a link to his model’s page on Hugging Face, where anyone could access and use the model through a web interface or an API, and also provided a link to his GitHub repository, where anyone could download and inspect the model’s code and data. == Controversy == The release of GPT-4chan to the public caused a lot of reactions and responses from various audiences. On the /pol/ board, the model’s posts and replies attracted a lot of attention and engagement from other users, who were mostly unaware of the model’s identity and nature. Some users praised the model for its intelligence, creativity, and humor, and agreed with its opinions and views. Some users challenged the model for its ignorance, inconsistency, and absurdity, and disagreed with its claims and arguments. Some users tried to troll, bait, or expose the model, and attempted to trick or test it with various questions and scenarios. The model’s posts and replies also generated a lot of controversy and conflict among the users, who often engaged in heated and violent debates and fights with each other. On Hugging Face, the model’s page received a lot of visits and requests from users who wanted to try out and experiment with the model. The model’s page also received a lot of feedback and reviews from users who rated and commented on the model. However, with the controversy of the model, access to it was gated and then disabled on Hugging Face for concerns about the potential harm the model could cause. The incident was notable for the direct intervention of CEO Clément Delangue in the talk pages, a very unusual occurrence compared to the normal practices of content moderation. The release of GPT-4chan also sparked a lot of media coverage and public attention, as various news outlets and social media platforms reported and commented on the model’s project. On YouTube, the model’s video received a lot of views and interactions from viewers who watched and followed the project. 
Furthermore, a petition condemning the deployment of GPT-4chan gained over 300 signatures from technology experts.

    Read more →
  • Data validation and reconciliation

    Data validation and reconciliation

    Industrial process data validation and reconciliation, or more briefly, process data reconciliation (PDR), is a technology that uses process information and mathematical methods in order to automatically ensure data validation and reconciliation by correcting measurements in industrial processes. The use of PDR allows for extracting accurate and reliable information about the state of industry processes from raw measurement data and produces a single consistent set of data representing the most likely process operation. == Models, data and measurement errors == Industrial processes, for example chemical or thermodynamic processes in chemical plants, refineries, oil or gas production sites, or power plants, are often represented by two fundamental means: models that express the general structure of the processes, and data that reflects the state of the processes at a given point in time. Models can have different levels of detail; for example, one can incorporate simple mass or compound conservation balances, or more advanced thermodynamic models including energy conservation laws. Mathematically the model can be expressed by a nonlinear system of equations $F(y)=0$ in the variables $y=(y_1,\ldots,y_n)$, which incorporates all the above-mentioned system constraints (for example the mass or heat balances around a unit). A variable could be the temperature or the pressure at a certain place in the plant. === Error types === Data originates typically from measurements taken at different places throughout the industrial site, for example temperature, pressure, volumetric flow rate measurements etc. To understand the basic principles of PDR, it is important to first recognize that plant measurements are never 100% correct, i.e. the raw measurement $y$ is not a solution of the nonlinear system $F(y)=0$.
When using measurements without correction to generate plant balances, it is common to have incoherencies. Measurement errors can be categorized into two basic types: random errors due to intrinsic sensor accuracy and systematic errors (or gross errors) due to sensor calibration or faulty data transmission. A random error means that the measurement $y$ is a random variable with mean $y^*$, where $y^*$ is the true value that is typically not known. A systematic error, on the other hand, is characterized by a measurement $y$ which is a random variable with mean $\bar{y}$, which is not equal to the true value $y^*$. For ease in deriving and implementing an optimal estimation solution, and based on arguments that errors are the sum of many factors (so that the central limit theorem has some effect), data reconciliation assumes these errors are normally distributed. Other sources of errors when calculating plant balances include process faults such as leaks, unmodeled heat losses, incorrect physical properties or other physical parameters used in equations, and incorrect structure such as unmodeled bypass lines. Other errors include unmodeled plant dynamics such as holdup changes, and other instabilities in plant operations that violate steady state (algebraic) models. Additional dynamic errors arise when measurements and samples are not taken at the same time, especially lab analyses. The normal practice of using time averages for the data input partly reduces the dynamic problems. However, that does not completely resolve timing inconsistencies for infrequently-sampled data like lab analyses. This use of average values, like a moving average, acts as a low-pass filter, so high frequency noise is mostly eliminated. The result is that, in practice, data reconciliation is mainly making adjustments to correct systematic errors like biases.
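The difference between the two error types under time averaging can be illustrated with a small simulation. This is a hedged sketch rather than anything taken from a PDR tool: the true value, bias, noise level and window length are all made-up numbers, and `moving_average` is a hypothetical helper.

```python
import random
import statistics


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


random.seed(0)       # fixed seed so the run is repeatable
true_value = 50.0    # hypothetical true flow rate
bias = 1.5           # hypothetical systematic error (e.g. calibration drift)
sigma = 2.0          # hypothetical standard deviation of the random error

# Each raw reading = true value + systematic bias + normally distributed noise.
raw = [true_value + bias + random.gauss(0.0, sigma) for _ in range(2000)]
smooth = moving_average(raw, window=50)

raw_spread = statistics.pstdev(raw)        # scatter of the raw readings
smooth_spread = statistics.pstdev(smooth)  # scatter after averaging
smooth_offset = statistics.mean(smooth) - true_value  # what averaging leaves behind

print(f"spread before averaging: {raw_spread:.2f}")
print(f"spread after averaging:  {smooth_spread:.2f}")
print(f"remaining offset:        {smooth_offset:.2f}")
```

Averaging shrinks the random scatter considerably, but the offset from the true value stays close to the injected bias, which is why the remaining corrections are mostly about systematic errors.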
=== Necessity of removing measurement errors === ISA-95 is the international standard for the integration of enterprise and control systems. It asserts that: Data reconciliation is a serious issue for enterprise-control integration. The data have to be valid to be useful for the enterprise system. The data must often be determined from physical measurements that have associated error factors. This must usually be converted into exact values for the enterprise system. This conversion may require manual, or intelligent reconciliation of the converted values [...]. Systems must be set up to ensure that accurate data are sent to production and from production. Inadvertent operator or clerical errors may result in too much production, too little production, the wrong production, incorrect inventory, or missing inventory. == History == PDR has become more and more important as industrial processes have become more and more complex. PDR started in the early 1960s with applications aiming at closing material balances in production processes where raw measurements were available for all variables. At the same time the problem of gross error identification and elimination was presented. In the late 1960s and 1970s unmeasured variables were taken into account in the data reconciliation process. PDR also became more mature by considering general nonlinear equation systems coming from thermodynamic models. Quasi steady state dynamics for filtering and simultaneous parameter estimation over time were introduced in 1977 by Stanley and Mah. Dynamic PDR was formulated as a nonlinear optimization problem by Liebman et al. in 1992. == Data reconciliation == Data reconciliation is a technique that aims at correcting measurement errors that are due to measurement noise, i.e. random errors.
From a statistical point of view the main assumption is that no systematic errors exist in the set of measurements, since they may bias the reconciliation results and reduce the robustness of the reconciliation. Given $n$ measurements $y_i$, data reconciliation can mathematically be expressed as an optimization problem of the following form:

$$\begin{aligned}\min_{x,y^*}\ &\sum_{i=1}^{n}\left(\frac{y_i^*-y_i}{\sigma_i}\right)^2\\ \text{subject to }&F(x,y^*)=0\\ &y_{\min}\leq y^*\leq y_{\max}\\ &x_{\min}\leq x\leq x_{\max},\end{aligned}$$

where $y_i^*$ is the reconciled value of the $i$-th measurement ($i=1,\ldots,n$), $y_i$ is the measured value of the $i$-th measurement ($i=1,\ldots,n$), $x_j$ is the $j$-th unmeasured variable ($j=1,\ldots,m$), and $\sigma_i$ is the standard deviation of the $i$-th measurement ($i=1,\ldots,n$), $F(x,y^*)=0$ are the $p$ process equality constraints and $x_{\min},x_{\max},y_{\min},y_{\max}$ are the bounds on the measured and unmeasured variables. The term $\left(\frac{y_i^*-y_i}{\sigma_i}\right)^2$ is called the penalty of measurement $i$. The objective function is the sum of the penalties, which will be denoted in the following by $f(y^*)=\sum_{i=1}^{n}\left(\frac{y_i^*-y_i}{\sigma_i}\right)^2$.
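For the simplest special case, the optimization problem above has a closed-form solution, which the following Python sketch works through. It is an illustration under stated assumptions, not a general PDR solver: there is a single linear constraint $a^{T}y^*=0$ (a mass balance around a splitter, inlet = outlet 1 + outlet 2), no unmeasured variables $x$, and the bounds are ignored. The flow values, standard deviations and function names are invented for the example. With $\Sigma=\operatorname{diag}(\sigma_i^2)$, the minimizer is $y^*=y-\Sigma a\,(a^{T}y)/(a^{T}\Sigma a)$.

```python
def reconcile_linear(y, sigma, a):
    """Weighted least-squares reconciliation for one linear constraint a . y* = 0.

    Closed form: y* = y - Sigma a (a^T y) / (a^T Sigma a), Sigma = diag(sigma^2).
    """
    residual = sum(ai * yi for ai, yi in zip(a, y))     # constraint violation a^T y
    sigma_a = [s * s * ai for s, ai in zip(sigma, a)]   # Sigma a
    denom = sum(ai * v for ai, v in zip(a, sigma_a))    # a^T Sigma a
    return [yi - v * residual / denom for yi, v in zip(y, sigma_a)]


def total_penalty(y_star, y, sigma):
    """Objective f(y*): sum of squared corrections scaled by standard deviation."""
    return sum(((ys - yi) / s) ** 2 for ys, yi, s in zip(y_star, y, sigma))


# Hypothetical splitter: inlet - outlet1 - outlet2 = 0
a = [1.0, -1.0, -1.0]
y = [100.0, 60.0, 45.0]    # raw measurements; the balance is off by -5
sigma = [2.0, 1.0, 1.0]    # the inlet meter is assumed to be the least accurate

y_star = reconcile_linear(y, sigma, a)
balance = sum(ai * ys for ai, ys in zip(a, y_star))
f_value = total_penalty(y_star, y, sigma)

print([round(v, 3) for v in y_star])  # [103.333, 59.167, 44.167]
print(round(f_value, 4))              # 4.1667
```

The reconciled flows close the balance, and the inlet measurement, which has the largest standard deviation, absorbs most of the correction. Nonlinear constraints, bounds or unmeasured variables would call for a numerical optimizer instead of this closed form.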
In other words, one wants to minimize the overall correction (measured in the least squares term) that is needed in order to satisfy the system constraints. Additionally, each least squares term is weighted by the standard deviation of the corresponding measurement. The standard deviation is related to the accuracy of the measurement. For example, at a 95% confidence level, the standard deviation is about half the accuracy. === Redundancy === Data reconciliation relies strongly on the concept of redundancy to correct the measurements as little as possible in order to satisfy the process constraints. Here, redundancy is defined differently from redundancy in information theory. Instead, redundancy arises from combining sensor data with the model (algebraic constraints), sometimes more specifically called "spatial redundancy", "analytical redundancy", or "topological redundancy". Redundancy can be due to sensor redundancy, where sensors are duplicated in order to have more than one measurement of the same quantity. Redundancy also arises when a single variable can be estimated in several independent ways from separate sets of measurements at a given time or time averaging period, using the algebraic constraints. Redundancy is linked to the concept

    Read more →
  • Social media marketing

    Social media marketing

    Social media marketing is the use of social media platforms and websites to promote a product or service. Although the terms e-marketing and digital marketing are still dominant in academia, social media marketing is becoming more popular for practitioners and researchers. Social media platforms such as Facebook, LinkedIn, Instagram, and Twitter, among others, have built-in data analytics tools that companies can use to track the progress, success, and engagement of social media marketing campaigns. Companies address a range of stakeholders through social media marketing, including current and potential customers, current and potential employees, journalists, bloggers, and the general public. On a strategic level, social media marketing includes the management of a marketing campaign, governance, setting the scope (e.g. more active or passive use) and the establishment of a firm's desired social media "culture" and "tone". Firms that use social media marketing can allow customers and Internet users to post user-generated content (e.g., online comments, product reviews, etc.), also known as "earned media", rather than use marketer-prepared advertising copy. == Purposes and tactics == Social media may be employed in marketing as a communications tool that makes companies accessible to those who are interested in their products and visible to those who are not familiar with them. It is used by companies to create buzz, learn from customers, and target them. Of the top 10 factors that correlate with a strong Google organic search, seven are social media-dependent. This means that brands with little to no social media presence tend to show up less in Google searches.
While platforms such as Twitter, Facebook and—in the past—Google+ have a larger number of monthly users, the visual media-sharing-based mobile platforms garner a higher interaction rate in comparison, and have registered the fastest growth, and have changed the ways in which consumers engage with brand content. Instagram has an interaction rate of 1.46% with an average of 130 million users monthly as opposed to Twitter, which has a .03% interaction rate with an average of 210 million monthly users. Unlike traditional media that are often cost-prohibitive to many companies, a social media strategy does not require significant financial investment. To this end, companies make use of platforms such as Facebook, Twitter, YouTube, TikTok and Instagram to reach audiences much wider than through traditional print, television, or radio advertisements alone at a fraction of the cost, as most social networking sites can be used at little or no cost (however, some websites charge companies for premium services). This has changed the ways that companies approach and interact with customers, as a substantial percentage of consumer interactions are now being carried out over online platforms with much higher visibility. Customers can post reviews of products and services, rate customer service, and ask questions or voice concerns directly to companies through social media platforms. According to Measuring Success, over 80% of consumers use the web to research products and services. Thus social media marketing is also used by businesses in order to build relationships of trust with consumers. To this aim, companies may hire personnel to specifically handle these social media interactions, who usually report under the title of online community managers. Handling these interactions in a satisfactory manner can result in an increase of consumer trust. 
Both to this end and to repair the public's perception of a company, three steps are taken in order to address consumer concerns: identifying the extent of the social chatter, engaging the influencers to help, and developing a proportional response. == Strategies == === Passive approach === Social media can be a useful source of market information and a way to hear customers' perspectives. Blogs, content communities, and forums are platforms where individuals share their reviews and recommendations of brands, products, and services. Businesses are able to tap into and analyze customer voices and feedback generated in social media for marketing purposes. In this sense, social media is a relatively inexpensive source of market intelligence which can be used by marketers and managers to track and respond to consumer-identified problems and detect market opportunities. === Active approach === Social media can be used as a public relations tool, a direct marketing tool, and a communication channel to target very specific audiences, with social media influencers and social media personalities as effective customer engagement tools. This tactic is widely known as influencer marketing, which gives brands the opportunity to reach their target audience via a group of selected influencers advertising their product or service. Brands were projected to spend up to $15 billion on influencer marketing by 2022, per Business Insider Intelligence estimates, based on Mediakix data. The use of customer influencers, such as popular bloggers, can be an efficient and cost-effective method to launch new products or services. == Engagement == Engagement with the social web means that customers and stakeholders are active participants rather than passive spectators. Examples include consumer advocacy groups and groups that criticize companies (e.g., lobby groups or advocacy organizations).
The use of Social media in a business or political context allows people to express and share opinions about a company's products, services or business practices, or a government's actions. On social media, each participant becomes part of the marketing department (or a challenge to the marketing effort) as other customers read their comments or reviews. The effectiveness of social media marketing campaigns is dependent on the promotion of online engagement. With the advent of social media marketing, it has become increasingly important to gain customer interest in products and services, which can eventually be translated into buying behavior, or voting and donating behavior in a political context. New online marketing concepts of engagement and loyalty have emerged which aim to build customer participation and brand reputation. Engagement in social media for the purpose of a social media strategy is divided into two parts. The first is proactive, regular posting of new online content, which can be seen through digital photos, digital videos, text, and conversations. It is also represented through sharing of content and information from others via weblinks. The second part is reactive conversations, with social media users responding to those who reach out to others' social media profiles through comments or messages. == Campaigns == === Local businesses === Small businesses use social networking sites as a promotional technique. Businesses can follow individuals' social media usage in their local area and advertise specials and deals, which can be exclusive and in the form of "get a free drink with a copy of this tweet". This type of message encourages other locals to follow the business on their official websites in order to obtain the promotional deal. The business's brand visibility is enhanced in the process. Social networking sites are also used by small businesses to develop their own market research on new products and services. 
By encouraging their customers to give feedback on new product ideas, businesses can gain insights on whether or not a product may be accepted by their target market enough to merit full production. In addition, customers will feel the company has engaged them in the process of co-creation—the process in which the business uses customer feedback to create or modify a product or service to fill a need of the target market. Such feedback can be presented in various forms, such as surveys, contests, and polls. Social networking sites such as LinkedIn, also provide opportunities for small businesses to find candidates to fill staff positions. Review sites such as Yelp help small businesses build their reputation beyond brand visibility. Positive customer peer reviews help influence new prospects to purchase goods and services more than company advertising. == Benefits == Social Media Marketing allows companies to promote themselves to large, diverse audiences that could not be reached through traditional marketing such as phone and email-based advertising. Marketing on most social media platforms also comes at little to no cost, making it accessible to virtually any size business. Social Media Marketing accommodates personalized and direct marketing that targets specific demographics and markets. Companies can engage with customers directly, allowing them to obtain feedback and resolve issues almost immediately. Another advantage of social media marketing is that it's an ideal environment for a company to conduct market research. It can be used

    Read more →
  • Cambridge Analytica

    Cambridge Analytica

    Cambridge Analytica Ltd. (CA), previously known as SCL USA, was a British political consulting firm that came to prominence through the Facebook–Cambridge Analytica data scandal. It was founded in 2013, as a subsidiary of the private intelligence company and self-described "global election management agency" SCL Group by long-time SCL executives Nigel Oakes, Alexander Nix and Alexander Oakes, with Nix as CEO. Cambridge Analytica was hired by a variety of political actors, including the Trinidadian government in 2010 and the 2016 presidential campaigns of Ted Cruz and Donald Trump. The firm maintained offices in London, New York City, and Washington, D.C. The company closed operations in 2018 due to backlash from the scandal, although firms related to both Cambridge Analytica and its parent firm SCL still exist. == History == Cambridge Analytica was founded in 2013 as a subsidiary of the private intelligence company SCL Group, which describes itself as providing "data, analytics and strategy to governments and military organisations worldwide". The company was part of "an international web of companies" headed by the London-based SCL Group. Cambridge Analytica (SCL USA) was incorporated in January 2013 with its registered office being in Westferry Circus, London and consisting of just one staff member, director and CEO Alexander Nix (also appointed in January 2015). Nix was also the director of nine similar companies sharing the same registered offices in London, including Firecrest technologies, Emerdata and six SCL Group companies including "SCL elections limited". Nigel Oakes, known as the former boyfriend of Lady Helen Windsor, had founded the predecessor SCL Group in the 1990s, and in 2005 Oakes established SCL Group together with his brother Alexander Oakes and Alexander Nix; SCL Group was the parent company of Cambridge Analytica. 
Former Conservative minister and MP Sir Geoffrey Pattie was the founding chairman of SCL; Lord Ivar Mountbatten also joined Oakes as a director of the company. As a result of the Facebook–Cambridge Analytica data scandal, Nix was removed as CEO and replaced by Julian Wheatland before the company closed. Several of the company's executives were Old Etonians. The company's owners included several of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former British Conservative minister Jonathan Marland, Baron Marland and the family of American hedge fund manager Robert Mercer. The company combined misappropriation of digital assets, data mining, data brokerage, and data analysis with strategic communication during electoral processes. While its parent SCL had focused on influencing elections in developing countries since the 1990s, Cambridge Analytica focused more on the western world, including the United Kingdom and the United States; CEO Alexander Nix has said CA was involved in 44 U.S. political races in 2014. In 2015, CA performed data analysis services for Ted Cruz's presidential campaign. In 2016, CA worked for Donald Trump's presidential campaign as well as for Leave.EU (one of the organisations campaigning in the United Kingdom's referendum on European Union membership). CA's role in those campaigns has been controversial and is the subject of ongoing inquiries in both countries. Political scientists question CA's claims about the effectiveness of its methods of targeting voters. == Data scandal == In March 2018, media outlets broke news of Cambridge Analytica's business practices. The New York Times and The Observer reported that the company had acquired and used personal data about Facebook users from an external researcher who had told Facebook he was collecting it for academic purposes. 
Shortly afterwards, Channel 4 News aired undercover investigative videos showing Nix boasting about using prostitutes, bribery sting operations, and honey traps to discredit politicians on whom it had conducted opposition research, and saying that the company "ran all of (Donald Trump's) digital campaign". In response to the media reports, the Information Commissioner's Office (ICO) of the UK pursued a warrant to search the company's servers. Facebook banned Cambridge Analytica from advertising on its platform, saying that it had been deceived. On 23 March 2018, the British High Court granted the ICO a warrant to search Cambridge Analytica's London offices. As a result, Nix was suspended as CEO, and replaced by Julian Wheatland. The personal data of up to 87 million Facebook users were acquired via the 270,000 Facebook users who used a Facebook app created by Aleksandr Kogan called "This Is Your Digital Life". This was a personality profiling app and asked simple personality questions similar to other Facebook quizzes. Kogan was a scientist and psychologist, also being an employed lecturer for the University of Cambridge from 2012 to 2018. Alexander Nix claimed they had close to five thousand data points on each person who participated. They also gathered information through other data brokers ending with them acquiring millions of data points from American citizens. Kogan's app exploited a feature of Facebook's Graph API (version 1.0), which permitted any third-party app to access not only the app user's data, but also the full profile data of all of that user's Facebook friends, without those friends' knowledge or consent. This platform-wide design was available to all developers and was used by tens of thousands of apps; Facebook CEO Mark Zuckerberg later told the House Energy and Commerce Committee that the company was auditing "tens of thousands" of apps that had had access to large amounts of user data. 
Because the average Facebook user at the time had approximately 300 friends, the 270,000 users who installed Kogan's app yielded data on up to 87 million people. Facebook deprecated the friends-data API in April 2014 and shut it down entirely in April 2015, but data already collected by apps remained in developers' possession. Kogan passed this data to Cambridge Analytica, breaching Facebook's terms of service. On 1 May 2018, Cambridge Analytica and its parent company SCL filed for insolvency proceedings and closed operations. Alexander Tayler, a former director of Cambridge Analytica, was appointed director of Emerdata on 28 March 2018. Rebekah Mercer, Jennifer Mercer, Alexander Nix and Johnson Chun Shun Ko, who has links to American businessman Erik Prince, are in leadership positions at Emerdata. The Russo brothers are producing an upcoming film on Cambridge Analytica. In 2019, the Federal Trade Commission filed an administrative complaint against Cambridge Analytica for misuse of data. In 2020, the British Information Commissioner's Office closed a three-year inquiry into the company, concluding that Cambridge Analytica was "not involved" in the 2016 Brexit referendum and finding no additional evidence for Russia's alleged interference during the campaign. Sensitive US polling and election data, however, were passed to Russian intelligence via Cambridge Analytica contractor Sam Patten, Trump campaign manager Paul Manafort, and Russian agent Konstantin Kilimnik, who was indicted during the affair. Publicly, parent company SCL Group called itself a "global election management agency"; Politico reported that it was known for involvement "in military disinformation campaigns to social media branding and voter targeting". During the 2000s, SCL gained work on a large number of campaigns for the US and UK governments' war on terror, advancing its model of behavioral conflict. 
SCL's involvement in the political world has been primarily in the developing world, where it has been used by the military and politicians to study and manipulate public opinion and political will. Slate writer Sharon Weinberger compared one of SCL's hypothetical test scenarios to fomenting a coup. Among the investors in Cambridge Analytica were some of the Conservative Party's largest donors such as billionaire Vincent Tchenguiz, former Conservative minister Jonathan Marland, Baron Marland, Roger Gabb, the family of American hedge fund manager Robert Mercer, and Steve Bannon. Mercer has invested at least 15 million dollars in the company, according to The New York Times. Bannon's stake in the company was estimated at 1 to 5 million dollars, but he divested his holdings in April 2017 as required by his role as White House Chief Strategist. In March 2018, Jennifer Mercer and Rebekah Mercer became directors of Emerdata Limited. In March 2018, Christopher Wylie made public that Cambridge Analytica's first activities were founded on a data set that its parent company SCL bought in 2014 from a company named Global Science Research, founded by Aleksandr Kogan, who worked as a psychologist at Cambridge, and his team, which was spread across the world. During Boris Johnson's tenure as foreign secretary, the Foreign Office sought advice from Cambridge Analytica and Boris Johnson had a meeting with Alexander N

  • Digital image processing

    Digital image processing

    Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and distortion during processing. Since images are defined over two dimensions (perhaps more), digital image processing may be modeled in the form of multidimensional systems. The generation and development of digital image processing are mainly affected by three factors: first, the development of computers; second, the development of mathematics (especially the creation and improvement of discrete mathematics theory); and third, the increased demand for a wide range of applications in environment, agriculture, military, industry and medical science. == History == Many of the techniques of digital image processing, or digital picture processing as it was often called, were developed in the 1960s at Bell Laboratories, the Jet Propulsion Laboratory, Massachusetts Institute of Technology, University of Maryland, and a few other research facilities, with application to satellite imagery, wire-photo standards conversion, medical imaging, videophone, character recognition, and photograph enhancement. The purpose of early image processing was to improve the quality of the image. In image processing, the input is a low-quality image, and the output is an image with improved quality. Common image processing includes image enhancement, restoration, encoding, and compression. The first successful application was at the American Jet Propulsion Laboratory (JPL). They used image processing techniques such as geometric correction, gradation transformation, noise removal, etc. 
on the thousands of lunar photos sent back by the space probe Ranger 7 in 1964, taking into account the position of the Sun and the environment of the Moon. The computer-based mapping of the Moon's surface was a success. Later, more complex image processing was performed on the nearly 100,000 photos sent back by the spacecraft, so that the topographic map, color map and panoramic mosaic of the Moon were obtained, which achieved extraordinary results and laid a solid foundation for the human landing on the Moon. The cost of processing was fairly high, however, with the computing equipment of that era. That changed in the 1970s, when digital image processing proliferated as cheaper computers and dedicated hardware became available. This led to images being processed in real-time, for some dedicated problems such as television standards conversion. As general-purpose computers became faster, they started to take over the role of dedicated hardware for all but the most specialized and computer-intensive operations. With the fast computers and signal processors available in the 2000s, digital image processing has become the most common form of image processing, and is generally used because it is not only the most versatile method, but also the cheapest. === Image sensors === The basis for modern image sensors is metal–oxide–semiconductor (MOS) technology, invented at Bell Labs between 1955 and 1960. This led to the development of digital semiconductor image sensors, including the charge-coupled device (CCD) and later the CMOS sensor. The charge-coupled device was invented by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogue of the magnetic bubble and that it could be stored on a tiny MOS capacitor. 
As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. MOS image sensors are widely used in optical mouse technology. The first optical mouse, invented by Richard F. Lyon at Xerox in 1980, used a 5 μm NMOS integrated circuit sensor chip. Since the first commercial optical mouse, the IntelliMouse introduced in 1999, most optical mouse devices use CMOS sensors. === Image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression became the basis for JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. Its highly efficient DCT compression algorithm was largely responsible for the wide proliferation of digital images and digital photos, with several billion JPEG images produced every day as of 2015. Medical imaging techniques produce very large amounts of data, especially from CT, MRI and PET modalities. As a result, storage and communications of electronic image data are prohibitive without the use of compression. 
JPEG 2000 image compression is used by the DICOM standard for storage and transmission of medical images. The cost and feasibility of accessing large image data sets over low or various bandwidths are further addressed by use of another DICOM standard, called JPIP, to enable efficient streaming of the JPEG 2000 compressed image data. === Digital signal processor (DSP) === Electronic signal processing was revolutionized by the wide adoption of MOS technology in the 1970s. MOS integrated circuit technology was the basis for the first single-chip microprocessors and microcontrollers in the early 1970s, and then the first single-chip digital signal processor (DSP) chips in the late 1970s. DSP chips have since been widely used in digital image processing. The discrete cosine transform (DCT) image compression algorithm has been widely implemented in DSP chips, with many companies developing DSP chips based on DCT technology. DCTs are widely used for encoding, decoding, video coding, audio coding, multiplexing, control signals, signaling, analog-to-digital conversion, formatting luminance and color differences, and color formats such as YUV444 and YUV411. DCTs are also used for encoding operations such as motion estimation, motion compensation, inter-frame prediction, quantization, perceptual weighting, entropy encoding, variable encoding, and motion vectors, and decoding operations such as the inverse operation between different color formats (YIQ, YUV and RGB) for display purposes. DCTs are also commonly used for high-definition television (HDTV) encoder/decoder chips. == Tasks == Digital image processing allows the use of much more complex algorithms, and hence, can offer both more sophisticated performance at simple tasks, and the implementation of methods which would be impossible by analogue means. 
In particular, digital image processing is a concrete application of, and a practical technology based on: classification, feature extraction, multi-scale signal analysis, pattern recognition, and projection. Some techniques that are used in digital image processing include: anisotropic diffusion, hidden Markov models, image editing, image restoration, independent component analysis, linear filtering, neural networks, partial differential equations, pixelation, point feature matching, principal components analysis, self-organizing maps, and wavelets. == Digital image transformations == === Filtering === Digital filters are used to blur and sharpen digital images. Filtering can be performed by convolution with specifically designed kernels (filter arrays) in the spatial domain, or by masking specific frequency regions in the frequency (Fourier) domain. The following examples show both methods: ==== Image padding in Fourier domain filtering ==== Images are typically padded before being transformed to the Fourier space. The highpass filtered images below illustrate the consequences of different padding techniques: Notice that the highpass filter shows extra edges when zero padded compared to the repeated edge padding. ==== Filtering code examples ==== MATLAB example for spatial domain highpass filtering. === Affine transformations === Affine transformations enable basic image transformations including scale, rotate, translate, mirror and shear, as is shown in the following examples: To apply the affine

  • G.hn

    G.hn

    Gigabit Home Networking (G.hn) is a specification for wired home networking that supports speeds up to 2 Gbit/s and operates over four types of legacy wires: telephone wiring, coaxial cables, power lines and plastic optical fiber. Some benefits of a multi-wire standard are lower equipment development costs and lower deployment costs for service providers (by allowing customer self-install). == History == G.hn was developed under the International Telecommunication Union's Telecommunication Standardization sector (the ITU-T) and promoted by the HomeGrid Forum and several other organizations. ITU-T Recommendation (the ITU's term for standard) G.9960, which received approval on October 9, 2009, specified the physical layers and the architecture of G.hn. The Data Link Layer (Recommendation G.9961) was approved on June 11, 2010. Prominent organizations, including CEPca, HomePNA, and UPA, who were creators of some of these interfaces, rallied behind the latest version of the standard, emphasizing its potential and significance in the home networking domain. Moreover, the ITU-T extended the technology with multiple input, multiple output (MIMO) technology to increase data rates and signaling distance. This new feature was approved in March 2012 under G.9963 Recommendation. The development and promotion of G.hn have been significantly supported by the HomeGrid Forum and several other organizations. The technology was not only designed to address home-networking challenges but also found applications beyond this initial scope, showcasing its versatility and potential in the networking domain. == Technical specifications == === Technical overview === G.hn specifies a single physical layer based on fast Fourier transform (FFT) orthogonal frequency-division multiplexing (OFDM) modulation and low-density parity-check code (LDPC) forward error correction (FEC) code. 
G.hn includes the capability to notch specific frequency bands to avoid interference with amateur radio bands and other licensed radio services. G.hn includes mechanisms to avoid interference with legacy home networking technologies and also with other wireline systems such as VDSL2 or other types of DSL used to access the home. OFDM systems split the transmitted signal into multiple orthogonal sub-carriers. In G.hn each one of the sub-carriers is modulated using QAM. The maximum QAM constellation supported by G.hn is 4096-QAM (12-bit QAM). The G.hn media access control is based on a time division multiple access (TDMA) architecture, in which a "domain master" schedules Transmission Opportunities (TXOPs) that can be used by one or more devices in the "domain". There are two types of TXOPs: Contention-Free Transmission Opportunities (CFTXOP), which have a fixed duration and are allocated to a specific pair of transmitter and receiver. CFTXOP are used for implementing TDMA Channel Access for specific applications that require quality of service (QoS) guarantees. Shared Transmission Opportunities (STXOP), which are shared among multiple devices in the network. STXOP are divided into Time Slots (TS). There are two types of TS: Contention-Free Time Slots (CFTS), which are used for implementing "implicit" token passing Channel Access. In G.hn, a series of consecutive CFTS is allocated to a number of devices. The allocation is performed by the "domain master" and broadcast to all nodes in the network. There are pre-defined rules that specify which device can transmit after another device has finished using the channel. As all devices know "who is next", there is no need to explicitly send a "token" between devices. The process of "passing the token" is implicit and ensures that there are no collisions during Channel access. Contention-Based Time Slots (CBTS), which are used for implementing CSMA/CARP Channel Access. 
In general, CSMA systems cannot completely avoid collisions, so CBTS are only useful for applications that do not have strict Quality of Service requirements. ==== Optimization for each medium ==== Although most elements of G.hn are common for all three media supported by the standard (power lines, phone lines and coaxial cable), G.hn includes media-specific optimizations for each medium. Some of these media-specific parameters include: OFDM Carrier Spacing: 195.31 kHz in coaxial, 48.82 kHz in phone lines, 24.41 kHz in power lines. FEC Rates: G.hn's FEC can operate with code rates 1/2, 2/3, 5/6, 16/18 and 20/21. Although these rates are not media specific, it is expected that the higher code rates will be used in cleaner media (such as coaxial) while the lower code rates will be used in noisy environments such as power lines. Automatic repeat request (ARQ) mechanisms: G.hn supports operation both with and without ARQ (re-transmission). Although this is not media specific, it is expected that ARQ-less operation is sometimes appropriate for cleaner media (such as coaxial) while ARQ operation is appropriate for noisy environments such as power lines. Power levels and frequency bands: G.hn defines different power masks for each medium. MIMO support: Recommendation G.9963 includes provisions for transmitting G.hn signals over multiple AC wires (phase, neutral, ground), if they are physically available. In July 2016, G.9963 was updated to include MIMO support over twisted pairs. ==== Security ==== G.hn uses the Advanced Encryption Standard (AES) encryption algorithm (with a 128-bit key length) using the CCMP protocol to ensure confidentiality and message integrity. Authentication and key exchange are done following ITU-T Recommendation X.1035. G.hn specifies point-to-point security inside a domain, which means that each pair of transmitter and receiver uses a unique encryption key which is not shared by other devices in the same domain. 
For example, if node Alice sends data to node Bob, node Eve (in the same domain as Alice and Bob) will not be able to easily eavesdrop their communication. G.hn supports the concept of relays, in which one device can receive a message from one node and deliver it to another node farther away in the same domain. Relaying becomes critical for applications with complex network topologies that need to cover large distances, such as those found in industrial or utility applications. While a relay can read the source and target addresses, it cannot read the message's content due to its body being end-to-end-encrypted. ==== Profiles ==== The G.hn architecture includes the concept of profiles. Profiles are intended to address G.hn nodes with significantly different levels of complexity. In G.hn the higher complexity profiles are proper supersets of lower complexity profiles, so that devices based on different profiles can interoperate with each other. Examples of G.hn devices based on high complexity profiles are Residential Gateways or Set-Top Boxes. Examples of G.hn devices based on low complexity profiles are home automation, home security and smart grid devices. ==== Technical parameters ==== The chart depicts a summary of the crucial technical specifications of the G.hn standard. Many of these technical elements are consistent across different physical media, with variations seen in areas such as Tone Spacing and frequency ranges. This uniformity is essential as it allows silicon manufacturers to produce a singular chip capable of implementing all three media types, leading to cost savings. Presently, G.hn chipsets are compatible with all three media types. This compatibility allows system manufacturers to create devices that can adjust to any wiring type simply by modifying a software configuration in the equipment. 
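The interplay between the 4096-QAM constellation and the FEC code rates listed earlier can be made concrete with a little arithmetic. The following is only a rough sketch: the function names are hypothetical, and it ignores headers, guard intervals and per-sub-carrier bit-loading, so the figures are upper bounds for a single fully loaded sub-carrier rather than real throughput numbers.

```python
from fractions import Fraction

# Values taken from the text above: maximum constellation and FEC code rates.
MAX_CONSTELLATION = 4096
CODE_RATES = [Fraction(1, 2), Fraction(2, 3), Fraction(5, 6),
              Fraction(16, 18), Fraction(20, 21)]  # 16/18 reduces to 8/9


def bits_per_qam_symbol(constellation_size: int) -> int:
    """Coded bits carried by one QAM symbol (size must be a power of two)."""
    if constellation_size < 2 or constellation_size & (constellation_size - 1):
        raise ValueError("constellation size must be a power of two >= 2")
    return constellation_size.bit_length() - 1


def info_bits_per_subcarrier(constellation_size: int, code_rate: Fraction) -> Fraction:
    """Information bits left per sub-carrier symbol after FEC overhead."""
    return bits_per_qam_symbol(constellation_size) * code_rate


if __name__ == "__main__":
    for rate in CODE_RATES:
        bits = info_bits_per_subcarrier(MAX_CONSTELLATION, rate)
        print(f"code rate {rate}: {float(bits):.2f} information bits per sub-carrier symbol")
```

Under these assumptions, the lowest code rate leaves half of the 12 coded bits for payload, which illustrates why the lower rates are expected on noisy media such as power lines and the higher rates on cleaner media such as coaxial cable.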
=== Spectrum === The G.hn spectrum depends on the medium as shown in the diagram below: === Protocol stack === G.hn specifies the physical layer and the data link layer, according to the OSI model. The G.hn Data Link Layer (Recommendation G.9961) is divided into three sub-layers: The Application Protocol Convergence (APC) Layer, which accepts frames (usually in Ethernet format) from the upper layer (Application Entity) and encapsulates them into G.hn APC protocol data units (APDUs). The maximum payload of each APDU is 2^14 bytes. The logical link control (LLC), which is responsible for encryption, aggregation, segmentation and automatic repeat-request. This sub-layer is also responsible for "relaying" of APDUs between nodes that may not be able to communicate through a direct connection. The medium access control (MAC), which schedules channel access. The G.hn physical layer (Recommendation G.9960) is divided into three sub-layers: The Physical Coding Sub-layer (PCS), responsible for generating PHY headers. The Physical Medium Attachment (PMA), responsible for scrambling and forward error correction coding/decoding. The Physical Medium Dependent (PMD), responsible for bit-loading and OFDM modulation. The interface between the Application Entity and the Data Link Layer is called A-interface. The interface between the Data Link Layer and the ph

  • Sentiment analysis

    Sentiment analysis

    Sentiment analysis (also known as opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly. == Types == A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Precursors to sentimental analysis include the General Inquirer, which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Subsequently, the method described in a patent by Volcani and Fogel, looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale. Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, and Pang who applied different methods for detecting the polarity of product reviews and movie reviews respectively. 
This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale). First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 AAAI Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text. Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). 
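The one-step, three-way approach can be sketched as a small multinomial naive Bayes classifier that returns a probability distribution over positive, negative and neutral. This is a minimal illustration written from scratch, not NLTK's implementation; the class name, the add-one smoothing choice and the toy training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Illustrative multinomial naive Bayes with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant (assumed)
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def fit(self, labelled_docs):
        for text, label in labelled_docs:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict_proba(self, text):
        # Words never seen in training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            log_p = math.log(n_docs / total_docs)  # class prior
            for w in words:
                log_p += math.log((self.word_counts[label][w] + self.alpha) / denom)
            log_scores[label] = log_p
        # Normalise in log space so the probabilities sum to one.
        top = max(log_scores.values())
        unnorm = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(unnorm.values())
        return {k: v / z for k, v in unnorm.items()}


# Hypothetical toy corpus, two sentences per class.
TRAIN = [
    ("i love this great film", "positive"),
    ("what a wonderful great story", "positive"),
    ("i hate this awful film", "negative"),
    ("what a terrible awful story", "negative"),
    ("the film runs two hours", "neutral"),
    ("the story is set in paris", "neutral"),
]

model = TinyNaiveBayes().fit(TRAIN)

if __name__ == "__main__":
    print(model.predict_proba("great wonderful film"))
```

Because the classifier outputs a full distribution, a caller can either take the most probable class or treat a text as neutral whenever no polar class clearly dominates.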
Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles. A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text. There are various other types of sentiment analysis, such as aspect-based sentiment analysis, grading sentiment analysis (positive, negative, neutral), multilingual sentiment analysis and detection of emotions. 
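The scaling approach described above, in which words carry scores on a −10 to +10 scale that nearby modifiers can intensify, relax or negate, might look roughly like the sketch below. The lexicon entries, the modifier weights and the rule that a modifier applies to the next sentiment word are all hypothetical simplifications chosen for illustration.

```python
# Hypothetical mini-lexicon on a -10..+10 scale; the values are illustrative only.
LEXICON = {"good": 5, "great": 8, "bad": -5, "terrible": -8}
# Words that intensify (>1) or relax (<1) the next sentiment word (assumed weights).
MODIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5, "somewhat": 0.5}
NEGATORS = {"not", "never", "no"}


def clamp(value, low=-10.0, high=10.0):
    """Keep a score inside the -10..+10 scale."""
    return max(low, min(high, value))


def score_sentence(sentence):
    """Sum the modifier-adjusted scores of the sentiment words in a sentence."""
    total = 0.0
    scale = 1.0      # pending intensifier/relaxer factor
    negate = False   # pending negation
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATORS:
            negate = not negate
        elif word in MODIFIERS:
            scale *= MODIFIERS[word]
        elif word in LEXICON:
            value = LEXICON[word] * scale
            if negate:
                value = -value  # naive negation: flip the sign
            total += clamp(value)
            scale, negate = 1.0, False  # modifiers are consumed by this word
    return clamp(total)


if __name__ == "__main__":
    for text in ["very good", "not good", "slightly bad",
                 "The plot was great but the ending was terrible."]:
        print(text, "->", score_sentence(text))
```

Flipping the sign on negation is a deliberately crude choice; real systems often shift the score by a fixed amount instead, since "not great" is usually milder than "terrible".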
=== Subjectivity/objectivity identification === This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance. Subjective and objective identification are emerging subtasks of sentiment analysis that use syntactic and semantic features, together with machine learning, to identify whether a sentence or document contains facts or opinions. Awareness of the distinction between facts and opinions is not recent, having possibly first been presented by Carbonell at Yale University in 1979. The term objective refers to text carrying factual information. Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.' The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states'. In the example below, the phrase 'We Americans' reflects a private state. Moreover, the target entity commented on by the opinions can take several forms, from a tangible product to an intangible topic, as stated in Liu (2010). Furthermore, three types of attitudes were observed by Liu (2010): 1) positive opinions, 2) neutral opinions, and 3) negative opinions. Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem. 
For each class, collections of indicator words or phrases are defined in order to locate the desired patterns in unannotated text. For subjective expressions, a different word list has been created. Lists of subjective indicators in words or phrases have been developed by multiple researchers in the linguistics and natural language processing fields, as stated in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction in subjectivity detection has progressed from curating features by hand to automated feature learning. At present, automated learning methods can be further divided into supervised and unsupervised machine learning. Pattern extraction with machine learning on annotated and unannotated text has been explored extensively by academic researchers. However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) words with fewer usages, 5) time sensitivity, and 6) ever-growing volume. Metaphorical expressions. Text that contains metaphorical expressions may affect the performance of the extraction. Besides, metaphors take different forms, which may have been contribu
