AI App Quora

AI App Quora — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Automate This

    Automate This

    Automate This: How Algorithms Came to Rule Our World is a book written by Christopher Steiner and published by Penguin Group. == Book == Steiner begins his study of algorithms on Wall Street in the 1980s but also provides examples from other industries. For example, he explains the history of Pandora Radio and the use of algorithms in music identification. He expresses concern that such use of algorithms may lead to the homogenization of music over time. Steiner also discusses the algorithms that eLoyalty (now owned by Mattersight Corporation following divestiture of the technology) was created by dissecting 2 million speech patterns and can now identify a caller's personality style and direct the caller with a compatible customer support representative. Steiner's book shares both the warning and the opportunity that algorithms bring to just about every industry in the world, and the pros and cons of the societal impact of automation (e.g. impact on employment).

    Read more →
  • Social advertising (social relationships)

    Social advertising (social relationships)

    Social advertising is advertising that relies on social information or networks in generating, targeting, and delivering marketing communications. Many current examples of social advertising use a particular Internet service to collect social information, establish and maintain relationships with consumers, and for delivering communications. For example, the advertising platforms provided by Google, Twitter, and Facebook involve targeting and presenting ads based on relationships articulated on those same services. Social advertising can be part of a broader social media marketing strategy designed to connect with consumers. == Social targeting == Since a pair of consumers connected via a relationship are more likely to be similar than an unconnected pair, information about such relationships can be used to infer characteristics of consumers useful for targeting. For example, predictions of an individual's home location can be improved using geographic information about their peers. Existing advertising platforms can allow advertisers to explicitly target the peers (e.g., Facebook friends, Twitter followers) of consumers who have a known affiliation with their brand. Thus, one way social advertising is expected to be effective is because social networks encode information about unobserved characteristics of consumers, including their susceptibility to adopt a product and to influence their peers to adopt. Social advertisement targets audiences' demographics based on customers browsing histories. This helped companies understand users' interests and target a specific group of users. Whether it is location or personal interest, different categories of companies can make the consumers on social media rely heavily on their advertisements. This is one of the reasons why social advertising has grown over time. Targeting their audience to real life stakeholders generally increase the attention of the advertised deal which brings up more profits for companies. Subsequently, the psychological effects that social media gives off to its users play a huge role in advertisement companies keeping their customers online. One of the main reasons users rely on social media is because it's a source of entertainment that provides them with a feeling of inclusiveness. In making the customers feel the inclusiveness, social advertising targeting a specific group of users is presented as if these advertisements are customized for the users in their perspective making them feel the attention that they do not often feel in the real world. You can use Social signals checker tool to find more information about links. Social signals are metrics that measure how much people interact with your content on social media. From likes, to shares, to comments; each of these signals contributes to an overall number that tells search engines like Google how much people like your content. The more social signals your website gets, the more likely it is to rank higher in Google. The reason for this is two-fold. First, social media is used by millions of people every day, and if your content is being shared and interacted with on these sites, it shows that it’s worthy of being seen. And second, social media sites are highly trusted by Google. So if you can get your content seen and interacted with on these platforms, you’ll be off to a great start. == Social cues in advertisements == Social ads often include information about the affiliation of a peer with an advertised entity. For example, a social ad might indicate a friend has endorsed a product, highly rated a restaurant, or watched a particular film. In fact, some definitions make these personalized social signals a necessary condition for the advertising being social advertising. Inclusion of personalized social signals creates a channel for social influence. Experiments that remove peers' names or images from social advertisements provide evidence that their presence increases proximal outcomes (e.g., clicks on advertisements). This is technically how trends are started on social media. Since social media links a single profile to thousands of other accounts some being real-life friends or even acquaintances, the opinions and the bias a user has for other users who are also a customer of an advertisement on the feed can heavily affect whether to click on the advertisement or not. Once this pattern continues, the brand benefits from increased customers, profit, and attention. Social networking can spread rapidly because 71 percent of the world's population contributes and uses social media which means social advertising gives companies a better marketing technique than a physical poster advertisement. == Word of mouth == Advertisers often attempt to use word of mouth to affect consumers and their decisions to adopt products and services. Ads and other inducements targeted at a seed set of individuals can be designed to produce a larger cascade of adoption through influence. Businesses are also using social media to attempt to identify and persuade influential consumers to spread positive messages about their products or services. Consequently, not only on social platforms but also in physical settings, users start talking to each other. When individuals develop an intimate relationship with each other, it is quite heavily based on shared characteristics, interests, and personalities. If one social media user becomes a regular customer to a well-known company that advertises often, there is a higher chance that all the other people who have intimate relationships with that one customer will be exposed to the online advertisement more than another user who might be completely new to a brand that is being advertised on screen. In reality, this happens to not only one user but to most of the users which mean a single brand advertisement online can have to potential of being talked about between billions and trillions of people all around the globe. == Relationship marketing == To accurately conduct relationship marketing, businesses must develop and manage six marketplaces: internal, customer, referral, supplier, influencer and employee. To maintain relationship marketing, customers often see social media influencers getting free sponsorships or PR boxes just to advertise their products. At times, users who become customers through these social influencers will get a better deal than regular customers which stands as a very commonly used marketing technique. By doing this, users think they are receiving special treatment when in reality it very much benefits social influencers and brands. Especially for brands that are just starting, they use this marketing technique so that their names can be out there, and people will start talking, which is their initial goal.

    Read more →
  • Code (cryptography)

    Code (cryptography)

    In cryptology, a code is a method used to encrypt a message that operates at the level of meaning; that is, words or phrases are converted into something else. A code might transform "change" into "CVGDK" or "cocktail lounge". The U.S. National Security Agency defined a code as "A substitution cryptosystem in which the plaintext elements are primarily words, phrases, or sentences, and the code equivalents (called "code groups") typically consist of letters or digits (or both) in otherwise meaningless combinations of identical length." A codebook is needed to encrypt, and decrypt the phrases or words. By contrast, ciphers encrypt messages at the level of individual letters, or small groups of letters, or even, in modern ciphers, individual bits. Messages can be transformed first by a code, and then by a cipher. Such multiple encryption, or "superencryption" aims to make cryptanalysis more difficult. Another comparison between codes and ciphers is that a code typically represents a letter or groups of letters directly without the use of mathematics. As such the numbers are configured to represent these three values: 1001 = A, 1002 = B, 1003 = C, ... . The resulting message, then would be 1001 1002 1003 to communicate ABC. Ciphers, however, utilize a mathematical formula to represent letters or groups of letters. For example, A = 1, B = 2, C = 3, ... . Thus the message ABC results by multiplying each letter's value by 13. The message ABC, then would be 13 26 39. Codes have a variety of drawbacks, including susceptibility to cryptanalysis and the difficulty of managing the cumbersome codebooks, so ciphers are now the dominant technique in modern cryptography. In contrast, because codes are representational, they are not susceptible to mathematical analysis of the individual codebook elements. In the example, the message 13 26 39 can be cracked by dividing each number by 13 and then ranking them alphabetically. However, the focus of codebook cryptanalysis is the comparative frequency of the individual code elements matching the same frequency of letters within the plaintext messages using frequency analysis. In the above example, the code group, 1001, 1002, 1003, might occur more than once and that frequency might match the number of times that ABC occurs in plain text messages. (In the past, or in non-technical contexts, code and cipher are often used to refer to any form of encryption). == One- and two-part codes == Codes are defined by "codebooks" (physical or notional), which are dictionaries of codegroups listed with their corresponding plaintext. Codes originally had the codegroups assigned in 'plaintext order' for convenience of the code designed, or the encoder. For example, in a code using numeric code groups, a plaintext word starting with "a" would have a low-value group, while one starting with "z" would have a high-value group. The same codebook could be used to "encode" a plaintext message into a coded message or "codetext", and "decode" a codetext back into plaintext message. In order to make life more difficult for codebreakers, codemakers designed codes with no predictable relationship between the codegroups and the ordering of the matching plaintext. In practice, this meant that two codebooks were now required, one to find codegroups for encoding, the other to look up codegroups to find plaintext for decoding. Such "two-part" codes required more effort to develop, and twice as much effort to distribute (and discard safely when replaced), but they were harder to break. The Zimmermann Telegram in January 1917 used the German diplomatic "0075" two-part code system which contained upwards of 10,000 phrases and individual words. == One-time code == A one-time code is a prearranged word, phrase or symbol that is intended to be used only once to convey a simple message, often the signal to execute or abort some plan or confirm that it has succeeded or failed. One-time codes are often designed to be included in what would appear to be an innocent conversation. Done properly they are almost impossible to detect, though a trained analyst monitoring the communications of someone who has already aroused suspicion might be able to recognize a comment like "Aunt Bertha has gone into labor" as having an ominous meaning. Famous example of one time codes include: In the Bible, Jonathan prearranges a code with David, who is going into hiding from Jonathan's father, King Saul. If, during archery practice, Jonathan tells the servant retrieving arrows "the arrows are on this side of you," David may safely return to court; if the command is "the arrows are beyond you," David must flee. "One if by land; two if by sea" in "Paul Revere's Ride" made famous in the poem by Henry Wadsworth Longfellow "Climb Mount Niitaka" - the signal to Japanese planes to begin the attack on Pearl Harbor During World War II the British Broadcasting Corporation's overseas service frequently included "personal messages" as part of its regular broadcast schedule. The seemingly nonsensical stream of messages read out by announcers were actually one time codes intended for Special Operations Executive (SOE) agents operating behind enemy lines. An example might be "The princess wears red shoes" or "Mimi's cat is asleep under the table". Each code message was read out twice. By such means, the French Resistance were instructed to start sabotaging rail and other transport links the night before D-day. "Over all of Spain, the sky is clear" was a signal (broadcast on radio) to start the nationalist military revolt in Spain on July 17, 1936. Sometimes messages are not prearranged and rely on shared knowledge hopefully known only to the recipients. An example is the telegram sent to U.S. President Harry Truman, then at the Potsdam Conference to meet with Soviet premier Joseph Stalin, informing Truman of the first successful test of an atomic bomb. "Operated on this morning. Diagnosis not yet complete but results seem satisfactory and already exceed expectations. Local press release necessary as interest extends great distance. Dr. Groves pleased. He returns tomorrow. I will keep you posted." == Idiot code == An idiot code is a code that is created by the parties using it. This type of communication is akin to the hand signals used by armies in the field. Example: Any sentence where 'day' and 'night' are used means 'attack'. The location mentioned in the following sentence specifies the location to be attacked. Plaintext: Attack X. Codetext: We walked day and night through the streets but couldn't find it! Tomorrow we'll head into X. An early use of the term appears to be by George Perrault, a character in the science fiction book Friday by Robert A. Heinlein: The simplest sort [of code] and thereby impossible to break. The first ad told the person or persons concerned to carry out number seven or expect number seven or it said something about something designated as seven. This one says the same with respect to code item number ten. But the meaning of the numbers cannot be deduced through statistical analysis because the code can be changed long before a useful statistical universe can be reached. It's an idiot code... and an idiot code can never be broken if the user has the good sense not to go too often to the well. Terrorism expert Magnus Ranstorp said that the men who carried out the September 11 attacks on the United States used basic e-mail and what he calls "idiot code" to discuss their plans. == Cryptanalysis of codes == While solving a monoalphabetic substitution cipher is easy, solving even a simple code is difficult. Decrypting a coded message is a little like trying to translate a document written in a foreign language, with the task basically amounting to building up a "dictionary" of the codegroups and the plaintext words they represent. One fingerhold on a simple code is the fact that some words are more common than others, such as "the" or "a" in English. In telegraphic messages, the codegroup for "STOP" (i.e., end of sentence or paragraph) is usually very common. This helps define the structure of the message in terms of sentences, if not their meaning, and this is cryptanalytically useful. Further progress can be made against a code by collecting many codetexts encrypted with the same code and then using information from other sources spies newspapers diplomatic cocktail party chat the location from where a message was sent where it was being sent to (i.e., traffic analysis) the time the message was sent, events occurring before and after the message was sent the normal habits of the people sending the coded messages etc. For example, a particular codegroup found almost exclusively in messages from a particular army and nowhere else might very well indicate the commander of that army. A codegroup that appears in messages preceding an attack on a particular location may very well stand for that location. Cribs can be an immediate giveaway to the definiti

    Read more →
  • Electronic lab notebook

    Electronic lab notebook

    An electronic lab notebook or electronic laboratory notebook (ELN) is a computer program designed to replace paper laboratory notebooks. Lab notebooks in general are used by scientists, engineers, and technicians to document research, experiments, and procedures performed in a laboratory. A lab notebook is often maintained to be a legal document and may be used in a court of law as evidence. Similar to an inventor's notebook, the lab notebook is also often referred to in patent prosecution and intellectual property litigation. Electronic lab notebooks offer many benefits to the user as well as organizations; they are easier to search upon, simplify data copying and backups, and support collaboration amongst many users. ELNs can have fine-grained access controls, and can be more secure than their paper counterparts. They also allow the direct incorporation of data from instruments, replacing the practice of printing out data to be stapled into a paper notebook. == Types == ELNs can be divided into two categories: "Specific ELNs" contain features designed to work with specific applications, scientific instrumentation or data types. "Cross-disciplinary ELNs" or "Generic ELNs" are designed to support access to all data and information that needs to be recorded in a lab notebook. Lab Platforms that combine an ELN, LIMS, and scientific data management together, all-in-one configurable software environment. Solutions range from specialized programs designed from the ground up for use as an ELN, to modifications or direct use of more general programs. Examples of using more general software as an ELN include using OpenWetWare, a MediaWiki install (running the same software that Wikipedia uses), WordPress, or the use of general note taking software such as OneNote as an ELN. ELN's come in many different forms. They can be standalone programs, use a client-server model, or be entirely web-based. Some use a lab-notebook approach, others resemble a blog. ELNs are embracing artificial intelligence and LLM technology to provide scientific AI chat assistants. A good many variations on the "ELN" acronym have appeared. Differences between systems with different names are often subtle, with considerable functional overlap between them. Examples include "ERN" (Electronic Research Notebook), "ERMS" (Electronic Resource (or Research or Records) Management System (or Software) and SDMS (Scientific Data (or Document) Management System (or Software). Ultimately, these types of systems all strive to do the same thing: Capture, record, centralize and protect scientific data in a way that is highly searchable, historically accurate, and legally stringent, and which also promotes secure collaboration, greater efficiency, reduced mistakes and lowered total research costs. == Objectives == A good electronic laboratory notebook should offer a secure environment to protect the integrity of both data and process, whilst also affording the flexibility to adopt new processes or changes to existing processes without recourse to further software development. The package architecture should be a modular design, so as to offer the benefit of minimizing validation costs of any subsequent changes that you may wish to make in the future as your needs change. A good electronic laboratory notebook should be an "out of the box" solution that, as standard, has fully configurable forms to comply with the requirements of regulated analytical groups through to a sophisticated ELN for inclusion of structures, spectra, chromatograms, pictures, text, etc. where a preconfigured form is less appropriate. All data within the system may be stored in a database (e.g. MySQL, MS-SQL, Oracle) and be fully searchable. The system should enable data to be collected, stored and retrieved through any combination of forms or ELN that best meets the requirements of the user. The application should enable secure forms to be generated that accept laboratory data input via PCs and/or laptops / palmtops, and should be directly linked to electronic devices such as laboratory balances, pH meters, etc. Networked or wireless communications should be accommodated for by the package which will allow data to be interrogated, tabulated, checked, approved, stored and archived to comply with the latest regulatory guidance and legislation. A system should also include a scheduling option for routine procedures such as equipment qualification and study related timelines. It should include configurable qualification requirements to automatically verify that instruments have been cleaned and calibrated within a specified time period, that reagents have been quality-checked and have not expired, and that workers are trained and authorized to use the equipment and perform the procedures. == Regulatory and legal aspects == The laboratory accreditation criteria found in the ISO 17025 standard needs to be considered for the protection and computer backup of electronic records. These criteria can be found specifically in clause 4.13.1.4 of the standard. Electronic lab notebooks used for development or research in regulated industries, such as medical devices or pharmaceuticals, are expected to comply with FDA regulations related to software validation. The purpose of the regulations is to ensure the integrity of the entries in terms of time, authorship, and content. Unlike ELNs for patent protection, FDA is not concerned with patent interference proceedings, but is concerned with avoidance of falsification. Typical provisions related to software validation are included in the medical device regulations at 21 CFR 820 (et seq.) and Title 21 CFR Part 11. Essentially, the requirements are that the software has been designed and implemented to be suitable for its intended purposes. Evidence to show that this is the case is often provided by a Software Requirements Specification (SRS) setting forth the intended uses and the needs that the ELN will meet; one or more testing protocols that, when followed, demonstrate that the ELN meets the requirements of the specification and that the requirements are satisfied under worst-case conditions. Security, audit trails, prevention of unauthorized changes without substantial collusion of otherwise independent personnel (i.e., those having no interest in the content of the ELN such as independent quality unit personnel) and similar tests are fundamental. Finally, one or more reports demonstrating the results of the testing in accordance with the predefined protocols are required prior to release of the ELN software for use. If the reports show that the software failed to satisfy any of the SRS requirements, then corrective and preventive action ("CAPA") must be undertaken and documented. Such CAPA may extend to minor software revisions, or changes in architecture or major revisions. CAPA activities need to be documented as well. Aside from the requirements to follow such steps for regulated industry, such an approach is generally a good practice in terms of development and release of any software to assure its quality and fitness for use. There are standards related to software development and testing that can be applied (see ref.).

    Read more →
  • Dating app

    Dating app

    An online dating application, commonly known as a dating app, is an online dating service presented through a mobile phone application. These apps often take advantage of a smartphone's GPS location capabilities, always on-hand presence, and access to mobile wallets. These apps aim to speed up the online dating process of sifting through potential dating partners, chatting, flirting, and potentially meeting or becoming romantically involved. Online dating apps are now mainstream in the United States. As of 2017, online dating (which included both apps and other online dating services) was the principal method by which new couples in the U.S. met. The percentage of couples meeting online is predicted to increase to 70% by 2040. == Origins == The first computerized dating service was launched in 1964, the St. James Computer Dating Service, which became known as Com-Pat. The first U.S. dating service that used computerized match making was Operation Match. It required men and women to complete a questionnaire and was launched in 1965. Operation Match inspired the methodology of Dateline, which became popular in the 1970s and 1980s. Match.com was launched in 1995 and turned computerized match making into a profitable business. Grindr targeted gay and bisexual men at launch. Tinder, launched in 2012, led to a growth of online dating applications by both new providers and existing online dating services that expanded into the mobile app market. == Usage by demographic group == Online dating applications typically target a younger demographic group, though some apps, like Senior Match and Silver Singles are geared toward the 50 and up demographic. In 2016, almost 50% of people knew of someone who use the services or had met their loved one through the service. After the iPhone launch in 2007, online dating data has mushroomed as application usage increased. In 2005, only 10% of 18-24 year olds reported to have used online dating services; this number quickly grew to over 27%, making this target demographic the largest number of users for most applications. When Pew Research Center conducted a study in 2016, they found that 59% of U.S. adults agreed that online dating is a good way to meet people compared to 44% in 2005. This explosion in usage can be explained by the increased use of smartphones. By the end of 2022, it is expected there will be 413 million active users of online dating services worldwide. A 2023 Pew Research Center survey of 6,034 American adults found that 30% had ever used an online dating site or app, including 53% of those aged 18 to 29, 37% of those aged 30 to 49, and 17% of those aged 50 and over. Lesbian, gay and bisexual respondents reported using dating apps at nearly twice the rate of straight respondents (51% versus 28%), and 36% of divorced, separated or widowed adults had used one, compared with 16% of married adults. The increased use of smartphones by those 65 and older has also driven that population to the use dating apps. The Pew Research Center found that usage increase by 8 points since last surveyed in 2012. A study in 2021 found that more than one-third of seniors have dated in the past 5 years, and roughly one-third of those dating seniors have turned to dating apps. During the COVID-19 pandemic, Morning Consult found that more Americans were using online dating apps than ever before. In one survey in April 2020, the company discovered that 53% of U.S. adults who use online dating apps have been using them more during the pandemic. As of February 2021, that share increased to 71 percent. Research using Hofstede's cultural dimensions theory has indicated that norms about online dating applications tend to differ across cultures. A study published in the Journal of Creative Communications looked into the relationships between dating-app advertisements from over 51 countries and the cultural dimensions of these countries. The results revealed that dating-app advertisements appealed to multiple cultural needs, including the needs for relationships, friendship, entertainment, sex, status, design and identity. The use of these appeals was found to be 'congruent with ... the individualism/collectivism and uncertainty avoidance cultural dimensions.' == Popular applications == Following Tinder's success, other companies released dating applications. Raya was released in 2015 as a membership-based dating app, allowing entrance only through referrals, which was marketed as a dating app for celebrities. In early 2026, Hily surpassed Bumble to become the third most-used dating application in the United States and the fifth highest-grossing overall, driven largely by growing popularity among Generation Z users, while remaining behind Tinder and Hinge, both owned by Match Group. A number of dating apps have been created targeting adherents of particular religions seeking partners of the same religion, such as Muzmatch for Muslims, Christian Mingle, SALT, and Christian Connection for Christians, and JSwipe and JDate for Jews. === VR Dating === VR Dating is an application of Social VR where people can exist, collaborate, and perform various activities together. Virtual reality apps use virtual and augmented realities to make the dating experience more lifelike and more effective, as well as allow people to expand what is already possible in the world of online dating. There are several online platforms of VR Dating. The VR dating app Nevermet is the VR equivalent of Tinder, where people can search and find on dates. However, instead of actual real-life pictures, users will update pictures of virtual selves and will be interacting with avatars rather than real faces. Flirtual is a self-contained social VR app that serves to match users who then decide where and how to meet in VR. Flirtual hosts speed dating and social events in VR. == Effects on dating == The usage of online dating applications can have both advantages and disadvantages: === Advantages === Many of the applications provide personality tests for matching or use algorithms to match users. These factors enhance the possibility of users getting matched with a compatible candidate. Users are in control; they are provided with many options so there are enough matches that fit their particular type. Users can simply choose to not match the candidates that they know they are not interested in. Narrowing down options is easy. Once users think they are interested, they are able to chat and get to know the potential candidate. This form of communication can reduce the time, cost, and uncertainty often associated with traditional dating methods. Online dating offers convenience; people want dating to work around their schedules. Online dating can also increase self-confidence; even if users get rejected, they know there are hundreds of other candidates that will want to match with them so they can simply move on to the next option. In fact, 60% of U.S. adults agree that online dating is a good way to meet people and 66% say they have gone on a real date with someone they met through an application. Today, 5% of married Americans or Americans in serious relationships said they met their significant other online. The 39% of online dating users (representing 12% of all U.S. adults) say they have been in a committed relationship or married someone they met on a dating site or app. ==== Rejection sensitive individuals ==== Individuals high in rejection sensitivity are more likely to use online dating applications. As they tend to expect, perceive and overreact to rejection, rejection sensitive individuals struggle with traditional dating. Online dating applications allow for them to better reveal their true selves, potentially increasing their dating success. Online dating applications also obscure rejection cues, alleviating the rejection-related distress experienced by rejection sensitive individuals. ==== Anxiously attached individuals ==== Individuals high in attachment anxiety are also more likely to use online dating applications. While they harbour negative self-views, anxiously attached individuals are also more eager to enter and commit to relationships. Online dating applications allow for them to present "an authentic yet ideal version of themselves", mitigating their tendencies to view themselves as undesirable. This increases their romantic confidence, and potentially alleviates their anxiety when searching for a romantic partner. === Disadvantages === Sometimes having too many options can be overwhelming. With so many options available, users can get lost in their choices and end up spending too much time looking for the "perfect" candidate instead of using that time to start a real relationship. In addition, the algorithms and matching systems put in place may not always be as accurate as users think. There is no perfect system that can match two people's personalities perfectly every time. Communication online also lacks the physical chemistry aspec

    Read more →
  • Trusted Computing

    Trusted Computing

    Trusted Computing (TC) is a technology developed and promoted by the Trusted Computing Group. The term is taken from the field of trusted systems and has a specialized meaning that is distinct from the field of confidential computing. With Trusted Computing, the computer will consistently behave in expected ways, and those behaviors will be enforced by computer hardware and software. Enforcing this behavior is achieved by loading the hardware with a unique encryption key that is inaccessible to the rest of the system and the owner. TC is controversial as the hardware is not only secured for its owner, but also against its owner, leading opponents of the technology like free software activist Richard Stallman to deride it as "treacherous computing", and certain scholarly articles to use scare quotes when referring to the technology. Trusted Computing proponents such as International Data Corporation, the Enterprise Strategy Group and Endpoint Technologies Associates state that the technology will make computers safer, less prone to viruses and malware, and thus more reliable from an end-user perspective. They also state that Trusted Computing will allow computers and servers to offer improved computer security over that which is currently available. Opponents often state that this technology will be used primarily to enforce digital rights management policies (imposed restrictions to the owner) and not to increase computer security. Chip manufacturers Intel and AMD, hardware manufacturers such as HP and Dell, and operating system providers such as Microsoft include Trusted Computing in their products if enabled. The U.S. Army requires that every new PC it purchases comes with a Trusted Platform Module (TPM). As of July 3, 2007, so does virtually the entire United States Department of Defense. == Key concepts == Trusted Computing encompasses six key technology concepts, of which all are required for a fully Trusted system, that is, a system compliant to the TCG specifications: Endorsement key Secure input and output Memory curtaining / protected execution Sealed storage Remote attestation Trusted Third Party (TTP) === Endorsement key === The endorsement key is a 2048-bit RSA public and private key pair that is created randomly on the chip at manufacture time and cannot be changed. The private key never leaves the chip, while the public key is used for attestation and for encryption of sensitive data sent to the chip, as occurs during the TPM_TakeOwnership command. This key is used to allow the execution of secure transactions: every Trusted Platform Module (TPM) is required to be able to sign a random number (in order to allow the owner to show that he has a genuine trusted computer), using a particular protocol created by the Trusted Computing Group (the direct anonymous attestation protocol) in order to ensure its compliance of the TCG standard and to prove its identity; this makes it impossible for a software TPM emulator with an untrusted endorsement key (for example, a self-generated one) to start a secure transaction with a trusted entity. The TPM should be designed to make the extraction of this key by hardware analysis hard, but tamper resistance is not a strong requirement. === Memory curtaining === Memory curtaining extends common memory protection techniques to provide full isolation of sensitive areas of memory—for example, locations containing cryptographic keys. Even the operating system does not have full access to curtained memory. The exact implementation details are vendor specific. === Sealed storage === Sealed storage protects private information by binding it to platform configuration information including the software and hardware being used. This means the data can be released only to a particular combination of software and hardware. Sealed storage can be used for DRM enforcing. For example, users who keep a song on their computer that has not been licensed to be listened will not be able to play it. Currently, a user can locate the song, listen to it, and send it to someone else, play it in the software of their choice, or back it up (and in some cases, use circumvention software to decrypt it). Alternatively, the user may use software to modify the operating system's DRM routines to have it leak the song data once, say, a temporary license was acquired. Using sealed storage, the song is securely encrypted using a key bound to the trusted platform module so that only the unmodified and untampered music player on his or her computer can play it. In this DRM architecture, this might also prevent people from listening to the song after buying a new computer, or upgrading parts of their current one, except after explicit permission of the vendor of the song. === Remote attestation === Remote attestation allows changes to the user's computer to be detected by authorized parties. For example, software companies can identify unauthorized changes to software, including users modifying their software to circumvent commercial digital rights restrictions. It works by having the hardware generate a certificate stating what software is currently running. The computer can then present this certificate to a remote party to show that unaltered software is currently executing. Numerous remote attestation schemes have been proposed for various computer architectures, including Intel, RISC-V, and ARM. Remote attestation is usually combined with public-key encryption so that the information sent can only be read by the programs that requested the attestation, and not by an eavesdropper. To take the song example again, the user's music player software could send the song to other machines, but only if they could attest that they were running an authorized copy of the music player software. Combined with the other technologies, this provides a more restricted path for the music: encrypted I/O prevents the user from recording it as it is transmitted to the audio subsystem, memory locking prevents it from being dumped to regular disk files as it is being worked on, sealed storage curtails unauthorized access to it when saved to the hard drive, and remote attestation prevents unauthorized software from accessing the song even when it is used on other computers. To preserve the privacy of attestation responders, Direct Anonymous Attestation has been proposed as a solution, which uses a group signature scheme to prevent revealing the identity of individual signers. Proof of space (PoS) have been proposed to be used for malware detection, by determining whether the L1 cache of a processor is empty (e.g., has enough space to evaluate the PoSpace routine without cache misses) or contains a routine that resisted being evicted. === Trusted third party === == Known applications == The Microsoft products Windows Vista, Windows 7, Windows 8 and Windows RT make use of a Trusted Platform Module to facilitate BitLocker Drive Encryption. Other known applications with runtime encryption and the use of secure enclaves include the Signal messenger and the e-prescription service ("E-Rezept") by the German government. == Possible applications == === Digital rights management === Trusted Computing would allow companies to create a digital rights management (DRM) system which would be very hard to circumvent, though not impossible. An example is downloading a music file. Sealed storage could be used to prevent the user from opening the file with an unauthorized player or computer. Remote attestation could be used to authorize play only by music players that enforce the record company's rules. The music would be played from curtained memory, which would prevent the user from making an unrestricted copy of the file while it is playing, and secure I/O would prevent capturing what is being sent to the sound system. Circumventing such a system would require either manipulation of the computer's hardware, capturing the analogue (and thus degraded) signal using a recording device or a microphone, or breaking the security of the system. New business models for use of software (services) over Internet may be boosted by the technology. By strengthening the DRM system, one could base a business model on renting programs for a specific time periods or "pay as you go" models. For instance, one could download a music file which could only be played a certain number of times before it becomes unusable, or the music file could be used only within a certain time period. === Preventing cheating in online games === Trusted Computing could be used to combat cheating in online games. Some players modify their game copy in order to gain unfair advantages in the game; remote attestation, secure I/O and memory curtaining could be used to determine that all players connected to a server were running an unmodified copy of the software. === Verification of remote computation for grid computing === Trusted Computing could be used to guarantee participants in a grid computing sys

    Read more →
  • Social network hosting service

    Social network hosting service

    A social network hosting service is a web hosting service that specifically hosts the user creation of web-based social networking services, alongside related applications. Such services are also known as vertical social networks due to the creation of SNSes which cater to specific user interests and niches; like larger, interest-agnostic SNSes, such niche networking services may also possess the ability to create increasingly niche groups of users. == List of social network hosting services == Federated Media Publishing's BigTent BroadVision Clearvale Ning Wall.fm

    Read more →
  • MDS matrix

    MDS matrix

    An MDS matrix (maximum distance separable) is a matrix representing a function with certain diffusion properties that have useful applications in cryptography. Technically, an m × n {\displaystyle m\times n} matrix A {\displaystyle A} over a finite field K {\displaystyle K} is an MDS matrix if it is the transformation matrix of a linear transformation f ( x ) = A x {\displaystyle f(x)=Ax} from K n {\displaystyle K^{n}} to K m {\displaystyle K^{m}} such that no two different ( m + n ) {\displaystyle (m+n)} -tuples of the form ( x , f ( x ) ) {\displaystyle (x,f(x))} coincide in n {\displaystyle n} or more components. Equivalently, the set of all ( m + n ) {\displaystyle (m+n)} -tuples ( x , f ( x ) ) {\displaystyle (x,f(x))} is an MDS code, i.e., a linear code that reaches the Singleton bound. Let A ~ = ( I n A ) {\displaystyle {\tilde {A}}={\begin{pmatrix}\mathrm {I} _{n}\\\hline \mathrm {A} \end{pmatrix}}} be the matrix obtained by joining the identity matrix I n {\displaystyle \mathrm {I} _{n}} to A {\displaystyle A} . Then a necessary and sufficient condition for a matrix A {\displaystyle A} to be MDS is that every possible n × n {\displaystyle n\times n} submatrix obtained by removing m {\displaystyle m} rows from A ~ {\displaystyle {\tilde {A}}} is non-singular. This is also equivalent to the following: all the sub-determinants of the matrix A {\displaystyle A} are non-zero. Then a binary matrix A {\displaystyle A} (namely over the field with two elements) is never MDS unless it has only one row or only one column with all components 1 {\displaystyle 1} . Reed–Solomon codes have the MDS property and are frequently used to obtain the MDS matrices used in cryptographic algorithms. Serge Vaudenay suggested using MDS matrices in cryptographic primitives to produce what he called multipermutations, not-necessarily linear functions with this same property. These functions have what he called perfect diffusion: changing t {\displaystyle t} of the inputs changes at least m − t + 1 {\displaystyle m-t+1} of the outputs. He showed how to exploit imperfect diffusion to cryptanalyze functions that are not multipermutations. MDS matrices are used for diffusion in such block ciphers as AES, SHARK, Square, Twofish, Anubis, KHAZAD, Manta, Hierocrypt, Kalyna, Camellia and HADESMiMC, and in the stream cipher MUGI and the cryptographic hash function Whirlpool, Poseidon.

    Read more →
  • AIXI

    AIXI

    AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000 and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence. AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs. == Etymology == According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by ξ {\displaystyle \xi } (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations. == Definition == AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment μ {\displaystyle \mu } . The interaction proceeds in time steps, from t = 1 {\displaystyle t=1} to t = m {\displaystyle t=m} , where m ∈ N {\displaystyle m\in \mathbb {N} } is the lifespan of the AIXI agent. At time step t, the agent chooses an action a t ∈ A {\displaystyle a_{t}\in {\mathcal {A}}} (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" e t ∈ E = O × R {\displaystyle e_{t}\in {\mathcal {E}}={\mathcal {O}}\times \mathbb {R} } , which consists of an "observation" o t ∈ O {\displaystyle o_{t}\in {\mathcal {O}}} (e.g., a camera image) and a reward r t ∈ R {\displaystyle r_{t}\in \mathbb {R} } , distributed according to the conditional probability μ ( o t r t | a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t ) {\displaystyle \mu (o_{t}r_{t}|a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t})} , where a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t {\displaystyle a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t}} is the "history" of actions, observations and rewards. The environment μ {\displaystyle \mu } is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that μ {\displaystyle \mu } is computable, that is, the observations and rewards received by the agent from the environment μ {\displaystyle \mu } can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent. The only goal of the AIXI agent is to maximize ∑ t = 1 m r t {\displaystyle \sum _{t=1}^{m}r_{t}} , that is, the sum of rewards from time step 1 to m. The AIXI agent is associated with a stochastic policy π : ( A × E ) ∗ → A {\displaystyle \pi :({\mathcal {A}}\times {\mathcal {E}})^{}\rightarrow {\mathcal {A}}} , which is the function it uses to choose actions at every time step, where A {\displaystyle {\mathcal {A}}} is the space of all possible actions that AIXI can take and E {\displaystyle {\mathcal {E}}} is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) μ {\displaystyle \mu } can also be thought of as a stochastic policy (which is a function): μ : ( A × E ) ∗ × A → E {\displaystyle \mu :({\mathcal {A}}\times {\mathcal {E}})^{}\times {\mathcal {A}}\rightarrow {\mathcal {E}}} , where the ∗ {\displaystyle } is the Kleene star operation. In general, at time step t {\displaystyle t} (which ranges from 1 to m), AIXI, having previously executed actions a 1 … a t − 1 {\displaystyle a_{1}\dots a_{t-1}} (which is often abbreviated in the literature as a < t {\displaystyle a_{ Read more →

  • Undeniable signature

    Undeniable signature

    An undeniable signature is a digital signature scheme which allows the signer to be selective to whom they allow to verify signatures. The scheme adds explicit signature repudiation, preventing a signer later refusing to verify a signature by omission; a situation that would devalue the signature in the eyes of the verifier. It was invented by David Chaum and Hans van Antwerpen in 1989. == Overview == In this scheme, a signer possessing a private key can publish a signature of a message. However, the signature reveals nothing to a recipient/verifier of the message and signature without taking part in either of two interactive protocols: Confirmation protocol, which confirms that a candidate is a valid signature of the message issued by the signer, identified by the public key. Disavowal protocol, which confirms that a candidate is not a valid signature of the message issued by the signer. The motivation for the scheme is to allow the signer to choose to whom signatures are verified. However, that the signer might claim the signature is invalid at any later point, by refusing to take part in verification, would devalue signatures to verifiers. The disavowal protocol distinguishes these cases removing the signer's plausible deniability. It is important that the confirmation and disavowal exchanges are not transferable. They achieve this by having the property of zero-knowledge; both parties can create transcripts of both confirmation and disavowal that are indistinguishable, to a third-party, of correct exchanges. The designated verifier signature scheme improves upon deniable signatures by allowing, for each signature, the interactive portion of the scheme to be offloaded onto another party, a designated verifier, reducing the burden on the signer. == Zero-knowledge protocol == The following protocol was suggested by David Chaum. A group, G, is chosen in which the discrete logarithm problem is intractable, and all operation in the scheme take place in this group. Commonly, this will be the finite cyclic group of order p contained in Z/nZ, with p being a large prime number; this group is equipped with the group operation of integer multiplication modulo n. An arbitrary primitive element (or generator), g, of G is chosen; computed powers of g then combine obeying fixed axioms. Alice generates a key pair, randomly chooses a private key, x, and then derives and publishes the public key, y = gx. === Message signing === Alice signs the message, m, by computing and publishing the signature, z = mx. === Confirmation (i.e., avowal) protocol === Bob wishes to verify the signature, z, of m by Alice under the key, y. Bob picks two random numbers: a and b, and uses them to blind the message, sending to Alice: c = magb. Alice picks a random number, q, uses it to blind, c, and then signing this using her private key, x, sending to Bob: s1 = cgq ands2 = s1x. Note that s1x = (cgq)x = (magb)xgqx = (mx)a(gx)b+q = zayb+q. Bob reveals a and b. Alice verifies that a and b are the correct blind values, then, if so, reveals q. Revealing these blinds makes the exchange zero knowledge. Bob verifies s1 = cgq, proving q has not been chosen dishonestly, and s2 = zayb+q, proving z is valid signature issued by Alice's key. Note that zayb+q = (mx)a(gx)b+q. Alice can cheat at step 2 by attempting to randomly guess s2. === Disavowal protocol === Alice wishes to convince Bob that z is not a valid signature of m under the key, gx; i.e., z ≠ mx. Alice and Bob have agreed an integer, k, which sets the computational burden on Alice and the likelihood that she should succeed by chance. Bob picks random values, s ∈ {0, 1, ..., k} and a, and sends: v1 = msga and v2 = zsya, where exponentiating by a is used to blind the sent values. Note that v2 = zsya = (mx)s(gx)a = v1x. Alice, using her private key, computes v1x and then the quotient, v1xv2−1 = (msga)x(zsgxa)−1 = msxz−s = (mxz−1)s. Thus, v1xv2−1 = 1, unless z ≠ mx. Alice then tests v1xv2−1 for equality against the values: (mxz−1)i for i ∈ {0, 1, …, k}; which are calculated by repeated multiplication of mxz−1 (rather than exponentiating for each i). If the test succeeds, Alice conjectures the relevant i to be s; otherwise, she conjectures random value. Where z = mx, (mxz−1)i = v1xv2−1 = 1 for all i, s is unrecoverable. Alice commits to i: she picks a random r and sends hash(r, i) to Bob. Bob reveals a. Alice confirms that a is the correct blind (i.e., v1 and v2 can be generated using it), then, if so, reveals r. Revealing these blinds makes the exchange zero knowledge. Bob checks hash(r, i) = hash(r, s), proving Alice knows s, hence z ≠ mx. If Alice attempts to cheat at step 3 by guessing s at random, the probability of succeeding is 1/(k + 1). So, if k = 1023 and the protocol is conducted ten times, her chances are 1 to 2100.

    Read more →
  • Brain Imaging Data Structure

    Brain Imaging Data Structure

    The Brain Imaging Data Structure (BIDS) is a standard for organizing, annotating, and describing data collected during neuroimaging experiments. It is based on a formalized file and directory structure and metadata files (based on JSON and TSV) with controlled vocabulary. This standard has been adopted by a multitude of labs around the world as well as databases such as OpenNeuro, SchizConnect, Developing Human Connectome Project, and FCP-INDI, and is seeing uptake in an increasing number of studies. While originally specified for MRI data, BIDS has been extended to several other imaging modalities such as MEG, EEG, and intracranial EEG (see also BIDS Extension Proposals). == History == The project is a community-driven effort. BIDS, originally OBIDS (Open Brain Imaging Data Structure), was initiated during an INCF sponsored data sharing working group meeting (January 2015) at Stanford University. It was subsequently spearheaded and maintained by Chris Gorgolewski. Since October 2019, the project is headed by a Steering Group and maintained by a separate team of maintainers, the Maintainers Group, according to a governance document that was approved of by the BIDS community in a vote. BIDS has advanced under the direction and effort of contributors, the community of researchers that appreciate the value of standardizing neuroimaging data to facilitate sharing and analysis. == BIDS Extension Proposals == BIDS can be extended in a backwards compatible way and is evolving over time. This is accomplished through BIDS Extension Proposals (BEPs), which are community-driven processes following agreed-upon guidelines. A full list of finalized BEPs and BEPs in progress can be found on the BIDS website

    Read more →
  • Branch number

    Branch number

    In cryptography, the branch number is a numerical value that characterizes the amount of diffusion introduced by a vectorial Boolean function F that maps an input vector a to output vector F ( a ) {\displaystyle F(a)} . For the (usual) case of a linear F the value of the differential branch number is produced by: applying nonzero values of a (i.e., values that have at least one non-zero component of the vector) to the input of F; calculating for each input value a the Hamming weight W {\displaystyle W} (number of nonzero components), and adding weights W ( a ) {\displaystyle W(a)} and W ( F ( a ) ) {\displaystyle W(F(a))} together; selecting the smallest combined weight across for all nonzero input values: B d ( F ) = min a ≠ 0 ( W ( a ) + W ( F ( a ) ) ) {\displaystyle B_{d}(F)={\underset {a\neq 0}{\min }}(W(a)+W(F(a)))} . If both a and F ( a ) {\displaystyle F(a)} have s components, the result is obviously limited on the high side by the value s + 1 {\displaystyle s+1} (this "perfect" result is achieved when any single nonzero component in a makes all components of F ( a ) {\displaystyle F(a)} to be non-zero). A high branch number suggests higher resistance to the differential cryptanalysis: the small variations of input will produce large changes on the output and in order to obtain small variations of the output, large changes of the input value will be required. The term was introduced by Daemen and Rijmen in early 2000s and quickly became a typical tool to assess the diffusion properties of the transformations. == Mathematics == The branch number concept is not limited to the linear transformations, Daemen and Rijmen provided two general metrics: differential branch number, where the minimum is obtained over inputs of F that are constructed by independently sweeping all the values of two nonzero and unequal vectors a, b ( ⊕ {\displaystyle \oplus } is a component-by-component exclusive-or): B d ( F ) = min a ≠ b ( W ( a ⊕ b ) + W ( F ( a ) ⊕ F ( b ) ) {\displaystyle B_{d}(F)={\underset {a\neq b}{\min }}(W(a\oplus b)+W(F(a)\oplus F(b))} ; for linear branch number, the independent candidates α {\displaystyle \alpha } and β {\displaystyle \beta } are independently swept; they should be nonzero and correlated with respect to F (the L A T ( α , β ) {\displaystyle LAT(\alpha ,\beta )} coefficient of the linear approximation table of F should be nonzero): B l ( F ) = min α ≠ 0 , β , L A T ( α , β ) ≠ 0 ( W ( α ) + W ( β ) ) {\displaystyle B_{l}(F)={\underset {\alpha \neq 0,\beta ,LAT(\alpha ,\beta )\neq 0}{\min }}(W(\alpha )+W(\beta ))} .

    Read more →
  • Score bug

    Score bug

    A score bug is a digital on-screen graphic which is displayed in a broadcast of a sporting event, displaying the current score and other statistics. It is similar in function to a scoreboard, and is usually placed at either the top or lower third of the television screen. == History == The concept of a persistent score bug was devised by Sky Sports head David Hill, who was dissatisfied over having to wait to see what the score was after tuning into a football match in-progress. The score bug was introduced when Sky launched its coverage of the then newly-formed English Premier League in August 1992. Hill's boss repeatedly demanded that the graphic be removed, describing it as the "stupidest thing [he] had ever seen". Hill defied the boss's demands and kept the graphic in place. ITV introduced a score bug at the start of the 1993–94 football season, and the BBC introduced a score bug towards the end of 1993. The concept was introduced to the United States by ABC Sports and ESPN during coverage of the 1994 FIFA World Cup. Their justification for the graphic was to provide a location for a rotating series of sponsor logos, in order to allow matches to air without commercial interruption. With the acquisition of rights to the National Football League (NFL) by BSkyB's American sibling Fox (a fellow venture of Rupert Murdoch), Hill became the first president of Fox Sports. Under Hill's leadership, Fox introduced a version of the score bug branded as the "Fox Box", which was part of its inaugural season of NFL coverage in 1994. Variety criticized it as an "annoying see-through clock and score graphic" and expressed concern for people "who actually watched the beginning of the game and would rather have their screen clear of graphics". Hill even received a death threat from an irate viewer, with a specific emphasis on him being a "foreigner", but the score bug soon became a ubiquitous feature for American football broadcasts, along with almost all American sports broadcasts in the years that followed. Dick Ebersol of NBC Sports initially opposed the idea of a score bug, as he thought that fans would dislike seeing more graphics on the screen and would change the channel from blowout games if the score was constantly being displayed. Since the 2010s, the on-air design and positioning of some score bugs have been influenced by the needs of Internet video (especially when viewing an event on devices with smaller screens), including bugs noticeably larger than prior iterations designed with television viewing in mind, or designs primarily kept towards the bottom-center of the screen (easing the ability for the bug to remain visible when highlights are cropped for square videos posted on social media). == Details == Score bugs used in team sports typically include the names of both teams, an abbreviation of the team's name, and/or the team's logo; for individual sports, they include the names of individual competitors. In sports where a game clock or playing periods are used, those are generally also displayed as part of the score bug. Some broadcasts also include teams' win-loss records. In 2024, ESPN experimented with adding a persistent win probability meter to its bug in Major League Baseball, which was based on input from its statisticians. === Variations === In addition to the above information, score bugs in some sports include additional information: In baseball, score bugs display the current inning, number of outs, the pitch clock if applicable, and a graphic displaying which bases are occupied; and usually include names of the current pitcher and batter, the pitcher's pitch count, and the number of balls and strikes accrued by the batter. In basketball, score bugs generally include the shot clock, the number of fouls accrued by each team, and whether a team is in the bonus. In cricket, score bugs often take the form of larger dashboards across the bottom of the screen, displaying the current team up and their number of runs, wickets, and overs, a display showing the runs scored and number of balls faced by the current batting partnership, and statistics for the opposing team's bowler (including the number of wickets scored and runs given up). In American football, score bugs usually include the play clock and the down and distance of the current play; they also incorporate graphics indicating when a penalty flag has been thrown. In ice hockey, score bugs display when a penalty or power play is in effect, and often include the number of shots on goal accrued by each team. In golf, Fox popularized the display of a persistent leaderboard graphic in the bottom-right of the screen, usually displaying the top 5. ==== Racing ==== Telecasts of automobile races often include a score bug with the current positions of participants, statistics such as distance behind the leader, and the remaining distance or number of laps. In the mid-2010s, NASCAR broadcasters such as Fox began to transition from horizontal tickers to vertical leaderboards (also referred to as "pylons", in reference to the physical scoring pylons at). The CW differentiated itself by using a horizontal display that divides the field into multiple columns along the bottom of the screen.

    Read more →
  • Letter frequency

    Letter frequency

    Letter frequency is the number of times letters of the alphabet appear on average in written language. Letter frequency analysis dates back to the Arab mathematician Al-Kindi (c. AD 801–873), who formally developed the method to break ciphers. Letter frequency analysis gained importance in Europe with the development of movable type in AD 1450, wherein one must estimate the amount of type required for each letterform. Linguists use letter frequency analysis as a rudimentary technique for language identification, where it is particularly effective as an indication of whether an unknown writing system is alphabetic, syllabic, or logographic. The use of letter frequencies and frequency analysis plays a fundamental role in cryptograms and several word puzzle games, including hangman, Scrabble, Wordle and the television game show Wheel of Fortune. One of the earliest descriptions in classical literature of applying the knowledge of English letter frequency to solving a cryptogram is found in Edgar Allan Poe's famous story "The Gold-Bug", where the method is successfully applied to decipher a message giving the location of a treasure hidden by Captain Kidd. Herbert S. Zim, in his classic introductory cryptography text Codes and Secret Writing, gives the English letter frequency sequence as "ETAON RISHD LFCMU GYPWB VKJXZQ", the most common letter pairs as "TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO", and the most common doubled letters as "LL EE SS OO TT FF RR NN PP CC". Different ways of counting can produce somewhat different orders. Letter frequencies also have a strong effect on the design of some keyboard layouts. The most frequent letters are placed on the home row of the Blickensderfer typewriter, the Dvorak keyboard layout, Colemak and other optimized layouts, while the commonly used QWERTY layout places common letters apart from each other to prevent typewriter jamming. == Background == The frequency of letters in text has been studied for use in cryptanalysis, and frequency analysis in particular, dating back to the Arab mathematician al-Kindi (c. AD 801–873 ), who formally developed the method (the ciphers breakable by this technique go back at least to the Caesar cipher used by Julius Caesar, so this method could have been explored in classical times). Letter frequency analysis gained additional importance in Europe with the development of movable type in AD 1450, wherein one must estimate the amount of type required for each letterform, as evidenced by the variations in letter compartment size in typographer's type cases. No exact letter frequency distribution underlies a given language, since all writers write slightly differently. However, most languages have a characteristic distribution which is strongly apparent in longer texts. Even language changes as extreme as from Old English to modern English (regarded as mutually unintelligible) show strong trends in related letter frequencies: over a small sample of Biblical passages, from most frequent to least frequent, enaid sorhm tgþlwu æcfy ðbpxz of Old English compares to eotha sinrd luymw fgcbp kvjqxz of modern English, with the most extreme differences concerning letterforms not shared. Linotype machines for the English language assumed the letter order, from most to least common, to be etaoin shrdlu cmfwyp vbgkqj xz based on the experience and custom of manual compositors. The equivalent for the French language was elaoin sdrétu cmfhyp vbgwqj xz. Arranging the alphabet in Morse into groups of letters that require equal amounts of time to transmit, and then sorting these groups in increasing order, yields e it san hurdm wgvlfbk opxcz jyq. Letter frequency was used by other telegraph systems, such as the Murray Code. Similar ideas are used in modern data-compression techniques such as Huffman coding. Letter frequencies, like word frequencies, tend to vary, both by writer and by subject. For instance, ⟨d⟩ occurs with greater frequency in fiction, as most fiction is written in past tense and thus most verbs will end in the inflectional suffix -ed / -d. One cannot write an essay about x-rays without using ⟨x⟩ frequently, and the essay will have an idiosyncratic letter frequency if the essay is about, say, Queen Zelda of Zanzibar requesting X-rays from Qatar to examine hypoxia in zebras. Different authors have habits which can be reflected in their use of letters. Hemingway's writing style, for example, is visibly different from Faulkner's. Letter, bigram, trigram, word frequencies, word length, and sentence length can be calculated for specific authors and used to prove or disprove authorship of texts, even for authors whose styles are not so divergent. Accurate average letter frequencies can only be gleaned by analyzing a large amount of representative text. With the availability of modern computing and collections of large text corpora, such calculations are easily made. Examples can be drawn from a variety of sources (press reporting, religious texts, scientific texts and general fiction) and there are differences especially for general fiction with the position of ⟨h⟩ and ⟨i⟩, with ⟨h⟩ becoming more common. Different dialects of a language will also affect a letter's frequency. For example, an author in the United States would produce something in which ⟨z⟩ is more common than an author in the United Kingdom writing on the same topic: words like "analyze", "apologize", and "recognize" contain the letter in American English, whereas the same words are spelled "analyse", "apologise", and "recognise" in British English. This would highly affect the frequency of the letter ⟨z⟩, as it is rarely used by British writers in the English language. The "top twelve" letters constitute about 80% of the total usage. The "top eight" letters constitute about 65% of the total usage. Letter frequency as a function of rank can be fitted well by several rank functions, with the two-parameter Cocho/Beta rank function being the best. Another rank function with no adjustable free parameter also fits the letter frequency distribution reasonably well (the same function has been used to fit the amino acid frequency in protein sequences.) A spy using the VIC cipher or some other cipher based on a straddling checkerboard typically uses a mnemonic such as "a sin to err" (dropping the second "r") or "at one sir" to remember the top eight characters. == Relative frequencies of letters in the English language == There are three ways to count letter frequency that result in very different charts for common letters. The first method, used in the chart below, is to count letter frequency in lemmas of a dictionary. The lemma is the word in its canonical form. The second method is to include all word variants when counting, such as "abstracts", "abstracted" and "abstracting" and not just the lemma of "abstract". This second method results in letters like ⟨s⟩ appearing much more frequently, such as when counting letters from lists of the most used English words on the Internet. ⟨s⟩ is especially common in inflected words (non-lemma forms) because it is added to form plurals and third person singular present tense verbs. A final method is to count letters based on their frequency of use in actual texts, resulting in certain letter combinations like ⟨th⟩ becoming more common due to the frequent use of common words like "the", "then", "both", "this", etc. Absolute usage frequency measures like this are used when creating keyboard layouts or letter frequencies in old fashioned printing presses. An analysis of entries in the Concise Oxford dictionary, ignoring frequency of word use, gives an order of "EARIOTNSLCUDPMHGBFYWKVXZJQ". The letter-frequency table above is taken from Pavel Mička's website, which cites Robert Lewand's Cryptological Mathematics. According to Lewand, arranged from most to least common in appearance, the letters are: etaoinshrdlcumwfgypbvkjxqz. Lewand's ordering differs slightly from others, such as Cornell University Math Explorer's Project, which produced a table after measuring 40,000 words. In English, the space character occurs almost twice as frequently as the top letter (⟨e⟩) and the non-alphabetic characters (digits, punctuation, etc.) collectively occupy the fourth position (having already included the space) between ⟨t⟩ and ⟨a⟩. == Relative frequencies of the first letters of a word in the English language == The frequency of the first letters of words or names is helpful in pre-assigning space in physical files and indexes. Given 26 filing cabinet drawers, rather than a 1:1 assignment of one drawer to one letter of the alphabet, it is often useful to use a more equal-frequency-letter code by assigning several low-frequency letters to the same drawer (often one drawer is labeled VWXYZ), and to split up the most-frequent initial letters (⟨s, a, c⟩) into several drawers (often 6 drawers Aa-An, Ao-Az, Ca-Cj, Ck-Cz, Sa-Si, Sj-Sz). The same system is used in some mult

    Read more →
  • Weird SoundCloud

    Weird SoundCloud

    Weird SoundCloud, or SoundClown, is a mashup parody music scene taking place on the online distribution platform SoundCloud. The scene has been described by its producers and music journalists to be a satirical take on electronic dance music, and useless, throwaway internet content. One critic, Audra Schroeder, categorized it as an in-joke that is "deconstructing and reshaping memes and popular music, recontextualizing the sacred texts of millennial chat rooms." == Origins == In a January 2014 interview, DJ Kevin Wang suggested that the Weird SoundCloud has "been around in the last one to two years", but started to gain much more popularity the previous year through electronic dance music internet blogs. Weird SoundCloud producer Ideaot suggested that some in the phenomenon came from the YouTube poop scene. Another producer in the community, DJ @@ (AT-AT), reasoned that producers joining the scene "want to express their musicality, see it as a more mature form of YouTube Poop," or are "just looking for recognition on social media sites." AT-AT said that it was "a fun thing to do, and after I stopped making proper music I felt I needed a bit of an outlet for my creativity. The fact that people enjoyed it and/or treated it as a travesty (Direct quote from one of my tracks) spurs me on." == Characteristics == Weird SoundCloud is a mash-up and parody music genre labeled by journalist Audra Schroeder as an in-joke that is "deconstructing and reshaping memes and popular music, recontextualizing the sacred texts of millennial chat rooms." Most tracks range from around 30 seconds to one minute in length. The people who make weird SoundCloud are known as SoundClowns, a term coined by producer Dicksoak. Ideaot described the weird SoundCloud community as "largely just people who are friends with each other." Noisey critic Ryan Bassil spotlight the variety of music coming out of the weird SoundCloud landscape: "One minute you could be listening to the Seinfeld theme reimagined as an aneurysm inducing dubstep corker, the next, you're recovering from hearing a version of Tenacious D's "Tribute" that's akin to having a stroke." Bassil analyzes that the tracks "often take the past and repurpose it into something that, although not altogether useful, sounds fresh and reflective of the abstract, confusing panoramic that encapsulates the modern internet." Bassil compared the lexicon of SoundClown's track titles to that of Reddit and Twitter users. According to Dicksoak, most works of the style are critiques of EDM or "are just uploaded because they sound funny." However, Bassil disagreed, writing that there are also many tracks that keep repurposing a certain meme, such as "mom's spaghetti" or the re-use of vocals from recordings by hip hop group Death Grips. He describe the scene's re-use of memes as a satirical take on pointless online content that is only on the internet to "do nothing other than fill the void": They're changing the format of the original work's intended message or audience - a technique often employed by top-tier digital media companies - and in doing so they're sarcastically, ironically, taking the piss out of what Web 2.0's turned into - an open arena where the most ridiculous, unashamed, often pointless piggy-back content can rack up thousands and thousands of clicks. == Notable examples == There are mash-ups that "disrupt the flow of popular music", in the words of writer Schroeder, such as a "flutedrop" remix of the Miley Cyrus song "Wrecking Ball" and Shaliek's mashup of music by Bruno Mars and Korn. In November 2013, Wang released a set of mp3 files on SoundCloud named Best Drops Ever, which included tracks like "A Drop So Epic a Bunch of NYU Bros Already Bought a 3-Day Weekend Pass for It" and "A Drop So Crazy You'll Kill Your Family". All of the tracks start as normal electronic dance music build-ups, before they drop into a "bait and switch" audio or film clip such as Filet-O-Fish commercials, the Whitney Houston song "I Will Always Love You" and the film Bambi (1942) that ruins the anticipation. The collection is a parody of the over-importance and over-focus of the drop and lack of care of the overall quality of a song common in the modern electronic dance music scene. Wang has released more than 45 tracks in the weird SoundCloud, some of them receiving around a million plays. Subgenres of Weird SoundCloud include Macklecore, mash-ups and remixes that include the works of American hip-hop recording artist Macklemore, and Biggiewave, which include samples of songs from the album Ready to Die (1994) by The Notorious B.I.G. Common audio and meme sources used include Skrillex, the Martin Garrix track "Animals", Thomas the Tank Engine, Shrek, Macklemore, "Gangnam Style", the Bruno Mars track "Uptown Funk", the Disturbed track "Down with the Sickness", Space Jam, the Childish Gambino track "Bonfire", the Death Grips track "Takyon" and air horn sound effects. == Reception == Bassil praised the SoundClown scene as "loveable and strangely honest", reasoning that it "just reminds me that we're all humans on the internet, all searching for #content that means something, something to connect with, but usually only dredging up bastardised versions of things we've already read, seen, or watched before." Bassil also described the weird SoundCloud as a more successful version of a similar scene known as weird YouTube; the reason for the success of SoundClowns is due to SoundCloud's discovery algorithm: "Small collectives and trends are able to form, and there's an abundance of tracks from artists who are almost forging careers out of it, as opposed to uploading one viral hit." Publications have made lists of weird SoundCloud works, such as BuzzFeed's "23 Of The Weirdest Songs On Soundcloud", Obsev's "Weird SoundCloud Mashups That Must've Been Made While Drunk", and Thump's "9 of the Best and Most Upsetting Soundclowns we Could Find", where writer Isabelle Hellyer called it the "most influential genre of music in human history." A Your EDM writer called it "oddly addicting."

    Read more →