Google Books Ngram Viewer

The Google Books Ngram Viewer is an online search engine that charts the frequencies of any set of search strings using a yearly count of n-grams found in printed sources published between 1500 and 2022 in Google's text corpora in English, Chinese (simplified), French, German, Hebrew, Italian, Russian, or Spanish. There are also some specialized English corpora, such as American English, British English, and English Fiction. The program can search for a word or a phrase. The n-grams are matched with the text within the selected corpus, and if found in 40 or more books, are then displayed as a graph. The program supports searches for parts of speech and wildcards. It is routinely used in research. == History == The Ngram Viewer was created by Google software engineers Will Brockman and Jon Orwant , who teamed up with Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden. The service was released on December 16, 2010. Before the release, it was difficult to quantify the rate of linguistic change because of the absence of a database that was designed for this purpose, said Steven Pinker, a well-known linguist who was one of the co-authors of the Science paper published on the same day. The Google Books Ngram Viewer was developed in the hope of opening a new window to quantitative research in the humanities field, and the database contained 500 billion words from 5.2 million books publicly available from the very beginning. The intended audience was scholarly, but the Google Books Ngram Viewer made it possible for anyone with a computer to see a graph that represents the diachronic change of the use of words and phrases with ease. Lieberman said in response to The New York Times that the developers aimed to provide even children with the ability to browse cultural trends throughout history. In the Science paper, Lieberman and his collaborators called the method of high-volume data analysis in digitized texts "culturomics". == Usage == Commas delimit user-entered search terms, where each comma-separated term is searched in the database as an n-gram (for example, "nursery school" is a 2-gram or bigram). The Ngram Viewer then returns a plotted line chart. Due to limitations on the size of the Ngram database, only matches found in at least 40 books are indexed. == Limitations == The data sets of the Ngram Viewer have been criticized for their reliance upon inaccurate optical character recognition (OCR) and for including large numbers of incorrectly dated and categorized texts. Because of these errors, and because they are uncontrolled for bias (such as the increasing amount of scientific literature, which causes other terms to appear to decline in popularity), care must be taken in using the corpora to study language or test theories. Furthermore, the data sets may not reflect general linguistic or cultural change and can only hint at such an effect because they do not involve any metadata like date published, author, length, or genre, to avoid any potential copyright infringements. Systemic errors like the confusion of s and f in pre-19th century texts (due to the use of ſ, the long s, which is similar in appearance to f) can cause systemic bias. Although the Google Books team claims that the results are reliable from 1800 onwards, poor OCR and insufficient data mean that frequencies given for languages such as Chinese may only be accurate from 1970 onward, with earlier parts of the corpus showing no results at all for common terms, and data for some years containing more than 50% noise. Guidelines for doing research with data from Google Ngram have been proposed that try to address some of the issues discussed above.

Conversational user interface

A conversational user interface (CUI) is a user interface for computers that emulates a conversation with a human. Historically, computers have relied on text-based user interfaces and graphical user interfaces (GUIs) (such as the user pressing a "back" button) to translate the user's desired action into commands the computer understands. While an effective mechanism of completing computing actions, there is a learning curve for the user associated with GUI. Instead, CUIs provide opportunity for the user to communicate with the computer in their natural language rather than in a syntax specific commands.

Group (online social networking)

A group (often termed as a community, e-group or club) is a feature in many social networking services which allows users to create, post, comment to and read from their own interest- and niche-specific forums, often within the realm of virtual communities. Groups, which may allow for open or closed access, invitation and/or joining by other users outside the group, are formed to provide mini-networks within the larger, more diverse social network service. Much like electronic mailing lists, they are also owned and maintained by owners, moderators, or managers, who can edit posts to discussion threads and regulate member behavior within the group. However, unlike traditional Internet forums and mailing lists, groups in social networking services allow owners and moderators alike to share account credentials between groups without having to log in to every group. == History == The rise of the World Wide Web resulted in an expansion of the varieties of methods for communication on the Internet, much of which was limited in the 1980s to discussion in newsgroups, BBS and chat rooms. While the initial rise of web-based mass communication took place in the form of early Internet forums in the mid-1990s, a few services such as MSN Groups, Yahoo! Groups and eGroups pioneered the combination of web-based mailing list archives with user profiles; by 2000, such services doubled as full-fledged mailing lists and Internet forums, allowing users to create an extremely large variety of discussion and networking mediums with comparatively sparse thresholds of complexity. Further features included chat rooms (often Java-based), image and video galleries, and group calendars. The second spurt of bullecalbel networking, one which was less dependent upon mailing list-related features and more upon Internet forum features, began in the early- to mid-2000s in the form of such services as LiveJournal, Friendster, MySpace and Facebook. These services continued the evolution of the web-based e-group as a discussion and organization medium. In the late 2000s, services such as Yammer and Micromobs further advanced e-group communication by taking advantage of microblog-style activity streams. == In virtual worlds == In Second Life, groups are centered less around discussion forums (as such, an asynchronous conferencing feature is not built into the Second Life network as of 2009) and common interest, and are more centered on maintenance of a particular geographic location inside the network. Such groups are often created by the owners of areas such as buildings, plots of land or whole islands in order to cater to the most frequent visitors and patrons of the regions. With the limited asynchronous messaging capability of Second Life, groups are also a means of mass-emailing announcements pertinent to the group, but are not completely capable of hosting discussion or deliberation of such announcement messages. == The importance of online social networking groups == Before people expanded their social life to the internet, they had small circles. These included the networks gained from rural areas or villages, such as family, friends and neighbors, and community groups such as churches. These networks represented a social safety net to support individuals. Since we have moved a huge part of our social life to the internet, online social networking groups have become a way to maintain a structure in social life. Online networking is made up by clusters of people, bounding themselves together on the World Wide Web. To be able to sort out the many different clusters we belong to we use online groups to helps us arrange and make sense of all our contacts. This sense-making is rooted within us, we sort and put people into compartments or sort by categories to make sense and try to understand our relationships to the people around us. Online social networking groups therefore enables us to do the same thing online. Online social networks have a huge impact on people’s lives. Since the social network revolution has offered people with more loose ties and diversity in their relationships, it creates both stress and opportunities. Furthermore, the Internet revolution has transformed the contact point from a household to the individual. In addition, people are in constant communication with each other due to the mobile revolution. All in all, the mentioned revolutions created a new social operating system: "networked individualism". The way that people currently connect, communicate and exchange information can be described as a form of operating system because of the similarities between the structure of computer systems and the networked individualism that has taken form in society. These structures consist of unwritten rules, norms, constraints and opportunities which are apparent for those who are part of a specific network. == Concerns == There is some research claiming that fake news is infiltrating online social networking. A recent study claimed that people exposed to fake news generally revert to their original opinion even after finding out the information they were given was false.

FactorDaily

FactorDaily is an Indian digital media publication founded in 2016 by Pankaj Mishra and Jayadevan PK. Mishra was formerly an Editor at TechCrunch and the Economic Times. The digital publication was launched with an intent to produce stories on the impact of technology on life in India. == History == FactorDaily began publishing in May 2016, with daily reported stories on technology, culture and life in India. Prior to its launch, the company had raised $1 million in seed funding from Accel India, Blume Ventures, Girish Mathrubootham of Freshdesk, Vijay Shekhar Sharma of PayTm, and Jay Vijayan of Tekion. Josey Puliyenthuruthel John, formerly Managing Editor at Business Today and National Corporate Editor at Mint, later joined the company as a Consulting Editor. In January 2017, FactorDaily launched its first Podcast called The Outliers. The inaugural episode featured a conversation with Manish Sharma of Printo on his journey starting up. == Awards == The FactorDaily team won the Bengaluru Editors Lab 2017, a journalism hackathon organised by the Global Editors Network (GEN). The story titled "India has 3,800 psychiatrists for 1.2bn people. Can tech step in to manage mental health?" won the first prize in the online category of the fifth Schizophrenia Research Foundation’s (SCARF) ‘Media for Mental Health’ awards. The story titled 'The dark hand of tech that stokes sex trafficking in India', won the Stop Slavery media Awards by the Thomson Reuters Foundation for the year 2020.

Acousto-electronics

Acousto-electronics (also spelled 'Acoustoelectronics') is a branch of physics, acoustics and electronics that studies interactions of ultrasonic and hypersonic waves in solids with electrons and with electro-magnetic fields. Typical phenomena studied in acousto-electronics are acousto-electric effect and also amplification of acoustic waves by flows of electrons in piezoelectric semiconductors, when the drift velocity of the electrons exceeds the velocity of sound. The term 'acousto-electronics' is often understood in a wider sense to include numerous practical applications of the interactions of electro-magnetic fields with acoustic waves in solids. In particular, these are signal processing devices using surface acoustic waves (SAW), different sensors of temperature, pressure, humidity, acceleration, etc.

Voice activity detection

Voice activity detection (VAD), also known as speech activity detection or speech detection, is the detection of the presence or absence of human speech, used in speech processing. The main uses of VAD are in speaker diarization, speech coding and speech recognition. It can facilitate speech processing, and can also be used to deactivate some processes during non-speech section of an audio session: it can avoid unnecessary coding/transmission of silence packets in Voice over Internet Protocol (VoIP) applications, saving on computation and on network bandwidth. VAD is an important enabling technology for a variety of speech-based applications. Therefore, various VAD algorithms have been developed that provide varying features and compromises between latency, sensitivity, accuracy and computational cost. Some VAD algorithms also provide further analysis, for example whether the speech is voiced, unvoiced or sustained. Voice activity detection is usually independent of language. It was first investigated for use on time-assignment speech interpolation (TASI) systems. == Algorithm overview == The typical design of a VAD algorithm is as follows: There may first be a noise reduction stage, e.g. via spectral subtraction. Then some features or quantities are calculated from a section of the input signal. A classification rule is applied to classify the section as speech or non-speech – often this classification rule finds when a value exceeds a certain threshold. There may be some feedback in this sequence, in which the VAD decision is used to improve the noise estimate in the noise reduction stage, or to adaptively vary the threshold(s). These feedback operations improve the VAD performance in non-stationary noise (i.e. when the noise varies a lot). A representative set of recently published VAD methods formulates the decision rule on a frame by frame basis using instantaneous measures of the divergence distance between speech and noise. The different measures which are used in VAD methods include spectral slope, correlation coefficients, log likelihood ratio, cepstral, weighted cepstral, and modified distance measures. Independently from the choice of VAD algorithm, a compromise must be made between having voice detected as noise, or noise detected as voice (between false positive and false negative). A VAD operating in a mobile phone must be able to detect speech in the presence of a range of very diverse types of acoustic background noise. In these difficult detection conditions it is often preferable that a VAD should fail-safe, indicating speech detected when the decision is in doubt, to lower the chance of losing speech segments. The biggest difficulty in the detection of speech in this environment is the very low signal-to-noise ratios (SNRs) that are encountered. It may be impossible to distinguish between speech and noise using simple level detection techniques when parts of the speech utterance are buried below the noise. == Applications == VAD is an integral part of different speech communication systems such as audio conferencing, echo cancellation, speech recognition, speech encoding, speaker recognition and hands-free telephony. In the field of multimedia applications, VAD allows simultaneous voice and data applications. Similarly, in Universal Mobile Telecommunications Systems (UMTS), it controls and reduces the average bit rate and enhances overall coding quality of speech. In cellular radio systems (for instance GSM and CDMA systems) based on Discontinuous Transmission (DTX) mode, VAD is essential for enhancing system capacity by reducing co-channel interference and power consumption in portable digital devices. In speech processing applications, voice activity detection plays an important role since non-speech frames are often discarded. For a wide range of applications such as digital mobile radio, Digital Simultaneous Voice and Data (DSVD) or speech storage, it is desirable to provide a discontinuous transmission of speech-coding parameters. Advantages can include lower average power consumption in mobile handsets, higher average bit rate for simultaneous services like data transmission, or a higher capacity on storage chips. However, the improvement depends mainly on the percentage of pauses during speech and the reliability of the VAD used to detect these intervals. On the one hand, it is advantageous to have a low percentage of speech activity. On the other hand, clipping, that is the loss of milliseconds of active speech, should be minimized to preserve quality. This is the crucial problem for a VAD algorithm under heavy noise conditions. === Use in telemarketing === One controversial application of VAD is in conjunction with predictive dialers used by telemarketing firms. In order to maximize agent productivity, telemarketing firms set up predictive dialers to call more numbers than they have agents available, knowing most calls will end up in either "Ring – No Answer" or answering machines. When a person answers, they typically speak briefly ("Hello", "Good evening", etc.) and then there is a brief period of silence. Answering machine messages are usually 3–15 seconds of continuous speech. By setting VAD parameters correctly, dialers can determine whether a person or a machine answered the call and, if it's a person, transfer the call to an available agent. If it detects an answering machine message, the dialer hangs up. Often, even when the system correctly detects a person answering the call, no agent may be available, resulting in a "silent call". Call screening with a multi-second message like "please say who you are, and I may pick up the phone" will frustrate such automated calls. == Performance evaluation == To evaluate a VAD, its output using test recordings is compared with those of an "ideal" VAD – created by hand-annotating the presence or absence of voice in the recordings. The performance of a VAD is commonly evaluated on the basis of the following four parameters: FEC (Front End Clipping): clipping introduced in passing from noise to speech activity; MSC (Mid Speech Clipping): clipping due to speech misclassified as noise; OVER: noise interpreted as speech due to the VAD flag remaining active in passing from speech activity to noise; NDS (Noise Detected as Speech): noise interpreted as speech within a silence period. Although the method described above provides useful objective information concerning the performance of a VAD, it is only an approximate measure of the subjective effect. For example, the effects of speech signal clipping can at times be hidden by the presence of background noise, depending on the model chosen for the comfort noise synthesis, so some of the clipping measured with objective tests is in reality not audible. It is therefore important to carry out subjective tests on VADs, the main aim of which is to ensure that the clipping perceived is acceptable. In VoIP applications, front-end clipping can be reduced by rewinding to shortly before the detection and sending very slightly delayed data. This kind of test requires a certain number of listeners to judge recordings containing the processing results of the VADs being tested, giving marks to several speech sequences on the following features: Quality; Comprehension difficulty; Audibility of clipping. These marks are then used to calculate average results for each of the features listed above, thus providing a global estimate of the behavior of the VAD being tested. To conclude, whereas objective methods are very useful in an initial stage to evaluate the quality of a VAD, subjective methods are more significant. As they require the participation of several people for a few days, increasing cost, they are generally only used when a proposal is about to be standardized. == Implementations == One early standard VAD is that developed by British Telecom for use in the Pan-European digital cellular mobile telephone service in 1991. It uses inverse filtering trained on non-speech segments to filter out background noise, so that it can then more reliably use a simple power-threshold to decide if a voice is present. The G.729 standard calculates the following features for its VAD: line spectral frequencies, full-band energy, low-band energy (<1 kHz), and zero-crossing rate. It applies a simple classification using a fixed decision boundary in the space defined by these features, and then applies smoothing and adaptive correction to improve the estimate. The GSM standard includes two VAD options developed by ETSI. Option 1 computes the SNR in nine bands and applies a threshold to these values. Option 2 calculates different parameters: channel power, voice metrics, and noise power. It then thresholds the voice metrics using a threshold that varies according to the estimated SNR. The Speex audio compression library uses a procedure named Improved Minima Controlled Recursive Averaging, which uses a smoothed representation of spectral pow

Creepiness

Creepiness is the state of being creepy, or causing an unpleasant feeling of fear or unease to someone and/or something. Certain traits or hobbies may make people seem creepy to others; interest in horror or the macabre might come across as 'creepy', and often people who are perverted or exhibit predatory behavior are called 'creeps'. The internet, especially some functions of social media, has been described as increasingly creepy. Adam Kotsko has compared the modern conception of creepiness to the Freudian concept of unheimlich. The term has also been used to describe paranormal or supernatural phenomena. Some people have phobias which are irrational fears, which can make them perceive something as creepy. == History and studies == "Creepiness" is subjective: for example some dolls have been described as creepy, while what makes something "creepy" or "strange" to someone might seem normal to someone else. The adjective "creepy", referring to a feeling of creeping in the flesh, was first used in 1831, but it was Charles Dickens who coined and popularized the term "the creeps" in his 1849 novel David Copperfield. In the 20th century, association was made between involuntary celibacy and creepiness. The concept of creepiness has only recently been formally addressed in social media marketing. The sensation of creepiness has only recently been the subject of psychological research, despite the widespread colloquial use of the word throughout the years. Francis T. McAndrew of Knox College is the first psychologist to do an empirical study on creepiness. == Causes == The state of creepiness has been associated with "feeling scared, nervous, anxious or worried", "awkward or uncomfortable", "vulnerable or violated" in a study conducted by Watt et al. This state arises in the presence of a creepy element, which can be an individual or, as recently observed, new technologies. === Individuals === Creepiness can be caused by the appearance of an individual. Another study investigated the characteristics that make people creepy. Creepy people were thought to be more often male than female by an overwhelming majority of participants (around 95% of both male and female participants). Another study conducted by Watt et al. also found that participants associated the ectomorphic body type (more linear) with creepiness, more than the other two body types (51% vs mesomorphic, 24% and endomorphic, 23%). Other cues of creepiness included low hygiene, especially according to female participants, and a disheveled appearance. Participants also identified the face as an area with potentially creepy features: in particular the eyes and the teeth. Both of those physical features were deemed creepy not only for their unpleasant appearance (ex. squinty eyes or crooked teeth) but also for the movements and expressions they engaged it (ex. darting eye movements and odd smiles). In fact, appearance does not seem to be the only factor making an individual creepy: behaviors provide cues as well. Behaviors such as "being unusually quiet and staring (34%), following or lurking (15%), behaving abnormally (21%), or in a socially awkward, "sketchy" or suspicious way (20%)" are all contributing to a feeling of creepiness, as described by Watt et al.'s study. === Technology === In addition to other individuals, new technologies, such as marketing's targeted ads and AI, have been qualified as creepy. A study by Moore et al. described what aspect of marketing participants considered creepy. The main three reasons are the following: using invasive tactics, causing discomfort and violating of norms. Invasive tactics are practiced by marketers that know so much about the consumer that the ads are "creepily" personalized. Secondly, some ads create discomfort by making the consumer question "the motives of the company advertising the product". Finally, some ads violate social norms by having inappropriate content, for example by unnecessarily sexualizing it. It is marketing's extensive knowledge used in an improper way, together with a certain loss of control over our data, that creates a feeling of creepiness. Another creepy aspect of technology is human-looking AI: this phenomenon is called the uncanny valley. Humans find robots creepy when they start closely resembling humans. It has been hypothesized that the reason why they are viewed as creepy is because they violate our notion of how a robot should look. A study focusing on children's responses to this phenomenon found evidence to support the hypothesis. == Evolutionary explanation == Several studies have hypothesized that creepiness is an evolutionary response to potentially dangerous situations. It could be linked to a mechanism called agent detection which makes individuals expect malignant agents to be responsible for small changes in the environment. McAndrew et al. illustrates the idea with the example of a person hearing some noises while walking in a dark alley. That person would go in high alert, fearing that some dangerous individual was there. If that was not the case the loss would be small. If, on the other hand, a dangerous individual was actually in the alley and the person had not been alerted by this creepy feeling, the loss could have been significant. Creepiness would therefore serve the purpose of alerting us in situations in which the danger is not outright obvious but rather ambiguous. In this case, ambiguity both refers to the possible presence of a threat and to its nature, sexual or physical for example. Creepiness "may reside in between the unknowing and the fear" in the sense that individuals experiencing it are unsure if there truly is something to fear or not. Creepy characteristics are not simply caused by threat potential: in fact, ectomorphic body types are not the most powerful bodies and facial expressions are not a proxy of physical strength either. Therefore, creepiness is not only related to how threatening a characteristic is, in the sense of how dangerous and strong the individual can be. There are more facets to consider. Another characteristic of creepiness is unpredictable behavior. Unpredictability links back to this idea of ambiguity. When an individual is unpredictable it is not possible to tell when their behavior will turn violent: this adds to the ambiguity of a potentially dangerous situation. This theory is endorsed by studies. Not only is unpredictability directly listed as a creepy characteristic, but other behaviors, such as norm-breaking behaviors are indirectly linked with unpredictability. Such behaviors show that the individual does not conform to some social standards others would expect in a given situation. For example, the aforementioned staring at strangers or lack of hygiene—behaviors that make us uneasy or creeped out because they do not fit the norm and therefore are not expected. More generally, participants tended to define creepiness as "different" in the sense of not behaving, or looking, socially acceptable. Such differences point towards a "social mismatch". Humans have a natural system of detection of such mismatch: a physical feeling of coldness. When an individual is creeped out, they report feeling those "cold chills". This phenomenon has been studied by Leander et al, with relation to nonverbal mimicry in social interactions, meaning the unintentional copying of another's behavior. Inappropriate mimicry may leave a person feeling like something is off about the other. Absence of non-verbal mimicry in a friendly interaction, or the presence of it in a professional setting, raises suspicion as it does not follow the relevant social norms. Individuals are left wondering what other unusual behavior the other might engage in.