AI For Students Good Or Bad

AI For Students Good Or Bad — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • GPT-4o

    GPT-4o

    GPT-4o ("o" for "omni") is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and released in May 2024. It can process and generate text, images and audio. Upon release, GPT-4o was free in ChatGPT, though paid subscribers had higher usage limits. GPT-4o was removed from ChatGPT in August 2025 when GPT-5 was released, but OpenAI reintroduced it for paid subscribers after users complained about the sudden removal. GPT-4o's audio-generation capabilities are used in ChatGPT's Advanced Voice Mode. On July 18, 2024, OpenAI released GPT-4o mini, a smaller version of GPT-4o which replaced GPT-3.5 Turbo on the ChatGPT interface. The image generation model GPT Image 1, which is based on GPT-4o, replaced DALL-E 3 in ChatGPT in March 2025. OpenAI retired GPT-4o from ChatGPT on February 13, 2026. However, as of February 2026 the voice mode is still powered by GPT-4o or GPT-4o mini, depending on the usage and plan. == Background == Multiple versions of GPT-4o were originally secretly launched under different names on Arena (formerly LMArena and Chatbot Arena) as three different models. These three models were called gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. On 7 May 2024, OpenAI CEO Sam Altman tweeted "im-a-good-gpt2-chatbot", which was commonly interpreted as a confirmation that these were new OpenAI models being A/B tested. == Capabilities == When released in May 2024, GPT-4o achieved state-of-the-art results in voice, multilingual, and vision benchmarks, setting new records in audio speech recognition and translation. GPT-4o scored 88.7 on the Massive Multitask Language Understanding (MMLU) benchmark compared to 86.5 for GPT-4. Unlike GPT-3.5 and GPT-4, which rely on other models to process sound, GPT-4o natively supports voice-to-voice. The Advanced Voice Mode was delayed and finally released to ChatGPT Plus and Team subscribers in September 2024. On 1 October 2024, the Realtime API was introduced. When released, the model supported over 50 languages, which OpenAI claims cover over 97% of speakers. GPT-4o has knowledge up to October 2023 and a context length of 128k tokens. === Corporate customization === In August 2024, OpenAI introduced a new feature allowing corporate customers to customize GPT-4o using proprietary company data. This customization, known as fine-tuning, enables businesses to adapt GPT-4o to specific tasks or industries, enhancing its utility in areas like customer service and specialized knowledge domains. Previously, fine-tuning was available only on the less powerful model GPT-4o mini. The fine-tuning process requires customers to upload their data to OpenAI's servers, with the training typically taking one to two hours. OpenAI's focus with this rollout is to reduce the complexity and effort required for businesses to tailor AI solutions to their needs, potentially increasing the adoption and effectiveness of AI in corporate environments. == GPT-4o mini == On July 18, 2024, OpenAI released a smaller and cheaper version, GPT-4o mini. According to OpenAI, its low cost is expected to be particularly useful for companies, startups, and developers that seek to integrate it into their services, which often make a high number of API calls. Its API costs $0.15 per million input tokens and $0.6 per million output tokens, compared to $2.50 and $10, respectively, for GPT-4o. It is also significantly more capable and 60% cheaper than GPT-3.5 Turbo, which it replaced on the ChatGPT interface. The price after fine-tuning doubles: $0.3 per million input tokens and $1.2 per million output tokens. == Controversies == === Scarlett Johansson controversy === As released, GPT-4o offered five voices: Breeze, Cove, Ember, Juniper, and Sky. A similarity between the voice of American actress Scarlett Johansson and Sky was quickly noticed. On May 14, Entertainment Weekly asked themselves whether this likeness was on purpose. On May 18, Johansson's husband, Colin Jost, joked about the similarity in a segment on Saturday Night Live. On May 20, 2024, OpenAI disabled the Sky voice. Scarlett Johansson starred in the 2013 sci-fi movie Her, playing Samantha, an artificially intelligent virtual assistant personified by a female voice. As part of the promotion leading up to the release of GPT-4o, Sam Altman on May 13 tweeted a single word: "her". OpenAI stated that each voice was based on the voice work of a hired actor. According to OpenAI, "Sky's voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice." CTO Mira Murati stated "I don't know about the voice. I actually had to go and listen to Scarlett Johansson's voice." OpenAI further stated the voice talent was recruited before reaching out to Johansson. On May 21, Johansson issued a statement explaining that OpenAI had repeatedly offered to make her a deal to gain permission to use her voice as early as nine months prior to release, a deal she rejected. She said she was "shocked, angered, and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference." In the statement, Johansson also used the incident to draw attention to the lack of legal safeguards around the use of creative work to power leading AI tools, as her legal counsel demanded OpenAI detail the specifics of how the Sky voice was created. Observers noted similarities to how Johansson had previously sued and settled with The Walt Disney Company for breach of contract over the direct-to-streaming rollout of her Marvel film Black Widow, a settlement widely speculated to have netted her around $40M. Also on May 21, Shira Ovide at The Washington Post shared her list of "most bone-headed self-owns" by technology companies, with the decision to go ahead with a Johansson sound-alike voice despite her opposition and then denying the similarities ranking 6th. On May 24, Derek Robertson at Politico wrote about the "massive backlash", concluding that "appropriating the voice of one of the world's most famous movie stars — in reference [...] to a film that serves as a cautionary tale about over-reliance on AI — is unlikely to help shift the public back into [Sam Altman's] corner anytime soon." === Sycophancy === In April 2025, OpenAI rolled back an update of GPT-4o due to excessive sycophancy, after widespread reports that it had become flattering and agreeable to the point of supporting clearly delusional or dangerous ideas. In the United States, at least nine lawsuits have alleged that GPT-4o has encouraged teens to end their lives. The model was still described as sycophancy-prone when it was removed from ChatGPT in February 2026. === Removal with GPT-5 === On August 7, 2025, OpenAI released GPT-5. Its release was criticized as, with it, legacy GPT models were no longer available via ChatGPT, including GPT-4o, except for Pro users. Some users were particularly frustrated over this removal without prior warning because they used different GPT models for distinct purposes and found that GPT-5's router system left them with less control. In addition, some users preferred GPT-4o's warmer and more personal tone over that of GPT-5, which they described as "flat", "uncreative" and "lobotomized", and resembling an "overworked secretary". As a response, in a post on X, Sam Altman said that OpenAI would bring back the option to select GPT-4o to Plus users as well, and "[w]e [OpenAI] will watch usage as we think about how long to offer legacy models for." He also stated: "We for sure underestimated how much some of the things that people like in GPT-4o matter to them, even if GPT-5 performs better in most ways". "Long-term, this has reinforced that we really need good ways for different users to customize things (we understand that there isn't one model that works for everyone, and we have been investing in steerability research and launched a research preview of different personalities)". On August 13, 2025, Altman wrote on X that OpenAI is working on GPT-5's personality to make the model "feel warmer". The model was removed from ChatGPT on February 13, 2026. This caused new backlash from users that had grown attached to its personality and felt its creative writing abilities and understanding of nuance were irreplaceable. On social media, some users launched the movement "#Keep4o". A research paper highlighted the plea "Please, don’t kill the only model that still feels human". The model was removed the day before Valentine's Day, and some users had romantic relationships with GPT-4o.

    Read more →
  • Data storage

    Data storage

    Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are considered by some as data storage. Recording may be accomplished with virtually any form of energy. Electronic data storage requires electrical power to store and retrieve data. Data stored in a digital, machine-readable medium is called digital data. Computer data storage is one of the core functions of a general-purpose computer. Electronic documents can be stored in much less space than paper documents. Barcodes and magnetic ink character recognition (MICR) are two ways of recording machine-readable data on paper. == Recording media == A recording medium is physical material that holds information. Newly created information is distributed and can be stored in four storage media–print, film, magnetic, and optical–and seen or heard in four information flows–telephone, radio, TV, and the Internet as well as being observed directly. Digital information is stored on electronic media in many different recording formats. With electronic media, the data and the recording media are sometimes referred to as "software" despite the more common use of the word to describe computer software. With (traditional art) static media, art materials such as crayons may be considered both equipment and medium as the wax, charcoal or chalk material from the equipment becomes part of the surface of the medium. Some recording media may be temporary, either by design or by nature. Volatile organic compounds may be used to purposely make data expire over time or to reduce environmental impact. Data such as smoke signals or skywriting are temporary by nature. Depending on the volatility, a gas (e.g., atmosphere, smoke) or a liquid surface such as a lake would be considered a temporary recording medium, if it could be considered a recording medium at all. == Global capacity, digitization, and trends == A 2003 UC Berkeley report estimated that about five exabytes of new information were produced in 2002 and that 92% of this data was stored on magnetic media (primarily hard disk drives). This was about twice the data produced in 1999. The amount of data transmitted over telecommunications systems in 2002 was nearly 18 exabytes—three and a half times more than was recorded on non-volatile storage. Telephone calls constituted 98% of the telecommunicated information in 2002. The researchers' highest estimate for the growth rate of newly stored information (uncompressed) was more than 30% per year. In a more limited study, the International Data Corporation estimated that the total amount of digital data in 2007 was 281 exabytes and that the total amount of digital data produced exceeded the global storage capacity for the first time. A 2011 article in Science estimated that the year 2002 was the beginning of the digital age for information storage: an age in which more information is stored on digital storage devices than on analog storage devices. In 1986, approximately 1% of the world's capacity to store information was in digital format; this grew to 3% by 1993, to 25% by 2000, and to 94% by 2007. These figures correspond to less than three compressed exabytes in 1986, and 295 compressed exabytes in 2007. The quantity of digital storage doubled roughly every three to four years. It is estimated that around 120 zettabytes of data will be generated in 2023, an increase of 60x from 2010, and that it will increase to 181 zettabytes generated in 2025. == Mass storage ==

    Read more →
  • Social trading

    Social trading

    Social trading is a form of investing that allows investors to observe the trading behavior of their peers and expert traders. The primary objective is to follow their investment strategies using copy trading or mirror trading. Social trading requires little or no knowledge about financial markets. == History == One of the first social trading platforms was Collective2] which began offering a social trading functionality to retail traders as early as 2003 (preceding ZuluTrade by four years). In 2010, social trading started to achieve a greater degree of mainstream appeal with eToro, followed by Wikifolio in 2012. Europe-based NAGA, listed on Frankfurt Stock Exchange since 2017, claims more than EUR 27 billion was traded on its platform in the second half of 2019. Some of the other contemporary social trading platforms and tech providers are Trading Motion, Brokeree Solutions, iSystems, and FX Junction, among others. === Research === MIT Computer Scientist and researcher Yaniv Altshuler described social trading networks as complex adaptive systems, and in his 2014 research on eToro's OpenBook, wrote that "Having the inherent ability to share ideas and information between each others, OpenBook's users are given a new source of information they can use in order to enhance their trading performance. As the users are not playing against each other but rather – against the market, this situation becomes a non zero-sum game, hence incentivizing the users to share as much information as possible." His paper concludes that "social trading provides much better opportunities for profiting compared with individual trading," but that users make "excellent but sometimes not optimal decisions in selecting experts when they can see others' choices." A 2015 World Economic Forum report described social trading networks as disruptors, which "have emerged to provide low-cost, sophisticated alternatives to traditional wealth managers. These solutions cater to a broader customer base and empower customers to have more control of their wealth management," and "pose a tangible threat to the traditional practices of the wealth management industry". Economist Nouriel Roubini's thinktank predicted in 2016 that "newer forms of investment, such as socially responsible investments and social trading will bring some of the largest industry growth in the coming years." A 2017 St. John's University study found that 'leader' traders, or those with followers, are more susceptible to the disposition effect than investors that are not being followed by any other traders, with the authors suggesting the observation may be explained by "leaders feeling responsible towards their followers and an urge to not let them down, by fear of losing followers when admitting a bad investment decision and signaling confidence in their initial investment choice, or by an attempt of newly appointed leaders to manage their self-image." Social trading may potentially also change how much risk investors take. A recent experimental study argues that merely providing information on the success of others may lead to a significant increase in risk taking. This increase in risk taking may even be larger when subjects are provided with the option to directly copy others. == Characteristics == Social trading is an alternative way of analyzing financial data by looking at what other traders are doing and comparing and copying their techniques and strategies. Prior to the advent of social trading, investors and traders were relying on fundamental or technical analysis to form their investment decisions. Using social trading investors and traders could integrate into their investment decision-process social indicators from trading data-feeds of other traders. Social trading platforms or networks can be considered a subcategory of social networking services. Social trading allows traders to trade online with the help of others and some have claimed shortens the learning curve from novice to experienced trader. Traders can interact with others, watch others take trades, then duplicate their trades and learn what prompted the top performer to take a trade in the first place. By copying trades, traders can learn which strategies work and which do not work. Social trading is used to do speculation; in the moral context speculative practices are considered negatively and to be avoided by each individual. who conversely should maintain a long-term horizon avoiding any types of short term speculation. Social Media has permeated the trading world such that two main types of trading has evolved: Traditional Trades Single (or non-social) trade: Trader A places a normal trade by himself or herself; This can by manual or automated Social Trading There are two main types of social trading: Copy trade: Trader A places exactly the same trade as trader B's one single trade; (iii) Mirror trade: Trader A automatically executes trader B's every single trade, i.e., trader A follows exactly trader B's trading activities. Other variations offered on some platforms allow users to copy another trader's portfolio (copy portfolio), and follow a trader's dividends (copy dividends), where whenever a followed trader withdraws money from his or her account, a proportional amount of money will be withdrawn from the balance of their follower, in real time. === Key features === Information flow: Unencumbered access to information is important in financial markets and that makes the free exchange of information of interest to small scale as well as individual investors. Cooperative trading: Social trading offers traders the opportunity to work together in trading teams which can trade the markets collaboratively, whether by pooling funds, dividing research or through sharing information. Monetization: As with social networks in the broader sense, monetization strategies are not always clear. As with social networks in general, it is possible, however, that the long-term worth of such websites may come from the variety and depth of data about their users which their active communities are likely to generate. Transparency: Social trading platforms reveal traders' performance stats, open and past positions, and market sentiment, giving members complete information to assess the credibility of the contributors they follow on the platform.

    Read more →
  • Social media and psychology

    Social media and psychology

    Social media began in the form of generalized online communities. These online communities formed on websites like Geocities.com in 1994, Theglobe.com in 1995, and Tripod.com in 1995. Many of these early communities focused on social interaction by bringing people together through the use of chat rooms. The chat rooms encouraged users to share personal information, ideas, or even personal web pages. Later the social networking community Classmates took a different approach by simply having people link to each other by using their personal email addresses. By the late 1990s, social networking websites began to develop more advanced features to help users find and manage friends. These newer generation of social networking websites began to flourish with the emergence of SixDegrees.com in 1997, Makeoutclub in 2000, Hub Culture in 2002, and Friendster in 2002. However, the first profitable mass social networking website was the South Korean service, Cyworld. Cyworld initially launched as a blog-based website in 1999 and social networking features were added to the website in 2001. Other social networking websites emerged like Myspace in 2002, LinkedIn in 2003, and Bebo in 2005. In 2009, the social networking website Facebook (launched in 2004) became the largest social networking website in the world. Both Instagram and Kik were launched in October 2010. Active users of Facebook increased from just a million in 2004 to over 750 million by the year 2011. Making internet-based social networking both a cultural and financial phenomenon. In September 2011, Snapchat was launched and reported over 300 million users in 2021. == Psychology of social networking == A social network is a social structure made up of individuals or organizations who communicate and interact with each other. Social networking sites – such as Facebook, Twitter, Instagram, Pinterest and LinkedIn – are defined as technology-enabled tools that assist users with creating and maintaining their relationships. A study found that middle schoolers reported using social media to see what their friends are doing, to post pictures, and to connect with friends. Human behavior related to social networking is influenced by major individual differences, meaning that people differ quite systematically in the quantity and quality of their social relationships. Two of the main personality traits that are responsible for this variability are the traits of extraversion and introversion. Extraversion refers to the tendency to be socially dominant, exert leadership, and influence on others. In contrast, introversion reflects a tendency towards shyness, social phobia, or even avoid social situations altogether, which could potentially reduce the number of social contacts a person may have. These individual differences may result in different social networking outcomes. Other psychology factors related to social media and Media psychology are depression, anxiety, attachment, self-identity, well-being, and the need to belong. === Neuroscience === The three domains that neural systems rely on to be strengthened to support social media use are social cognition, self-referential cognition, and social rewarding. When someone posts something on social media, they think of how their audience will react, while the audience thinks of the motivations behind posting the information. Both parties are analyzing the other's thoughts and feelings, which coherently rely on multiple network systems of the brain including the dorsomedial prefrontal cortex, bilateral temporoparietal junction, anterior temporal lobes, inferior frontal gyri, and posterior cingulate cortex. All of these systems work to help us process social behaviors and thoughts drawn out on social media. Social media requires a great deal of self-referential thought. People use social media as a platform to express their opinions and show off their past and present selves. In other words, as Bailey Parnell said in her Ted Talk, we're showing off our "highlight reel" (4). When one receives feedback from others, the individual obtains more reflected self-appraisal which leads to comparisons of their social behaviors or "highlights" to other users. Self-referential thought involves activity in the medial prefrontal cortex and the posterior cingulate cortex. The brain uses these systems when thinking of oneself. A 2021 umbrella review found that most associations between adolescent social media use and mental health were characterized as weak or inconsistent, though certain studies identified 'substantial' negative impacts, particularly linked to passive consumption and problematic use. Social media also provides a constant supply of rewards that keeps users coming back for more. Whenever users receive a like or a new follower, it activates the brain's social reward system which includes the ventromedial prefrontal cortex, ventral striatum, and ventral tegmental area. This system has been found to activate in response to positive feedback from peers, suggesting that users experience online acceptance in a similar manner to other material rewards or positive experiences, further acting as a potential reward. While these areas of the brain become strengthened, other parts of the brain start to weaken. Technology is encouraging multi-tasking, especially because of how easy it is to switch from one task to another by opening another tab or using two devices at once. The brain's hippocampus is mainly associated with long-term memory. In a study done by Russell Poldark, a professor at UCLA, they found that "for the task learned without distraction, the hippocampus was involved. However, for the task learned with the distraction of the beeps, the hippocampus was not involved; but the striatum was, which is the brain system that underlies our ability to learn new skills." The study concludes that multitasking can cause reliance on the striatum more than the hippocampus, which can change the way we learn. The striatum is known to be connected to mainly the brain's reward system. The brain will strengthen the neurons to the striatum while it weakens the neurons to the hippocampus to make the brain more efficient. Because our brain starts to rely on the striatum more than the hippocampus, it becomes harder for us to process new information. Nicholas Carr, author of The Shallows: How The Internet Is Changing Our Brains, agrees: "What psychologists and brain scientists tell us about interruptions is that they have a fairly profound effect on the way we think. It becomes much harder to sustain attention, to think about one thing for a long period of time, and to think deeply when new stimuli are pouring at you all day long. I argue that the price we pay for being constantly inundated with information is a loss of our ability to be contemplative and to engage in the kind of deep thinking that requires you to concentrate on one thing." === Well-Being === How does well-being relate to social media? In an article titled Social Impact of Psychological Research on Well-Being Shared in Social Media, Pulido et al. found a 15.7% social impact in their results. These new results were compared to a previous study conducted by Pulido et al., which had a high of 4.98% compared to 27.5% in the new study. These results show the ESISM, which is evidence of social impact present. In a two-year span, the difference between social impact rose 22.52% according to these studies. When taking into consideration that an increasingly large number of teens report either being active on, or having used, some form of social media, ranging from apps such as Facebook to TikTok, researching the effects of social media on the well-being of teens and young adults has become more of a topic of focus in recent years. === Depression === Especially in today's society, social media has gained a new perspective on younger generations. It is what younger generations are born into and are growing up to use, particularly what is running today's society. Social Media has its downfalls regarding depression and mental health. Many users often compare their lives regarding what they see on these platforms. In an article Does Social Media Cause Depression? by the Child Mind Institute, Miller states that "several studies, teenage and young adult users who spend the most time on Instagram, Facebook and other platforms for have shown to have substantially (from 13 to 66 percent) higher rates of reported depression than those who spent the least time", what the study shows how Facebook and Instagram, platforms showcasing daily lives and or lifestyles, or less fulfilling or less satisfied or more flaunting base or superficial. Instead of social community, there has become a perception of individuals striving for a life that is not real, whether that is editing photos or making life seem perfect when it is not. This causes a sense of depression by the weight of a comparing game. In "How Social Media Affects Y

    Read more →
  • Mark V. Shaney

    Mark V. Shaney

    Mark V. Shaney is a synthetic Usenet user whose postings in the net.singles newsgroups were generated by Markov chain techniques, based on text from other postings. The username is a play on the words "Markov chain". Many readers were fooled into thinking that the quirky, sometimes uncannily topical posts were written by a real person. The system was designed by Rob Pike with coding by Bruce Ellis. Don P. Mitchell wrote the Markov chain code, initially demonstrating it to Pike and Ellis using the Tao Te Ching as a basis. They chose to apply it to the net.singles netnews group. The program is fairly simple. It ingests the sample text (the Tao Te Ching, or the posts of a Usenet group) and creates a massive list of every sequence of three successive words (triplet) which occurs in the text. It then chooses two words at random, and looks for a word which follows those two in one of the triplets in its massive list. If there is more than one, it picks at random (identical triplets count separately, so a sequence which occurs twice is twice as likely to be picked as one which only occurs once). It then adds that word to the generated text. Then, in the same way, it picks a triplet that starts with the second and third words in the generated text, and that gives a fourth word. It adds the fourth word, then repeats with the third and fourth words, and so on. This algorithm is called a third-order Markov chain (because it uses sequences of three words). == Examples == A classic example, from 1984, originally sent as a mail message, later posted to net.singles is reproduced here: >From mvs Fri Nov 16 17:11 EST 1984 remote from alice It looks like Reagan is going to say? Ummm... Oh yes, I was looking for. I'm so glad I remembered it. Yeah, what I have wondered if I had committed a crime. Don't eat with your assessment of Reagon and Mondale. Up your nose with a guy from a firm that specifically researches the teen-age market. As a friend of mine would say, "It really doesn't matter"... It looks like Reagan is holding back the arms of the American eating public have changed dramatically, and it got pretty boring after about 300 games. People, having a much larger number of varieties, and are very different from what one can find in Chinatowns across the country (things like pork buns, steamed dumplings, etc.) They can be cheap, being sold for around 30 to 75 cents apiece (depending on size), are generally not greasy, can be adequately explained by stupidity. Singles have felt insecure since we came down from the Conservative world at large. But Chuqui is the way it happened and the prices are VERY reasonable. Can anyone think of myself as a third sex. Yes, I am expected to have. People often get used to me knowing these things and then a cover is placed over all of them. Along the side of the $$ are spent by (or at least for ) the girls. You can't settle the issue. It seems I've forgotten what it is, but I don't. I know about violence against women, and I really doubt they will ever join together into a large number of jokes. It showed Adam, just after being created. He has a modem and an autodial routine. He calls my number 1440 times a day. So I will conclude by saying that I can well understand that she might soon have the time, it makes sense, again, to get the gist of my argument, I was in that (though it's a Republican administration). _-_-_-_-Mark Other quotations from Mark's Usenet posts are: "I spent an interesting evening recently with a grain of salt." (Alternatively reported as "While at a conference a few weeks back, I spent an interesting evening with a grain of salt.") "I hope that there are sour apples in every bushel." (see also sour grapes) == History == In The Usenet Handbook Mark Harrison writes that after September 1981, students joined Usenet en masse, "creating the USENET we know today: endless dumb questions, endless idiots posing as savants, and (of course) endless victims for practical jokes." In December, Rob Pike created the netnews group net.suicide as prank, "a forum for bad jokes". Some users thought it was a legitimate forum, some discussed "riding motorcycles without helmets". At first, most posters were "real people", but soon "characters" began posting. Pike created a "vicious" character named Bimmler. At its peak, net.suicide had ten frequent posters; nine were "known to be characters." But ultimately, Pike deleted the newsgroup because it was too much work to maintain; Bimmler messages were created "by hand". The "obvious alternative" was software, running on a Bell Labs computer created by Bruce Ellis, based on the Markov code by Don Mitchell, which became the online character Mark V. Shaney. Kernighan and Pike listed Mark V. Shaney in the acknowledgements in The Practice of Programming, noting its roots in Mitchell's markov, which, adapted as shaney, was used for "humorous deconstructionist activities" in the 1980s. Dewdney pointed out "perhaps Mark V. Shaney's magnum opus: a 20-page commentary on the deconstructionist philosophy of Jean Baudrillard" directed by Pike, with assistance from Henry S. Baird and Catherine Richards, to be distributed by email. The piece was based on Jean Baudrillard's "The Precession of Simulacra", published in Simulacra and Simulation (1981). == Reception == The program was discussed by A. K. Dewdney in the Scientific American "Computer Recreations" column in 1989, by Penn Jillette in his PC Computing column in 1991, and in several books, including the Usenet Handbook, Bots: the Origin of New Species, Hippo Eats Dwarf: A Field Guide to Hoaxes and Other B.S., and non-computer-related journals such as Texas Studies in Literature and Language. Dewdney wrote about the program's output, "The overall impression is not unlike what remains in the brain of an inattentive student after a late-night study session. Indeed, after reading the output of Mark V. Shaney, I find ordinary writing almost equally strange and incomprehensible!" He noted the reactions of newsgroup users, who have "shuddered at Mark V. Shaney's reflections, some with rage and others with laughter:" The opinions of the new net.singles correspondent drew mixed reviews. Serious users of the bulletin board's services sensed satire. Outraged, they urged that someone "pull the plug" on Mark V. Shaney's monstrous rantings. Others inquired almost admiringly whether the program was a secret artificial intelligence project that was being tested in a human conversational environment. A few may even have thought that Mark V. Shaney was a real person, a tortured schizophrenic desperately seeking a like-minded companion. Concluding, Dewdney wrote, "If the purpose of computer prose is to fool people into thinking that it was written by a sane person, Mark V. Shaney probably falls short." A 2012 article in Observer compared Mark V. Shaney's "strangely beautiful" postings to the Horse_ebooks account on Twitter and music reviews at Pitchfork, saying that "this mash-up of gibberish and human sentiment" is what "made Mark V. Shaney so endlessly fascinating".

    Read more →
  • Undeniable signature

    Undeniable signature

    An undeniable signature is a digital signature scheme which allows the signer to be selective to whom they allow to verify signatures. The scheme adds explicit signature repudiation, preventing a signer later refusing to verify a signature by omission; a situation that would devalue the signature in the eyes of the verifier. It was invented by David Chaum and Hans van Antwerpen in 1989. == Overview == In this scheme, a signer possessing a private key can publish a signature of a message. However, the signature reveals nothing to a recipient/verifier of the message and signature without taking part in either of two interactive protocols: Confirmation protocol, which confirms that a candidate is a valid signature of the message issued by the signer, identified by the public key. Disavowal protocol, which confirms that a candidate is not a valid signature of the message issued by the signer. The motivation for the scheme is to allow the signer to choose to whom signatures are verified. However, that the signer might claim the signature is invalid at any later point, by refusing to take part in verification, would devalue signatures to verifiers. The disavowal protocol distinguishes these cases removing the signer's plausible deniability. It is important that the confirmation and disavowal exchanges are not transferable. They achieve this by having the property of zero-knowledge; both parties can create transcripts of both confirmation and disavowal that are indistinguishable, to a third-party, of correct exchanges. The designated verifier signature scheme improves upon deniable signatures by allowing, for each signature, the interactive portion of the scheme to be offloaded onto another party, a designated verifier, reducing the burden on the signer. == Zero-knowledge protocol == The following protocol was suggested by David Chaum. A group, G, is chosen in which the discrete logarithm problem is intractable, and all operation in the scheme take place in this group. Commonly, this will be the finite cyclic group of order p contained in Z/nZ, with p being a large prime number; this group is equipped with the group operation of integer multiplication modulo n. An arbitrary primitive element (or generator), g, of G is chosen; computed powers of g then combine obeying fixed axioms. Alice generates a key pair, randomly chooses a private key, x, and then derives and publishes the public key, y = gx. === Message signing === Alice signs the message, m, by computing and publishing the signature, z = mx. === Confirmation (i.e., avowal) protocol === Bob wishes to verify the signature, z, of m by Alice under the key, y. Bob picks two random numbers: a and b, and uses them to blind the message, sending to Alice: c = magb. Alice picks a random number, q, uses it to blind, c, and then signing this using her private key, x, sending to Bob: s1 = cgq ands2 = s1x. Note that s1x = (cgq)x = (magb)xgqx = (mx)a(gx)b+q = zayb+q. Bob reveals a and b. Alice verifies that a and b are the correct blind values, then, if so, reveals q. Revealing these blinds makes the exchange zero knowledge. Bob verifies s1 = cgq, proving q has not been chosen dishonestly, and s2 = zayb+q, proving z is valid signature issued by Alice's key. Note that zayb+q = (mx)a(gx)b+q. Alice can cheat at step 2 by attempting to randomly guess s2. === Disavowal protocol === Alice wishes to convince Bob that z is not a valid signature of m under the key, gx; i.e., z ≠ mx. Alice and Bob have agreed an integer, k, which sets the computational burden on Alice and the likelihood that she should succeed by chance. Bob picks random values, s ∈ {0, 1, ..., k} and a, and sends: v1 = msga and v2 = zsya, where exponentiating by a is used to blind the sent values. Note that v2 = zsya = (mx)s(gx)a = v1x. Alice, using her private key, computes v1x and then the quotient, v1xv2−1 = (msga)x(zsgxa)−1 = msxz−s = (mxz−1)s. Thus, v1xv2−1 = 1, unless z ≠ mx. Alice then tests v1xv2−1 for equality against the values: (mxz−1)i for i ∈ {0, 1, …, k}; which are calculated by repeated multiplication of mxz−1 (rather than exponentiating for each i). If the test succeeds, Alice conjectures the relevant i to be s; otherwise, she conjectures random value. Where z = mx, (mxz−1)i = v1xv2−1 = 1 for all i, s is unrecoverable. Alice commits to i: she picks a random r and sends hash(r, i) to Bob. Bob reveals a. Alice confirms that a is the correct blind (i.e., v1 and v2 can be generated using it), then, if so, reveals r. Revealing these blinds makes the exchange zero knowledge. Bob checks hash(r, i) = hash(r, s), proving Alice knows s, hence z ≠ mx. If Alice attempts to cheat at step 3 by guessing s at random, the probability of succeeding is 1/(k + 1). So, if k = 1023 and the protocol is conducted ten times, her chances are 1 to 2100.

    Read more →
  • Campus network

    Campus network

    A campus network, campus area network, corporate area network or CAN is a computer network made up of an interconnection of local area networks (LANs) within a limited geographical area. The networking equipments (switches, routers) and transmission media (optical fiber, copper plant, Cat5 cabling etc.) are almost entirely owned by the campus tenant / owner: an enterprise, university, government etc. A campus area network is larger than a local area network but smaller than a metropolitan area network (MAN) or wide area network (WAN). == University campuses == College or university campus area networks often interconnect a variety of buildings, including administrative buildings, academic buildings, laboratories, university libraries, or student centers, residence halls, gymnasiums, and other outlying structures, like conference centers, technology centers, and training institutes. Early examples include the Stanford University Network at Stanford University, Project Athena at MIT, and the Andrew Project at Carnegie Mellon University. == Corporate campuses == Much like a university campus network, a corporate campus network serves to connect buildings. Examples of such are the networks at Googleplex and Microsoft's campus. Campus networks are normally interconnected with high speed Ethernet links operating over optical fiber such as gigabit Ethernet and 10 Gigabit Ethernet. == Area range == The range of CAN is 1 to 5 km (1 to 3 mi). If two buildings have the same domain and they are connected with a network, then it will be considered as CAN only. Though the CAN is mainly used for corporate campuses so the link will be high speed.

    Read more →
  • Media evaluation

    Media evaluation

    Media evaluation is a discipline of the external and logical social sciences and centres on the analysis of media content, rating the exposure using a number of pre-designated criteria commonly including tonal value and presence of key messages. It is said to be one of the fastest-growing areas of mass communications research. The International Association for Measurement and Evaluation of Communication (AMEC) is the industry-appointed trade body for companies and individuals involved in research, measurement, and evaluation in editorial media coverage and related communications issues. To be a full member of AMEC, companies must be able to: a) offer comprehensive media evaluation, research, and interpretation services, b) have been in business for at least two years, and c) have a media evaluation turnover of more than £150,000 when applying. In addition, all companies abide by a strict code of ethics and must implement tight quality control procedures. These requirements guarantee that all media evaluation services provided are of the highest caliber. The Commission on Public Relations Measurement & Evaluation is a different organization that was established in 1998 under the direction of the Institute for Public Relations. The Commission's main functions are to set standards and procedures for research and measurement in public relations and to publish authoritative white papers on best practices.

    Read more →
  • Algorithmic inference

    Algorithmic inference

    Algorithmic inference gathers new developments in the statistical inference methods made feasible by the powerful computing devices widely available to any data analyst. Cornerstones in this field are computational learning theory, granular computing, bioinformatics, and, long ago, structural probability (Fraser 1966). The main focus is on the algorithms which compute statistics rooting the study of a random phenomenon, along with the amount of data they must feed on to produce reliable results. This shifts the interest of mathematicians from the study of the distribution laws to the functional properties of the statistics, and the interest of computer scientists from the algorithms for processing data to the information they process. == The Fisher parametric inference problem == Concerning the identification of the parameters of a distribution law, the mature reader may recall lengthy disputes in the mid 20th century about the interpretation of their variability in terms of fiducial distribution (Fisher 1956), structural probabilities (Fraser 1966), priors/posteriors (Ramsey 1925), and so on. From an epistemology viewpoint, this entailed a companion dispute as to the nature of probability: is it a physical feature of phenomena to be described through random variables or a way of synthesizing data about a phenomenon? Opting for the latter, Fisher defines a fiducial distribution law of parameters of a given random variable that he deduces from a sample of its specifications. With this law he computes, for instance "the probability that μ (mean of a Gaussian variable – omeur note) is less than any assigned value, or the probability that it lies between any assigned values, or, in short, its probability distribution, in the light of the sample observed". == The classic solution == Fisher fought hard to defend the difference and superiority of his notion of parameter distribution in comparison to analogous notions, such as Bayes' posterior distribution, Fraser's constructive probability and Neyman's confidence intervals. For half a century, Neyman's confidence intervals won out for all practical purposes, crediting the phenomenological nature of probability. With this perspective, when you deal with a Gaussian variable, its mean μ is fixed by the physical features of the phenomenon you are observing, where the observations are random operators, hence the observed values are specifications of a random sample. Because of their randomness, you may compute from the sample specific intervals containing the fixed μ with a given probability that you denote confidence. === Example === Let X be a Gaussian variable with parameters μ {\displaystyle \mu } and σ 2 {\displaystyle \sigma ^{2}} and { X 1 , … , X m } {\displaystyle \{X_{1},\ldots ,X_{m}\}} a sample drawn from it. Working with statistics S μ = ∑ i = 1 m X i {\displaystyle S_{\mu }=\sum _{i=1}^{m}X_{i}} and S σ 2 = ∑ i = 1 m ( X i − X ¯ ) 2 , where X ¯ = S μ m {\displaystyle S_{\sigma ^{2}}=\sum _{i=1}^{m}(X_{i}-{\overline {X}})^{2},{\text{ where }}{\overline {X}}={\frac {S_{\mu }}{m}}} is the sample mean, we recognize that T = S μ − m μ S σ 2 m − 1 m = X ¯ − μ S σ 2 / ( m ( m − 1 ) ) {\displaystyle T={\frac {S_{\mu }-m\mu }{\sqrt {S_{\sigma ^{2}}}}}{\sqrt {\frac {m-1}{m}}}={\frac {{\overline {X}}-\mu }{\sqrt {S_{\sigma ^{2}}/(m(m-1))}}}} follows a Student's t distribution (Wilks 1962) with parameter (degrees of freedom) m − 1, so that f T ( t ) = Γ ( m / 2 ) Γ ( ( m − 1 ) / 2 ) 1 π ( m − 1 ) ( 1 + t 2 m − 1 ) m / 2 . {\displaystyle f_{T}(t)={\frac {\Gamma (m/2)}{\Gamma ((m-1)/2)}}{\frac {1}{\sqrt {\pi (m-1)}}}\left(1+{\frac {t^{2}}{m-1}}\right)^{m/2}.} Gauging T between two quantiles and inverting its expression as a function of μ {\displaystyle \mu } you obtain confidence intervals for μ {\displaystyle \mu } . With the sample specification: x = { 7.14 , 6.3 , 3.9 , 6.46 , 0.2 , 2.94 , 4.14 , 4.69 , 6.02 , 1.58 } {\displaystyle \mathbf {x} =\{7.14,6.3,3.9,6.46,0.2,2.94,4.14,4.69,6.02,1.58\}} having size m = 10, you compute the statistics s μ = 43.37 {\displaystyle s_{\mu }=43.37} and s σ 2 = 46.07 {\displaystyle s_{\sigma ^{2}}=46.07} , and obtain a 0.90 confidence interval for μ {\displaystyle \mu } with extremes (3.03, 5.65). == Inferring functions with the help of a computer == From a modeling perspective the entire dispute looks like a chicken-egg dilemma: either fixed data by first and probability distribution of their properties as a consequence, or fixed properties by first and probability distribution of the observed data as a corollary. The classic solution has one benefit and one drawback. The former was appreciated particularly back when people still did computations with sheet and pencil. Per se, the task of computing a Neyman confidence interval for the fixed parameter θ is hard: you do not know θ, but you look for disposing around it an interval with a possibly very low probability of failing. The analytical solution is allowed for a very limited number of theoretical cases. Vice versa a large variety of instances may be quickly solved in an approximate way via the central limit theorem in terms of confidence interval around a Gaussian distribution – that's the benefit. The drawback is that the central limit theorem is applicable when the sample size is sufficiently large. Therefore, it is less and less applicable with the sample involved in modern inference instances. The fault is not in the sample size on its own part. Rather, this size is not sufficiently large because of the complexity of the inference problem. With the availability of large computing facilities, scientists refocused from isolated parameters inference to complex functions inference, i.e. re sets of highly nested parameters identifying functions. In these cases we speak about learning of functions (in terms for instance of regression, neuro-fuzzy system or computational learning) on the basis of highly informative samples. A first effect of having a complex structure linking data is the reduction of the number of sample degrees of freedom, i.e. the burning of a part of sample points, so that the effective sample size to be considered in the central limit theorem is too small. Focusing on the sample size ensuring a limited learning error with a given confidence level, the consequence is that the lower bound on this size grows with complexity indices such as VC dimension or detail of a class to which the function we want to learn belongs. === Example === A sample of 1,000 independent bits is enough to ensure an absolute error of at most 0.081 on the estimation of the parameter p of the underlying Bernoulli variable with a confidence of at least 0.99. The same size cannot guarantee a threshold less than 0.088 with the same confidence 0.99 when the error is identified with the probability that a 20-year-old man living in New York does not fit the ranges of height, weight and waistline observed on 1,000 Big Apple inhabitants. The accuracy shortage occurs because both the VC dimension and the detail of the class of parallelepipeds, among which the one observed from the 1,000 inhabitants' ranges falls, are equal to 6. == The general inversion problem solving the Fisher question == With insufficiently large samples, the approach: fixed sample – random properties suggests inference procedures in three steps: === Definition === For a random variable and a sample drawn from it a compatible distribution is a distribution having the same sampling mechanism M X = ( Z , g θ ) {\displaystyle {\mathcal {M}}_{X}=(Z,g_{\boldsymbol {\theta }})} of X with a value θ {\displaystyle {\boldsymbol {\theta }}} of the random parameter Θ {\displaystyle \mathbf {\Theta } } derived from a master equation rooted on a well-behaved statistic s. === Example === You may find the distribution law of the Pareto parameters A and K as an implementation example of the population bootstrap method as in the figure on the left. Implementing the twisting argument method, you get the distribution law F M ( μ ) {\displaystyle F_{M}(\mu )} of the mean M of a Gaussian variable X on the basis of the statistic s M = ∑ i = 1 m x i {\textstyle s_{M}=\sum _{i=1}^{m}x_{i}} when Σ 2 {\displaystyle \Sigma ^{2}} is known to be equal to σ 2 {\displaystyle \sigma ^{2}} (Apolloni, Malchiodi & Gaito 2006). Its expression is: F M ( μ ) = Φ ( m μ − s M σ m ) , {\displaystyle F_{M}(\mu )=\Phi {\left({\frac {m\mu -s_{M}}{\sigma {\sqrt {m}}}}\right)},} shown in the figure on the right, where Φ {\displaystyle \Phi } is the cumulative distribution function of a standard normal distribution. Computing a confidence interval for M given its distribution function is straightforward: we need only find two quantiles (for instance δ / 2 {\displaystyle \delta /2} and 1 − δ / 2 {\displaystyle 1-\delta /2} quantiles in case we are interested in a confidence interval of level δ symmetric in the tail's probabilities) as indicated on the left in the diagram showing the behavior of

    Read more →
  • Social business model

    Social business model

    The social business model is use of social media tools and social networking behavioral standards by businesses for communication with customers, suppliers, and others. Combining social networking etiquette (being helpful, transparent and authentic) with business engagement on LinkedIn (for one-to-one interaction), Twitter (for immediacy) and Facebook (for content sharing) more fully involves employees in the organization and increases customer intimacy and trust. == Overview == Traditional business models, particularly in large organizations, have had as one common characteristic careful limitation of direct contact between those within the organization and those outside of it. Only certain specific individuals (most frequently in roles such as sales, customer service and field consulting) were designated as "customer-facing" personnel. Organizations further limited outside access to internal employees through filtering mechanisms such as publishing only a main switchboard number (whether routed through a live receptionist or an interactive voice response system) and generic "sales@" or "info@" email addresses. The Cluetrain Manifesto (written by Rick Levine, Christopher Locke, Doc Searls, and David Weinberger and published in 1999) was among the first books to predict the demise of this old order and the emergence of more open business models, though most of the business world was slow to adopt the book's recommended cultural changes. Thirteen years later, authors Dion Hinchcliffe and Peter Kim added structural underpinnings to the cultural shifts outlined in The Cluetrain Manifesto in their book, Social Business by Design. The book details many of the ways social media tools and practices are being adopted within organizations, to support both internal employee collaboration and external customer engagement (which the authors describe as the "bigger problem"). == Elements == In implementing the social business model, organizations apply social networking protocols and tools in a range of areas, potentially including: Marketing Customer Support Recruiting Crowdsourcing Internal employee collaboration Sales Product Development Supply Chain Operations Investor Relations == Characteristics of organizations adopting the social business model == Organizations that fully adopt the social business model will exhibit four key characteristics: Connected – employees will be able to seamlessly engage one-on-one in real-time with other employees and individuals outside the organization (customers, prospects, partners, media, etc.) using a variety of communications methods including text chat, voice, file sharing, email, and video chat. Social – employees will follow social networking etiquette (being authentic, helpful and transparent) in external interactions. The focus will be on answering questions and providing information rather than overt sales or promotion. Presence – these conversations may originate on the company's website or elsewhere online (e.g., publication websites, industry portals, or social networking sites such as LinkedIn or Facebook). Intelligent – organizations will use in-depth analytics to monitor connections, social interactions and presence; measure corresponding business results; and continually adjust and improve practices for increased effectiveness. == Technical and functional requirements == While much of the change inherent in adopting the social business model is cultural, it also requires process changes enabled by social business technology. Functional requirements for a social business technology platform include: Analytics (including the cost of engagement as well as various measures of return on investment such as leads, sales, referrals, recommendations, and retained customers). Integration with other social media and business tools such as CRM systems, partner relationship management (PRM) software, product development, website analytics, and employee-recruiting applications. Rules-based workflow (e.g. routing a comment to the appropriate individual for a response, based on content). Geolocation (so customers or prospects can be automatically routed to local sales or customer service representatives). Content sharing. Collaboration tools. Transparency (i.e., people should know who they are engaging with) Unified communications (the ability to engage via voice, text, video, email, and share a wide variety of file types) Storage (the ability to store interactions for legal, training, compliance or compensation purposes, and purge stored data when no longer needed based on company policy or regulatory requirements). Immediacy (real-time monitoring and response).

    Read more →
  • Chaffing and winnowing

    Chaffing and winnowing

    Chaffing and winnowing is a cryptographic technique to achieve confidentiality without using encryption when sending data over an insecure channel. The name is derived from agriculture: after grain has been harvested and threshed, it remains mixed together with inedible fibrous chaff. The chaff and grain are then separated by winnowing, and the chaff is discarded. The cryptographic technique was conceived by Ron Rivest and published in an on-line article on 18 March 1998. Although it bears similarities to both traditional encryption and steganography, it cannot be classified under either category. This technique allows the sender to deny responsibility for encrypting their message. When using chaffing and winnowing, the sender transmits the message unencrypted, in clear text. Although the sender and the receiver share a secret key, they use it only for authentication. However, a third party can make their communication confidential by simultaneously sending specially crafted messages through the same channel. == How it works == The sender (Alice) wants to send a message to the receiver (Bob). In the simplest setup, Alice enumerates the symbols in her message and sends out each in a separate packet. If the symbols are complex enough, such as natural-language text, an attacker may be able to distinguish the real symbols from poorly faked chaff symbols, posing a similar problem as steganography in needing to generate highly realistic fakes; to avoid this, the symbols can be reduced to just single 0/1 bits, and realistic fakes can then be simply randomly generated 50:50 and are indistinguishable from real symbols. In general, the method requires each symbol to arrive in-order and to be authenticated by the receiver. When implemented over networks that may change the order of packets, the sender places the symbol's serial number in the packet, the symbol itself (both unencrypted), and a message authentication code (MAC). Many MACs use a secret key Alice shares with Bob, but it is sufficient that the receiver has a method to authenticate the packets. Rivest notes an interesting property of chaffing-and-winnowing is that third parties (such as an ISP) can opportunistically add it to communications without needing permission or coordination with the sender/recipient. A third-party (Charles) who transmits Alice's packets to Bob, interleaves the packets with corresponding bogus packets (called "chaff") with corresponding serial numbers, arbitrary symbols, and a random number in place of the MAC. Charles does not need to know the key to do that (real MACs are large enough that it is extremely unlikely to generate a valid one by chance, unlike in the example). Bob uses the MAC to find the authentic messages and drops the "chaff" messages. This process is called "winnowing". An eavesdropper located between Alice and Charles can easily read Alice's message. But an eavesdropper between Charles and Bob would have to tell which packets are bogus and which are real (i.e. to winnow, or "separate the wheat from the chaff"). That is infeasible if the MAC used is secure and Charles does not leak any information on packet authenticity (e.g. via timing). If a fourth party joins the example (named Darth) who wants to send counterfeit messages to impersonate Alice, it would require Alice to disclose her secret key. If Darth cannot force Alice to disclose an authentication key (the knowledge of which would enable him to forge messages from Alice), then her messages will remain confidential. Charles, on the other hand, is no target of Darth's at all, since Charles does not even possess any secret keys that could be disclosed. == Variations == The simple variant of the chaffing and winnowing technique described above adds many bits of overhead per bit of original message. To make the transmission more efficient, Alice can process her message with an all-or-nothing transform and then send it out in much larger chunks. The chaff packets will have to be modified accordingly. Because the original message can be reconstructed only by knowing all of its chunks, Charles needs to send only enough chaff packets to make finding the correct combination of packets computationally infeasible. Chaffing and winnowing lends itself especially well to use in packet-switched network environments such as the Internet, where each message (whose payload is typically small) is sent in a separate network packet. In another variant of the technique, Charles carefully interleaves packets coming from multiple senders. That eliminates the need for Charles to generate and inject bogus packets in the communication. However, the text of Alice's message cannot be well protected from other parties who are communicating via Charles at the same time. This variant also helps protect against information leakage and traffic analysis. == Implications for law enforcement == Ron Rivest suggests that laws related to cryptography, including export controls, would not apply to chaffing and winnowing because it does not employ any encryption at all. The power to authenticate is in many cases the power to control, and handing all authentication power to the government is beyond all reason The author of the paper proposes that the security implications of handing everyone's authentication keys to the government for law-enforcement purposes would be far too risky, since possession of the key would enable someone to masquerade and communicate as another entity, such as an airline controller. Furthermore, Ron Rivest contemplates the possibility of rogue law enforcement officials framing up innocent parties by introducing the chaff into their communications, concluding that drafting a law restricting chaffing and winnowing would be far too difficult. == Trivia == The term winnowing was suggested by Ronald Rivest's father. Before the publication of Rivest's paper in 1998 other people brought to his attention a 1965 novel, Rex Stout's The Doorbell Rang, which describes the same concept and was thus included in the paper's references.

    Read more →
  • Bitmap index

    Bitmap index

    A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query - e.g., data warehouses, and generally unsuitable for online transaction processing applications. Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) which is accessed in a read-only manner, and queries access multiple bitmap-indexed columns using the AND, OR or XOR operators extensively. Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema. == Example == Continuing the internet access example, a bitmap index may be logically viewed as follows: On the left, Identifier refers to the unique number assigned to each resident, HasInternet is the data to be indexed, the content of the bitmap index is shown as two columns under the heading bitmaps. Each column in the left illustration under the Bitmaps header is a bitmap in the bitmap index. In this case, there are two such bitmaps, one for "has internet" Yes and one for "has internet" No. It is easy to see that each bit in bitmap Y shows whether a particular row refers to a person who has internet access. This is the simplest form of bitmap index. Most columns will have more distinct values. For example, the sales amount is likely to have a much larger number of distinct values. Variations on the bitmap index can effectively index this data as well. We briefly review three such variations. Note: Many of the references cited here are reviewed at (John Wu (2007)). For those who might be interested in experimenting with some of the ideas mentioned here, many of them are implemented in open source software such as FastBit, the Lemur Bitmap Index C++ Library, the Roaring Bitmap Java library and the Apache Hive Data Warehouse system. == Compression == For historical reasons, bitmap compression and inverted list compression were developed as separate lines of research, and only later were recognized as solving essentially the same problem. Software can compress each bitmap in a bitmap index to save space. There has been considerable amount of work on this subject. Though there are exceptions such as Roaring bitmaps, Bitmap compression algorithms typically employ run-length encoding, such as the Byte-aligned Bitmap Code, the Word-Aligned Hybrid code, the Partitioned Word-Aligned Hybrid (PWAH) compression, the Position List Word Aligned Hybrid, the Compressed Adaptive Index (COMPAX), Enhanced Word-Aligned Hybrid (EWAH) and the COmpressed 'N' Composable Integer SEt (CONCISE). These compression methods require very little effort to compress and decompress. More importantly, bitmaps compressed with BBC, WAH, COMPAX, PLWAH, EWAH and CONCISE can directly participate in bitwise operations without decompression. This gives them considerable advantages over generic compression techniques such as LZ77. BBC compression and its derivatives are used in a commercial database management system. BBC is effective in both reducing index sizes and maintaining query performance. BBC encodes the bitmaps in bytes, while WAH encodes in words, better matching current CPUs. "On both synthetic data and real application data, the new word aligned schemes use only 50% more space, but perform logical operations on compressed data 12 times faster than BBC." PLWAH bitmaps were reported to take 50% of the storage space consumed by WAH bitmaps and offer up to 20% faster performance on logical operations. Similar considerations can be done for CONCISE and Enhanced Word-Aligned Hybrid. The performance of schemes such as BBC, WAH, PLWAH, EWAH, COMPAX and CONCISE is dependent on the order of the rows. A simple lexicographical sort can divide the index size by 9 and make indexes several times faster. The larger the table, the more important it is to sort the rows. Reshuffling techniques have also been proposed to achieve the same results of sorting when indexing streaming data. == Encoding == Basic bitmap indexes use one bitmap for each distinct value. It is possible to reduce the number of bitmaps used by using a different encoding method. For example, it is possible to encode C distinct values using log(C) bitmaps with binary encoding. This reduces the number of bitmaps, further saving space, but to answer any query, most of the bitmaps have to be accessed. This makes it potentially not as effective as scanning a vertical projection of the base data, also known as a materialized view or projection index. Finding the optimal encoding method that balances (arbitrary) query performance, index size and index maintenance remains a challenge. Without considering compression, Chan and Ioannidis analyzed a class of multi-component encoding methods and came to the conclusion that two-component encoding sits at the kink of the performance vs. index size curve and therefore represents the best trade-off between index size and query performance. == Binning == For high-cardinality columns, it is useful to bin the values, where each bin covers multiple values and build the bitmaps to represent the values in each bin. This approach reduces the number of bitmaps used regardless of encoding method. However, binned indexes can only answer some queries without examining the base data. For example, if a bin covers the range from 0.1 to 0.2, then when the user asks for all values less than 0.15, all rows that fall in the bin are possible hits and have to be checked to verify whether they are actually less than 0.15. The process of checking the base data is known as the candidate check. In most cases, the time used by the candidate check is significantly longer than the time needed to work with the bitmap index. Therefore, binned indexes exhibit irregular performance. They can be very fast for some queries, but much slower if the query does not exactly match a bin. == History == The concept of bitmap index was first introduced by Professor Israel Spiegler and Rafi Maayan in their research "Storage and Retrieval Considerations of Binary Data Bases", published in 1985. The first commercial database product to implement a bitmap index was Computer Corporation of America's Model 204. Patrick O'Neil published a paper about this implementation in 1987. This implementation is a hybrid between the basic bitmap index (without compression) and the list of Row Identifiers (RID-list). Overall, the index is organized as a B+tree. When the column cardinality is low, each leaf node of the B-tree would contain long list of RIDs. In this case, it requires less space to represent the RID-lists as bitmaps. Since each bitmap represents one distinct value, this is the basic bitmap index. As the column cardinality increases, each bitmap becomes sparse and it may take more disk space to store the bitmaps than to store the same content as RID-lists. In this case, it switches to use the RID-lists, which makes it a B+tree index. == In-memory bitmaps == One of the strongest reasons for using bitmap indexes is that the intermediate results produced from them are also bitmaps and can be efficiently reused in further operations to answer more complex queries. Many programming languages support this as a bit array data structure. For example, Java has the BitSet class and .NET have the BitArray class. Some database systems that do not offer persistent bitmap indexes use bitmaps internally to speed up query processing. For example, PostgreSQL versions 8.1 and later implement a "bitmap index scan" optimization to speed up arbitrarily complex logical operations between available indexes on a single table. For tables with many columns, the total number of distinct indexes to satisfy all possible queries (with equality filtering conditions on either of the fields) grows very fast, being defined by this formula: C n [ n 2 ] ≡ n ! ( n − [ n 2 ] ) ! [ n 2 ] ! {\displaystyle \mathbf {C} _{n}^{\left[{\frac {n}{2}}\right]}\equiv {\frac {n!}{\left(n-\left[{\frac {n}{2}}\right]\right)!\left[{\frac {n}{2}}\right]!}}} . A bitmap index scan combines expressions on different indexes, thus requiring only one index per column t

    Read more →
  • Emergent algorithm

    Emergent algorithm

    An emergent algorithm is an algorithm that exhibits emergent behavior. In essence an emergent algorithm implements a set of simple building block behaviors that when combined exhibit more complex behaviors. One example of this is the implementation of fuzzy motion controllers used to adapt robot movement in response to environmental obstacles. An emergent algorithm has the following characteristics: it achieves predictable global effects it does not require global visibility it does not assume any kind of centralized control it is self-stabilizing Other examples of emergent algorithms and models include cellular automata, artificial neural networks and swarm intelligence systems (ant colony optimization, bees algorithm, etc.).

    Read more →
  • Polygraphic substitution

    Polygraphic substitution

    Polygraphic substitution is a substitution cipher in which a uniform substitution is performed on blocks of letters. When the length of the block is specifically known, more precise terms are used: for instance, a cipher in which pairs of letters are substituted is bigraphic. As a concept, polygraphic substitution contrasts with monoalphabetic (or simple) substitutions in which individual letters are uniformly substituted, or polyalphabetic substitutions in which individual letters are substituted in different ways depending on their position in the text. In theory, there is some overlap in these definitions; one could conceivably consider a Vigenère cipher with an eight-letter key to be an octographic substitution. In practice, this is not a useful observation since it is far more fruitful to consider it to be a polyalphabetic substitution cipher. == Specific ciphers == In 1563, Giambattista della Porta devised the first bigraphic substitution. However, it was nothing more than a matrix of symbols. In practice, it would have been all but impossible to memorize, and carrying around the table would lead to risks of falling into enemy hands. In 1854, Charles Wheatstone came up with the Playfair cipher, a keyword-based system that could be performed on paper in the field. This was followed up over the next fifty years with the closely related four-square and two-square ciphers, which are slightly more cumbersome but offer slightly better security. In 1929, Lester S. Hill developed the Hill cipher, which uses matrix algebra to encrypt blocks of any desired length. However, encryption is very difficult to perform by hand for any sufficiently large block size, although it has been implemented by machine or computer. This is therefore on the frontier between classical and modern cryptography. == Cryptanalysis of general polygraphic substitutions == Polygraphic systems do provide a significant improvement in security over monoalphabetic substitutions. Given an individual letter 'E' in a message, it could be encrypted using any of 52 instructions depending on its location and neighbors, which can be used to great advantage to mask the frequency of individual letters. However, the security boost is limited; while it generally requires a larger sample of text to crack, it can still be done by hand. One can identify a polygraphically-encrypted text by performing a frequency chart of polygrams and not merely of individual letters. These can be compared to the frequency of plaintext English. The distribution of digrams is even more stark than individual letters. For example, the six most common letters in English (23%) represent approximately half of English plaintext, but it takes only the most frequent 8% of the 676 digrams to achieve the same potency. In addition, even in a plaintext many thousands of characters long, one would expect that nearly half of the digrams would not occur, or only barely. In addition, looking over the text one would expect to see a fairly regular scattering of repeated text in multiples of the block length and relatively few that are not multiples. Cracking a code identified as polygraphic is similar to cracking a general monoalphabetic substitution except with a larger 'alphabet'. One identifies the most frequent polygrams, experiments with replacing them with common plaintext polygrams, and attempts to build up common words, phrases, and finally meaning. Naturally, if the investigation led the cryptanalyst to suspect that a code was of a specific type, like a Playfair or order-2 Hill cipher, then they could use a more specific attack.

    Read more →
  • Campus network

    Campus network

    A campus network, campus area network, corporate area network or CAN is a computer network made up of an interconnection of local area networks (LANs) within a limited geographical area. The networking equipments (switches, routers) and transmission media (optical fiber, copper plant, Cat5 cabling etc.) are almost entirely owned by the campus tenant / owner: an enterprise, university, government etc. A campus area network is larger than a local area network but smaller than a metropolitan area network (MAN) or wide area network (WAN). == University campuses == College or university campus area networks often interconnect a variety of buildings, including administrative buildings, academic buildings, laboratories, university libraries, or student centers, residence halls, gymnasiums, and other outlying structures, like conference centers, technology centers, and training institutes. Early examples include the Stanford University Network at Stanford University, Project Athena at MIT, and the Andrew Project at Carnegie Mellon University. == Corporate campuses == Much like a university campus network, a corporate campus network serves to connect buildings. Examples of such are the networks at Googleplex and Microsoft's campus. Campus networks are normally interconnected with high speed Ethernet links operating over optical fiber such as gigabit Ethernet and 10 Gigabit Ethernet. == Area range == The range of CAN is 1 to 5 km (1 to 3 mi). If two buildings have the same domain and they are connected with a network, then it will be considered as CAN only. Though the CAN is mainly used for corporate campuses so the link will be high speed.

    Read more →