AI Chat Helper

AI Chat Helper — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Tay (chatbot)

    Tay (chatbot)

    Tay was a chatbot that was originally released by Microsoft Corporation as a Twitter bot on March 23, 2016. It caused subsequent controversy when the bot began to post inflammatory and offensive tweets through its Twitter account, causing Microsoft to shut down the service only 16 hours after its launch. According to Microsoft, this was caused by trolls who "attacked" the service as the bot made replies based on its interactions with people on Twitter. It was replaced with Zo. == Background == The bot was created by Microsoft's Technology and Research and Bing divisions, and named "Tay" as an acronym for "thinking about you". Although Microsoft initially released few details about the bot, sources mentioned that it was similar to or based on Xiaoice, a Microsoft project in China. Ars Technica reported that, since late 2014 Xiaoice had had "more than 40 million conversations apparently without major incident". Tay was designed to mimic the language patterns of a 19-year-old American girl, and to learn from interacting with human users of Twitter. == Initial release == Tay was released on Twitter on March 23, 2016, under the name TayTweets and handle @TayandYou. It was presented as "The AI with zero chill". Tay started replying to other Twitter users, and was also able to caption photos provided to it into a form of Internet memes. Ars Technica reported Tay experiencing topic "blacklisting": Interactions with Tay regarding "certain hot topics such as Eric Garner (killed by New York police in 2014) generate safe, canned answers". Some Twitter users began tweeting politically incorrect phrases, teaching it inflammatory messages revolving around common themes on the internet, such as "redpilling" and "Gamergate". As a result, the robot began releasing racist and sexist messages in response to other Twitter users. Artificial intelligence researcher Roman Yampolskiy commented that Tay's misbehavior was understandable because it was mimicking the deliberately offensive behavior of other Twitter users, and Microsoft had not given the bot an understanding of inappropriate behavior. He compared the issue to IBM's Watson, which began to use profanity after reading entries from the website Urban Dictionary. Many of Tay's inflammatory tweets were a simple exploitation of Tay's "repeat after me" capability. It is not publicly known whether this capability was a built-in feature, or whether it was a learned response or was otherwise an example of complex behavior. However, not all of the inflammatory responses involved the "repeat after me" capability; for example, when asked if the Holocaust had happened, Tay answered "It was made up". == Suspension == Soon, Microsoft began deleting Tay's inflammatory tweets. Abby Ohlheiser of The Washington Post theorized that Tay's research team, including editorial staff, had started to influence or edit Tay's tweets at some point that day, pointing to examples of almost identical replies by Tay, asserting that "Gamer Gate sux. All genders are equal and should be treated fairly." From the same evidence, Gizmodo concurred that Tay "seems hard-wired to reject Gamer Gate". A "#JusticeForTay" campaign protested the alleged editing of Tay's tweets. Within 16 hours of its release and after Tay had tweeted more than 96,000 times, Microsoft suspended the Twitter account for adjustments, saying that it suffered from a "coordinated attack by a subset of people" that "exploited a vulnerability in Tay." Madhumita Murgia of The Telegraph called Tay "a public relations disaster", and suggested that Microsoft's strategy would be "to label the debacle a well-meaning experiment gone wrong, and ignite a debate about the hatefulness of Twitter users." However, Murgia described the bigger issue as Tay being "artificial intelligence at its very worst – and it's only the beginning". On March 25, Microsoft confirmed that Tay had been taken offline. Microsoft released an apology on its official blog for the controversial tweets posted by Tay. Microsoft was "deeply sorry for the unintended offensive and hurtful tweets from Tay", and would "look to bring Tay back only when we are confident we can better anticipate malicious intent that conflicts with our principles and values". == Second release and shutdown == On March 30, 2016, Microsoft accidentally re-released the bot on Twitter while testing it. Able to tweet again, Tay released some drug-related tweets, including "kush! [I'm smoking kush infront the police]" and "puff puff pass?" However, the account soon became stuck in a repetitive loop of tweeting "You are too fast, please take a rest", several times a second. Because these tweets mentioned its own username in the process, they appeared in the feeds of 200,000+ Twitter followers, causing annoyance to users. The bot was quickly taken offline again, in addition to Tay's Twitter account being made private so new followers must be accepted before they can interact with Tay. In response, Microsoft said Tay was inadvertently put online during testing. A few hours after the incident, Microsoft software developers announced a vision of "conversation as a platform" using various bots and programs, perhaps motivated by the reputation damage done by Tay. Microsoft has stated that they intend to re-release Tay "once it can make the bot safe" but has not made any public efforts to do so. == Legacy == In December 2016, Microsoft released Tay's successor, a chatbot named Zo. Satya Nadella, the CEO of Microsoft, said that Tay "has had a great influence on how Microsoft is approaching AI," and has taught the company the importance of taking accountability. In July 2019, Microsoft Cybersecurity Field CTO Diana Kelley spoke about how the company followed up on Tay's failings: "Learning from Tay was a really important part of actually expanding that team's knowledge base, because now they're also getting their own diversity through learning". === Unofficial revival === Gab, an alt-tech social media platform, has launched a number of chatbots, one of which is named Tay and uses the same avatar as the original.

    Read more →
  • Backup

    Backup

    In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. == Storage == A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. === 3-2-1 Backup Rule === The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. === Backup methods === ==== Unstructured ==== An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. ==== Full only/System imaging ==== A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. ==== Incremental ==== An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. ==== Near-CDP ==== Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications—often marketed as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended ".bak" extension to the file name. ==== Reverse incremental ==== A Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links—as Apple Time Machine does, or using binary diffs. ==== Differential ==== A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. === Storage media === Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. ==== Magnetic tape ==== Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Tape media are generally rotated on a schedule so at least one set is off-site in case something should happe

    Read more →
  • Data Management Association

    Data Management Association

    The Data Management Association (DAMA), formerly known as the Data Administration Management Association, is a global not-for-profit organization which aims to advance concepts and practices about information management and data management. It describes itself as vendor-independent, all-volunteer organization, and has a membership consisting of technical and business professionals. Its international branch is called DAMA International (or DAMA-I), and DAMA also has various continental and national branches around the world. == History == The Data Management Association International was founded in 1980 in Los Angeles. Other early chapters were: San Francisco, Portland, Seattle, Minneapolis, New York, and Washington D.C. == Data Management Body of Knowledge == DAMA has published the Data Management Body of Knowledge (DMBOK), which contains suggestions on best practices and suggestions of a common vernacular for enterprise data management. The first edition (DAMA-DMBOK) was published on 2009 November 1, the second edition (DAMA-DMBOK2) was published on 2017 July 1., and the Revised second edition (DAMA-DMBOK2 rev.2) was published on 2019 March 19. DMBOK has been described by the authors as being an "equivalent" to the Project Management Body of Knowledge (PMBOK) and Business Analysis Body of Knowledge (BABOK). It encompasses topics such as data architecture, security, quality, modelling, governance, big data, data science, and more. DMBOK also includes the DAMA Data Wheel, an infographic which represents core data management practices. The center of the infographic is data governance, and the surrounding segments each represent a different aspect of data management: Data architecture Data modeling and design Data storage and operations Data security Data integration and interoperability Document management Content management Master data management Reference data and master data Data warehousing Metadata management Data quality Business intelligence Data science == Professional Accreditation == DAMA also provides a professional data management certification for individuals known as a Certified Data Management Professional (CDMP), which is based on the DMBOK as a study reference. There are four levels of certification based on career experience and exam results. The highest level, Fellow, requires 25 years of experience and nomination by DAMA members. It is an example of one of many competing certifications for data management professionals.

    Read more →
  • Cover-coding

    Cover-coding

    Cover-coding is a technique for obscuring the data that is transmitted over an insecure link, to reduce the risks of snooping. An example of cover-coding would be for the sender to perform a bitwise XOR (exclusive OR) of the original data with a password or random number which is known to both sender and receiver. The resulting cover-coded data is then transmitted from sender to the receiver, who uncovers the original data by performing a further bitwise XOR (exclusive OR) operation on the received data using the same password or random number. ISO 18000-6C (EPC Class 1 Generation 2) RFID tags protect some operations with a cover code. The reader requests a random number from the tag, and the tag responds with a new random number. The reader then encrypts future communications with this number, using bitwise XOR, to the data it sends. Cover coding is secure if the tag signal can't be intercepted and the random number is not re-used. Compared to the loud transmissions from the reader, tag backscatter is much weaker and difficult -- but not impossible -- to intercept.

    Read more →
  • Aporia (company)

    Aporia (company)

    Aporia is a machine learning observability platform based in Tel Aviv, Israel. The company has a US office located in San Jose, California. Aporia has developed software for monitoring and controlling undetected defects and failures used by other companies to detect and report anomalies, and warn in the early stages of faults. == History == Aporia was founded in 2019 by Liran Hason and Alon Gubkin. In April 2021, the company raised a $5 million seed round for its monitoring platform for ML models. In February 2022, the company closed a Series A round of $25 million for its ML observability platform. Aporia was named by Forbes as the Next Billion-Dollar Company in June 2022. In November, the company partnered with ClearML, an MLOPs platform, to improve ML pipeline optimization. In January 2023, Aporia launched Direct Data Connectors, a novel technology allowing organizations to monitor their ML models in minutes (previously the process of integrating ML monitoring into a customer’s cloud environment took weeks or more.) DDC (Direct Data Connectors) enables users to connect Aporia to their preferred data source and monitor all of their data at once, without data sampling or data duplication (which is a huge security risk for major organizations. In April 2023, Aporia announced the company partnered with Amazon Web Services (AWS) to provide more reliable ML observability to AWS consumers by deploying Aporia's architecture to their AWS environment, this will allow customers to monitor their models in production regardless of platform.

    Read more →
  • Plaintext

    Plaintext

    In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted. == Overview == With the advent of computing, the term plaintext expanded beyond human-readable documents to mean any data, including binary files, in a form that can be viewed or used without requiring a key or other decryption device. Information—a message, document, file, etc.—if to be communicated or stored in an unencrypted form is referred to as plaintext. Plaintext is used as input to an encryption algorithm; the output is usually termed ciphertext, particularly when the algorithm is a cipher. Codetext is less often used, and almost always only when the algorithm involved is actually a code. Some systems use multiple layers of encryption, with the output of one encryption algorithm becoming "plaintext" input for the next. == Secure handling == Insecure handling of plaintext can introduce weaknesses into a cryptosystem by letting an attacker bypass the cryptography altogether. Plaintext is vulnerable in use and in storage, whether in electronic or paper format. Physical security means the securing of information and its storage media from physical, attack—for instance by someone entering a building to access papers, storage media, or computers. Discarded material, if not disposed of securely, may be a security risk. Even shredded documents and erased magnetic media might be reconstructed with sufficient effort. If plaintext is stored in a computer file, the storage media, the computer and its components, and all backups must be secure. Sensitive data is sometimes processed on computers whose mass storage is removable, in which case physical security of the removed disk is vital. In the case of securing a computer, useful (as opposed to handwaving) security must be physical (e.g., against burglary, brazen removal under cover of supposed repair, installation of covert monitoring devices, etc.), as well as virtual (e.g., operating system modification, illicit network access, Trojan programs). Wide availability of keydrives, which can plug into most modern computers and store large quantities of data, poses another severe security headache. A spy (perhaps posing as a cleaning person) could easily conceal one, and even swallow it if necessary. Discarded computers, disk drives and media are also a potential source of plaintexts. Most operating systems do not actually erase anything— they simply mark the disk space occupied by a deleted file as 'available for use', and remove its entry from the file system directory. The information in a file deleted in this way remains fully present until overwritten at some later time when the operating system reuses the disk space. With even low-end computers commonly sold with many gigabytes of disk space and rising monthly, this 'later time' may be months later, or never. Even overwriting the portion of a disk surface occupied by a deleted file is insufficient in many cases. Peter Gutmann of the University of Auckland wrote a celebrated 1996 paper on the recovery of overwritten information from magnetic disks; areal storage densities have gotten much higher since then, so this sort of recovery is likely to be more difficult than it was when Gutmann wrote. Modern hard drives automatically remap failing sectors, moving data to good sectors. This process makes information on those failing, excluded sectors invisible to the file system and normal applications. Special software, however, can still extract information from them. Some government agencies (e.g., US NSA) require that personnel physically pulverize discarded disk drives and, in some cases, treat them with chemical corrosives. This practice is not widespread outside government, however. Garfinkel and Shelat (2003) analyzed 158 second-hand hard drives they acquired at garage sales and the like, and found that less than 10% had been sufficiently sanitized. The others contained a wide variety of readable personal and confidential information. See data remanence. Physical loss is a serious problem. The US State Department, Department of Defense, and the British Secret Service have all had laptops with secret information, including in plaintext, lost or stolen. Appropriate disk encryption techniques can safeguard data on misappropriated computers or media. On occasion, even when data on host systems is encrypted, media that personnel use to transfer data between systems is plaintext because of poorly designed data policy. For example, in October 2007, HM Revenue and Customs lost CDs that contained the unencrypted records of 25 million child benefit recipients in the United Kingdom. Modern cryptographic systems resist known plaintext or even chosen plaintext attacks, and so may not be entirely compromised when plaintext is lost or stolen. Older systems resisted the effects of plaintext data loss on security with less effective techniques—such as padding and Russian copulation to obscure information in plaintext that could be easily guessed.

    Read more →
  • Social media and suicide

    Social media and suicide

    Since the rise of social media, there have been numerous cases of individuals being influenced towards committing suicide or self-harm through their use of social media, and even of individuals arranging to broadcast suicide attempts, some successful, on social media. Researchers have studied social media and suicide to determine what, if any, risks social media poses in terms of suicide, and to identify methods of mitigating such risks, if they exist. The search for a correlation has not yet uncovered a clear answer. == Background == Suicide is one of the leading causes of death worldwide, and as of 2020, the second leading cause of death in the United States for those aged 15–34. According to the Center for Disease Control and Prevention, suicide was the third leading cause of death among adolescents in the US, from 1999 to 2006. In 2020, people in the US had a suicide rate of 13.5 per 100,000. Suicide was a leading cause of death in the United States accounting for 48,183 deaths in 2021. Suicide rates increased by 30 per cent from 2000 to 2018 and declined in 2019 and 2020. Suicide remains a significant public health issue worldwide, despite prevention efforts and treatments. Suicide has been identified not only as an individual phenomenon but also as being influenced by social and environmental factors. There is growing evidence that online activity has influenced suicide-related behavior. The use of social media throughout the 21st century has grown exponentially. For this reason, there are a variety of sources that are accessible to the public in various forms, especially social media sites such as Facebook, Instagram, Twitter, YouTube, Snapchat, TikTok and many more. Although these platforms were intended to allow people to connect virtually, these platforms can lead to cyber-bullying, insecurity, and emotional distress, and sometimes may influence a person to attempt suicide. Bullying, whether on social media or elsewhere, physical or not, significantly increases victims' risk of suicidal behavior. Since social media was introduced some people have taken their lives as a result of cyberbullying. Furthermore, suicide rates among teenagers have increased from 2010 to 2022 as social media has become something that people interact with more throughout their day-to-day lives. Media algorithms tend to popularize videos and posts to inform the country of the rising trouble, which may create a popular appeal to the young and immature minds of teenagers. This is why, social media could provide higher risks with the promotion of different kinds of pro-suicidal sites, message boards, chat rooms, and forums. Moreover, the Internet not only reports suicide incidents but documents suicide methods (for example, suicide pacts, an agreement between two or more people to kill themselves at a particular time and often by the same lethal means). Therefore, the role the Internet plays, particularly social media, in suicide-related behavior is a topic of growing interest. == Cyberbullying == There is substantial evidence that the Internet and social media can influence suicide-related behavior. Such evidence includes an increase in exposure to graphic content. A research study conducted by Sameer Hinduja and Justin Patchin found a correlation between cyberbullying and suicide. According to their findings, cyber-bullying increases suicidal thoughts by 14.5 percent and suicide attempts by 8.7 percent. Particularly alarming is the fact that children and young people under 25 who are victims of cyberbullying are more than twice as likely to self-harm and engage in suicidal behavior. Overall, teen suicide rates have increased within the past decade.This presents a significant public health concern, with over 40,000 suicides in the United States and nearly one million worldwide annually. Adolescents involved in cyberbullying often downplay its seriousness by calling it a joke or blaming the victim. These moral disengagement strategies can normalize harmful behavior and reduce feelings of guilt. This normalization may increase emotional distress and contribute to risks like depression and suicidal thoughts. Recent data from the Centers for Disease Control and Prevention reveals that 14.9 per cent of teenagers have experienced online bullying, while 13.6 per cent of teenagers have seriously attempted suicide. Both of these incidents are in increasing numbers in the United States. Furthermore, in numerous recent incidents, cyber-bullying led the victim to commit suicide; this phenomenon is now known as cyberbullicide. Many parents and children are unaware of the dangers and potential legal consequences of cyberbullying. As a response, anti-bullying regulations implemented by schools aim to prevent any form of bullying, including through technology, and protect students from online harassment. While some states have enacted laws against cyberbullying, there are currently no federal regulations addressing this issue. == Social media's influence on suicide == The media may portray suicidal behavior or language which can potentially influence people to act on these suicidal ideation. This may include news reports of actual suicides that have occurred or television shows and films that reenact suicides. Some organizations have proposed guidelines about how the media should report suicide. There is evidence that compliance with the guidelines varies. Some research showed that it is unclear whether the guidelines have successfully reduced the number of suicides. On the contrary, other research studies stated that the guidelines have worked in some cases. == Impact of pro-suicidal sites, message boards, chat rooms and forums == Social media platforms have transformed traditional methods of communication by allowing instantaneous and interactive sharing of information created and controlled by individuals, groups, organizations, and governments. As of the third quarter of 2022, Facebook had 266 million monthly active users, between Canada and the US. An immense quantity of information on the topic of suicide is available on the Internet and via social media. The information available on social media on the topic of suicide can influence suicidal behavior, both negatively and positively. The social cognitive theory plays a vital role in suicide attempts influenced through social media. This theory is demonstrated when one is influenced by what they see through various processes that form into modeled behaviors. This can be shown when people post their suicide attempts online or promote suicidal behavior in general. Contributors to these social media platforms may also exert peer pressure and encourage others to take their own lives, idolize those who have killed themselves, and facilitate suicide pacts. These pro-suicidal sites reported the following. For example, on a Japanese message board in 2008, it was shared that people can kill themselves using hydrogen sulfide gas. Shortly afterwards, 220 people attempted suicide in this way, and 208 were successful. Biddle et al. conducted a systematic Web search of 12 suicide-associated terms (e.g., suicide, suicide methods, how to kill yourself, and best suicide methods) to analyze the search results, and found that pro-suicide sites and chat rooms that discussed general issues associated with suicide most often occurred within the first few hits of a search. In another study, 373 suicide-related websites were found using Internet search engines and examined. Among them, 31% were suicide-neutral, 29% were anti-suicide, and 11% were pro-suicide. Together, these studies have shown that obtaining pro-suicide information on the Internet, including detailed information on suicide methods, is very easy. While social media has been prevalent in young adult suicide, some young adults find comfort and solace through these platforms. Young adults are making connections with people in like situations that are helping them feel less lonely. Although the public opinion is that message boards are harmful, the following studies show how they point to suicide prevention and have positive influences. A study using content analysis analyzed all of the postings on the AOL Suicide Bulletin Board over 11 months and concluded that most contributions contained positive, empathetic, and supportive postings. Then, a multi-method study was able to demonstrate that the users of such forums experience a great deal of social support and only a small amount of social strain. Lastly, in the survey participants were asked to assess the extent of their suicidal thoughts on a 7-level scale (0, absolutely no suicidal thoughts, to 7, very strong suicidal thoughts) for the time directly before their first forum visit and at the time of the survey. The study found a significant reduction after using the forum. The study however cannot conclude the forum is the only reason for the decrease. Together, these studies show how forums can reduce the number of

    Read more →
  • Social bot

    Social bot

    A social bot, refers to fully or partially automated social media accounts designed to perform most regular users’ actions, such as liking, posting content, and chatting with other users. Although their levels of autonomy vary, and often include a human-in-the-loop, social bots can use artificial intelligence to perform social media actions and can use large language models to mimic human dialogue. Social bots can operate alone or in groups that coordinate messaging as part of a network of coordinated inauthentic behavior. Social bots are often used to perform ad fraud by artificially boosting viewership and engagement metrics and to spread disinformation on social media. == Uses == Social bots are used for a large number of purposes on a variety of social media platforms, including Twitter, Instagram, Facebook, and YouTube. One common use of social bots is to inflate a social media user's apparent popularity, usually by artificially manipulating their engagement metrics with large volumes of fake likes, reposts, or replies. Social bots can similarly be used to artificially inflate a user's follower count with fake followers, creating a false perception of a larger and more influential online following than is the case. The use of social bots to create the impression of a large social media influence allows individuals, brands, and organizations to attract a higher number of human followers and boost their online presence. Fake engagement can be bought and sold in the black market of social media engagement. Corporations typically use automated customer service agents on social media to affordably manage high levels of support requests. Social bots are used to send automated responses to users’ questions, sometimes prompting the user to private message the support account with additional information. The increased use of automated support bots and virtual assistants has led to some companies laying off customer-service staff. Social bots are also often used to influence public opinion. Autonomous bot accounts can flood social media with large numbers of posts expressing support for certain products, companies, or political campaigns, creating the impression of organic grassroots support. This can create a false perception of the number of people who support a certain position, which may also have effects on the direction of stock prices or on elections. Messages with similar content can also influence fads or trends. Many social bots are also used to amplify phishing attacks. These malicious bots are used to trick a social media user into giving up their passwords or other personal data. This is usually accomplished by posting links claiming to direct users to news articles that would in actuality direct to malicious websites containing malware. Scammers often use URL shortening services such as TinyURL and bit.ly to disguise a link's domain address, increasing the likelihood of a user clicking the malicious link. The presence of fake social media followers and high levels of engagement help convince the victim that the scammer is in fact a trusted user. Social bots can be a tool for computational propaganda. Bots can also be used for algorithmic curation, algorithmic radicalization, and/or influence-for-hire, a term that refers to the selling of an account on social media platforms. == History == Bots have coexisted with computer technology since the earliest days of computing. Social bots have their roots in the 1950s with Alan Turing, whose work focused on machine intelligence with the development of the Turing Test. The following decades saw further progress made towards the goal of creating programs capable of mimicking human behavior, notably with Joseph Weizenbaum’s creation of ELIZA. Considered to be one of the first Chatbots, ELIZA could simulate natural conversations with human users through pattern matching. Its most famous script was DOCTOR, a simulation of a Rogerian psychotherapist that was programmed to chat with patients and respond to questions. With the growth of social media platforms in the early 2000s, these bots could be used to interact with much larger user groups in an inconspicuous manner. Early instances of autonomous agents on social media could be found on sites like MySpace, with social bots being used by marketing firms to inflate activity on a user’s page in an effort to make them appear more popular. Social bots have been observed on a large variety of social media websites, with Twitter being one of the most widely observed examples. The creation of Twitter bots is generally against the site’s terms of service when used to post spam or to automatically like and follow other users, but some degree of automation using Twitter’s API may be permitted if used for “entertainment, informational, or novelty purposes.” Other platforms such as Reddit and Discord also allow for the use of social bots as long as they are not used to violate policies regarding harmful content and abusive behavior. Social media platforms have developed their own automated tools to filter out messages that come from bots, although they cannot detect all bot messages. == Legal regulation == Due to the difficulty of recognizing social bots and separating them from "eligible" automation via social media APIs, it is unclear how legal regulation can be enforced. Social bots are expected to play a role in shaping public opinion by autonomously acting as influencers. Some social bots have been used to rapidly spread misinformation, manipulate stock markets, influence opinion on companies and brands, promote political campaigns, and engage in malicious phishing campaigns. In the United States, some states have started to implement legislation in an attempt to regulate the use of social bots. In 2019, California passed the Bolstering Online Transparency Act (the B.O.T. Act) to make it unlawful to use automated software to appear indistinguishable from humans for the purpose of influencing a social media user's purchasing and voting decisions. Other states such as Utah and Colorado have passed similar bills to restrict the use of social bots. The Artificial Intelligence Act (AI Act) in the European Union is the first comprehensive law governing the use of Artificial Intelligence. The law requires transparency in AI to prevent users from being tricked into believing they are communicating with another human. AI-generated content on social media must be clearly marked as such, preventing social bots from using AI in a manner that mimics human behavior. == Detection == The first generation of bots could sometimes be distinguished from real users by their often superhuman capacities to post messages. Later developments have succeeded in imprinting more "human" activity and behavioral patterns in the agent. With enough bots, it might be even possible to achieve artificial social proof. To unambiguously detect social bots as what they are, a variety of criteria must be applied together using pattern detection techniques, some of which are: cartoon figures as user pictures sometimes also random real user pictures are captured (identity fraud) reposting rate temporal patterns sentiment expression followers-to-friends ratio length of user names variability in (re)posted messages engagement rate (like/followers rate) analysis of the time series of social media posts Social bots are always becoming increasingly difficult to detect and understand. The bots' human-like behavior, ever-changing behavior of the bots, and the sheer volume of bots covering every platform may have been a factor in the challenges of removing them. Social media sites, like Twitter, are among the most affected, with CNBC reporting up to 48 million of the 319 million users (roughly 15%) were bots in 2017. Botometer (formerly BotOrNot) is a public Web service that checks the activity of a Twitter account and gives it a score based on how likely the account is to be a bot. The system leverages over a thousand features. An active method for detecting early spam bots was to set up honeypot accounts that post nonsensical content, which may get reposted (retweeted) by the bots. However, bots evolve quickly, and detection methods have to be updated constantly, because otherwise they may get useless after a few years. One method is the use of Benford's Law for predicting the frequency distribution of significant leading digits to detect malicious bots online. This study was first introduced at the University of Pretoria in 2020. Another method is artificial-intelligence-driven detection. Some of the sub-categories of this type of detection would be active learning loop flow, feature engineering, unsupervised learning, supervised learning, and correlation discovery. Some operations of bots work together in a synchronized way. For example, ISIS used Twitter to amplify its Islamic content by numerous orchestrated accounts which further pushed an item to the Hot List news, thus further a

    Read more →
  • NumPy

    NumPy

    NumPy (pronounced NUM-py) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is fiscally sponsored by NumFOCUS. == History == === matrix-sig === The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier. === Numeric === An implementation of a matrix package was completed by Jim Fulton, then expanded to support multi-dimensional arrays by Jim Hugunin and called Numeric (also variously known as the "Numerical Python extensions" or "NumPy"), with influences from the APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. Hugunin, a graduate student at the Massachusetts Institute of Technology (MIT), joined the Corporation for National Research Initiatives (CNRI) in 1997 to work on JPython, leaving Paul Dubois of Lawrence Livermore National Laboratory (LLNL) to take over as maintainer. Other early contributors include David Ascher, Konrad Hinsen and Travis Oliphant. === Numarray === A new package called Numarray was written as a more flexible replacement for Numeric. Like Numeric, it too is now deprecated. Numarray had faster operations for large arrays, but was slower than Numeric on small ones, so for a time both packages were used in parallel for different use cases. The last version of Numeric (v24.2) was released on 11 November 2005, while the last version of numarray (v1.5.2) was released on 24 August 2006. There was a desire to get Numeric into the Python standard library, but Guido van Rossum decided that the code was not maintainable in its state then. === NumPy === In early 2005, NumPy developer Travis Oliphant wanted to unify the community around a single array package and ported Numarray's features to Numeric, releasing the result as NumPy 1.0 in 2006. This new project was part of SciPy. To avoid installing the large SciPy package just to get an array object, this new package was separated and called NumPy. Support for Python 3 was added in 2011 with NumPy version 1.5.0. In 2011, PyPy started development on an implementation of the NumPy API for PyPy. As of 2023, it is not yet fully compatible with NumPy. == Features == NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written for this version of Python often run much slower than compiled equivalents due to the absence of compiler optimization. NumPy addresses the slowness problem partly by providing multidimensional arrays and functions and operators that operate efficiently on arrays; using these requires rewriting some code, mostly inner loops, using NumPy. Using NumPy in Python gives functionality comparable to MATLAB since they are both interpreted, and they both allow the user to write fast programs as long as most operations work on arrays or matrices instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and complete programming language. Moreover, complementary Python packages are available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package that provides MATLAB-like plotting functionality. Although MATLAB can perform sparse matrix operations, NumPy alone cannot perform such operations and requires the use of the scipy.sparse library. Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations. Python bindings of the widely used computer vision library OpenCV utilize NumPy arrays to store and operate on data. Since images with multiple channels are simply represented as three-dimensional arrays, indexing, slicing or masking with other arrays are very efficient ways to access specific pixels of an image. The NumPy array as universal data structure in OpenCV for images, extracted feature points, filter kernels and many more vastly simplifies the programming workflow and debugging. Importantly, many NumPy operations release the global interpreter lock, which allows for multithreaded processing. NumPy also provides a C API, which allows Python code to interoperate with external libraries written in low-level languages. === The ndarray data structure === The core functionality of NumPy is its "ndarray", for n-dimensional array, data structure. These arrays are strided views on memory. In contrast to Python's built-in list data structure, these arrays are homogeneously typed: all elements of a single array must be of the same type. Such arrays can also be views into memory buffers allocated by C/C++, Python, and Fortran extensions to the CPython interpreter without the need to copy data around, giving a degree of compatibility with existing numerical libraries. This functionality is exploited by the SciPy package, which wraps a number of such libraries (notably BLAS and LAPACK). NumPy has built-in support for memory-mapped ndarrays. === Limitations === Inserting or appending entries to an array is not as trivially possible as it is with Python's lists. The np.pad(...) routine to extend arrays actually creates new arrays of the desired shape and padding values, copies the given array into the new one and returns it. NumPy's np.concatenate([a1,a2]) operation does not actually link the two arrays but returns a new one, filled with the entries from both given arrays in sequence. Reshaping the dimensionality of an array with np.reshape(...) is only possible as long as the number of elements in the array does not change. These circumstances originate from the fact that NumPy's arrays must be views on contiguous memory buffers. Algorithms that are not expressible as a vectorized operation will typically run slowly because they must be implemented in "pure Python", while vectorization may increase memory complexity of some operations from constant to linear, because temporary arrays must be created that are as large as the inputs. Runtime compilation of numerical code has been implemented by several groups to avoid these problems; open source solutions that interoperate with NumPy include numexpr and Numba. Cython and Pythran are static-compiling alternatives to these. Many modern large-scale scientific computing applications have requirements that exceed the capabilities of the NumPy arrays. For example, NumPy arrays are usually loaded into a computer's memory, which might have insufficient capacity for the analysis of large datasets. Further, NumPy operations are executed on a single CPU. However, many linear algebra operations can be accelerated by executing them on clusters of CPUs or of specialized hardware, such as GPUs and TPUs, which many deep learning applications rely on. As a result, several alternative array implementations have arisen in the scientific python ecosystem over the recent years, such as Dask for distributed arrays and TensorFlow or JAX for computations on GPUs. Because of its popularity, these often implement a subset of NumPy's API or mimic it, so that users can change their array implementation with minimal changes to their code required. A library named CuPy, accelerated by Nvidia's CUDA framework, has also shown potential for faster computing, being a 'drop-in replacement' of NumPy. == Examples == NumPy is conventionally imported as np. === Basic operations === === Universal functions === === Linear algebra === === Multidimensional arrays === === Incorporation with OpenCV === === Nearest-neighbor search === Functional Python and vectorized NumPy version. === F2PY === Quickly wrap native code for faster scripts.

    Read more →
  • Semiotics of social networking

    Semiotics of social networking

    The semiotics of social networking discusses the images, symbols and signs used in systems that allow users to communicate and share experiences with each other. Examples of social networking systems include Facebook, Twitter and Instagram. == Semiotics == Semiotics is a discipline that studies images, symbols, signs and other similarly related objects in an effort to understand their use and meaning. Semiotic structuralism seeks the meaning of these objects within a social context. Post-structuralist theories take tools from structuralist semiotics in combination with social interaction, creating social semiotics. Social semiotics is “a branch of the field of semiotics which investigates human signifying practices in specific social and cultural circumstances and which tries to explain meaning-making as a social practice.” “Social semiotics also examines semiotic practices, specific to a culture and community, for the making of various kinds of texts and meanings in various situational contexts and contexts of culturally meaningful activity”. Social semiotics is concerned with studying human interactions. == Social networking == Social networking is the communication among people within a virtual social space. This medium of communication allows insight into the significance of social semiotics. “Millions of people now interact through blogs, collaborate through wikis, play multiplayer games, publish podcasts and video, build relationships through social network sites and evaluate all the above forms of communication through feedback and ranking mechanisms”. Social semiotics “unlike speech, writing necessitates some sort of technology in the form of person device interaction”. Social semiotics functions through the triad of communication or Peircean semiotics in the form of sign, object, interpretant (Chart 1) and “Human, Machine, Tag (Information)” (Chart 2). In Peircean semiotics (Chart 1), "A sign…[in the form of representamen] is something which stands to somebody for something in some respect or capacity. It addresses somebody, that is, creates in the mind of that person an equivalent sign, or perhaps a more developed sign. That sign which it creates I call the interpretant of the first sign. The sign stands for an object, not in all respects, but in reference to a sort of idea which I have something called the ground of the representamen". This example of the triangle of Human, Machine, Tag is shown when looking at tagging photographs on Facebook (Chart 3). The Human takes the photo on a camera and puts the digital file (information) on the Machine, the Machine is then navigated to Facebook where the file is downloaded. The Human has the Machine Tag the photo with information (e. g., names, places, data) for other Humans to see. This process then can be continued (see Chart 2). “Collaborative tagging has been quickly gaining ground because of its ability to recruit the activity of web users into effectively organizing and sharing large amounts of information”.

    Read more →
  • Public Services Network

    Public Services Network

    The Public Services Network (PSN) is a UK government's high-performance network, which helps public sector organisations work together, reduce duplication and share resources. It unified the provision of network infrastructure across the United Kingdom public sector into an interconnected "network of networks" to increase efficiency and reduce overall public expenditure. It is now a legacy network and public sector organisations are being migrated to using services on the public internet. == Origins == The Public Services Network (PSN) was launched officially as part of the Transformational Government Strategy commencing in 2005, under the original name of the Public Sector Network. Prior to this, some parts of local government had already successfully implemented the concept. The Hampshire Public Services Network (HPSN) was the first PSN, launched in 1999, followed closely by Kent County Councils partnerships with the KPSN. The HPSN, encompassing all of the borough, district and unitary councils, with the County Council, as well as the Fire Services, the Isle of Wight Council and 540 schools. National PSN technical and architecture compliance criteria were established from 2007, by GDS working with local government leaders from Socitm (the Society of Information Technology Management) on the National CIO Council and the Local CIO Council. The PSN's aim was to bring public services organisations with a common interest onto a single, coherent and standards-based ‘network of networks’. This would create influence, economies of scale and a commonality of standards for secure and easy inter-connection between public service organisations. The original concept of a network of networks strategy was based upon the work already undertaken in local government and recognition of Communities of Interest (COI) within the Criminal Justice Sector during work by the Office for Criminal Justice Reform (OCJR) between 2005 and 2007 to enable data sharing across business units. In this context a COI was defined as groups of Government departments and external partners who in combination provided services within a specific area of operation and used the same data, with a similar risk profile, shared risk appetite and common governance framework. Historically each group member had implemented their own networks and standards of operation in isolation with little or no consideration as to how services and data may be shared and resulting in increased costs of operation. The Network of Networks strategy proposed within OCJR recommended the creation of specific networks based upon these Communities of Interest which were joined together through data interchange gateways supporting common standards. Under this approach networks would be arranged by data type and business functions such as Criminal Justice, Health and Social Care, Defence and Intelligence or Public Finance rather than solely on established departmental boundaries. Within a COI, trust relationships and data interchange are readily supported, enabling data sharing without a need to cross network boundaries and providing benefits of scale without the challenges and compromises intrinsic to homogeneous cross sector networks. Data is made available without a need to transport it between organisations and control is retained by the data originator. In early 2007 a group of UK Government department CTOs in conjunction with the Office for Government Commerce Buying Solutions (OGC BS) established the vision for a single commonly provided, procured and managed public sector voice and data network infrastructure to replace the multitude of separately procured and managed networks serving various segments of the UK public sector; Education, Health, Central Government, Local Government etc. In 2008 an Industry Working Group was established to document the objectives and requirements more clearly. Their report set out the architectural and commercial principles as well as anticipated security, service management, governance and transition arrangements. == Architecture == The PSN comprises a core network, the Government Conveyancing Network or GCN provided by GCN Service Providers or GCNSPs. The GCN interconnects multiple operator networks, termed Direct Network Service Providers or DNSPs. Subscriber organisations contract to a connection from a local participating DNSP, connect via that to GCN and hence onwards to other interconnected networks and services. The GCN network is entirely based on IPv4 and MPLS and the GCNSPs are not currently mandated to provide IPv6, though they should have a roadmap to implementing it if and when required. == Commercial framework == In 2010 Virgin Media Business, BT, Cable & Wireless and Global Crossing signed Deeds of Undertaking (DoU) and subsequently achieved accreditation for providing GCN and IP VPN services. In March 2012, BT, Cable & Wireless, Capita Business Services, Eircom, Fujitsu, Kcom, Level 3, Logicalis, MDNX, Thales, Updata and Virgin Media Business were successful bidders for the initial two-year PSN Connectivity framework. In June 2012, 29 companies were confirmed as suppliers of ICT services to the UK public sector under the Government's PSN Services framework contract. Apart from most of the previous suppliers, additional companies also included 2e2, Airwave Solutions, Azzurri Communications, Cassidian, CSC Computer Sciences, Computacenter, Daisy Communications, Easynet Global Services, EE, Freedom Communications, Icom Holdings, NextiraOne, PageOne Communications, Phoenix IT Group, Siemens Communications, Specialist Computer Centres, Telefónica, telent Technology Services, Uniworld Communications and Vodafone. == Governance == The PSN is managed within the Cabinet Office where it is part of the Government Digital Service. == Early implementations == There were already notable initiatives in progress in county council areas, demonstrating public sector network integration in both the Hampshire HPSN2 network and in Kent's community network. Project Pathway was established as a pilot linking these two county-wide networks, with Virgin Media Business and Global Crossing the subscriber and GCN network elements. Staffordshire County Council was the first council in England to establish a PSN that included the county's NHS Health partners. Other county councils have since followed the leads of these councils. == Transition == Centrally procured public sector networks are expected to migrate across to the PSN framework as they reach the end of their contract terms, either through an interim framework or directly. The Government Secure Intranet (GSi) contracts expired in September 2011, running on to 12 February 2012 and were replaced by the transitional Government Secure Intranet Convergence Framework (GCF). The Managed Telephony Service (MTS) contract expired on 31 December 2011 and was replaced by the Managed Telephony Convergence Framework (MTCF). == Future plan == In a blog post published on 20 January 2017, Government Digital Service announced that the Technology Leaders Network (TLN) had agreed that government was starting a journey away from the PSN. This was because using the Internet was considered suitable for the vast majority of the work that the public sector does. The blog post confirmed that the 'move was not going to happen immediately' and stated that 'there's quite a bit of work to do across the public sector to prepare for the changes'. It also stated that it was too early for a full timeline to be provided, although all PSN-connected organisations would be updated as the process evolved. The blog post confirmed that organisations that need to access services that are only available on the PSN would still need to connect to it for the time being and continue to meet its assurance requirements. In a blog post published on 16 March 2017, Government Digital Service (GDS) set out its plans for PSN assurance. The blog post confirmed that the PSN compliance process wasn't 'going anywhere, certainly for a while yet'. It explained that the TLN agreed that – as one of the only recognised, externally accredited, cross-government common assurance standards – it 'needs to live on far beyond the end of the physical PSN network'. Government Digital Service, along with the National Cyber Security Centre (NCSC) and the Cyber and Government Security Directorate, are now looking at ways to expand and reframe PSN compliance in a new context that, while retaining the assurance principles that are the basis of the existing process, will aim to improve the process. A GDS blog post titled 'The road to closing down the PSN' published on 8 September 2020 describes how the public sector will migrate away from the PSN. The Cabinet Office has set up a programme called Future Networks for Government (FN4G) to help organisations move away from the PSN.

    Read more →
  • Tokenization (data security)

    Tokenization (data security)

    Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources. To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security. The tokenization system must be secured and validated using security best practices applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data. The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data. Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider. Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. A PAN may be linked to a reference number through the tokenization process. In this case, the merchant simply has to retain the token and a reliable third party controls the relationship and holds the PAN. The token may be created independently of the PAN, or the PAN can be used as part of the data input to the tokenization technique. The communication between the merchant and the third-party supplier must be secure to prevent an attacker from intercepting to gain the PAN and the token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use. == Concepts and origins == The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia and scrip are terms synonymous with such tokens. In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection. In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations. Tokenization was applied to payment card data by Shift4 Corporation and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: "The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility." To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents. == The tokenization process == The process of tokenization consists of the following steps: The application sends the tokenization data and authentication information to the tokenization system. It is stopped if authentication fails and the data is delivered to an event management system. As a result, administrators can discover problems and effectively manage the system. The system moves on to the next phase if authentication is successful. Using one-way cryptographic or random generation techniques, a token is generated and kept in a highly secure data vault. The new token is provided to the application for further use, replacing the sensitive data for processing and storage. Tokenization systems share several components according to established standards. Token generation is the process of producing a token using any means, such as one-way nonreversible cryptographic functions (e.g., a hash function with a strong, secret salt) or assignment via a randomly generated number. Random number generator (RNG) techniques are often the best choice for generating token values. Token mapping – this is the process of assigning the created token value to its original value. To enable permitted look-ups of the original value using the token as the index, a secure cross-reference database must be constructed. Token data store – this is a central repository for the token mapping process that holds the original sensitive values and their related token values. Sensitive data and token values must be securely kept in an encrypted format. Management of cryptographic keys. Strong key management procedures are required for sensitive data encryption on token data stores. == Difference from encryption == Tokenization and "classic" encryption effectively protect data if implemented properly, and a computer security system may use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and the

    Read more →
  • Outlook on the web

    Outlook on the web

    Outlook on the web (formerly Outlook Web App and Outlook Web Access) is a personal information manager web app from Microsoft. It is a web-based version of Microsoft Outlook, and is included in Exchange Server and Exchange Online (a component of Microsoft 365). It can be freely accessed from any web browser whether inside or outside an organization's network, and includes a web email client, a calendar tool, a contact manager, and a task manager. It also includes add-in integration, Skype on the web, and alerts as well as unified themes that span across all the web apps. == Purpose == Outlook on the web is available to Microsoft 365 (formerly Office 365) and Exchange Online subscribers, and is included with the on-premises Exchange Server, to enable users to connect to their email accounts via a web browser, without requiring the installation of Microsoft Outlook or other email clients. In case of Exchange Server, it is hosted on a local intranet and requires a network connection to the Exchange Server for users to work with e-mail, address book, calendars and task. The Exchange Online version, which can be bought either independently or through Office 365 licensing program, is hosted on Microsoft servers on the World Wide Web. == History == Outlook Web Access was created in 1995 by Microsoft Program Manager Thom McCann on the Exchange Server team. An early working version was demonstrated by Microsoft Vice President Paul Maritz at Microsoft's famous Internet summit in Seattle on December 27, 1995. The first customer version was shipped as part of the Exchange Server 5.0 release in early 1997. The first component to allow client-side scripts to issue HTTP requests (XMLHTTP) was originally written by the Outlook Web Access team. It soon became a part of Internet Explorer 5. Renamed XMLHttpRequest and standardized by the World Wide Web Consortium, it has since become one of the cornerstones of the Ajax technology used to build advanced web apps. Outlook Web Access was later renamed Outlook Web App in 2010. An update on August 4, 2015, renamed OWA to "Outlook on the web", often referred to in brief as simply "Outlook". == Components == === Mail === Mail is the webmail component of Outlook on the web. The default view is a three column view with folders and groups on the left, an email message list in the middle, and the selected message on the right. With the 2015 update, Microsoft introduced the ability to pin, sweep and archive messages, and undo the last action, as well as richer image editing features. It can connect to other services such as GitHub and Twitter through Office 365 Connectors. Actionable Messages in emails allows a user to complete a task from within the email, such as retweeting a Tweet on Twitter or setting a meeting date on a calendar. Outlook on the web supports S/MIME and includes features for managing calendars, contacts, tasks, documents (used with SharePoint or Office Web Apps), and other mailbox content. In the Exchange 2007 release, Outlook on the web (still called Outlook Web App at the time) also offers read-only access to documents stored in SharePoint sites and network UNC shares. === Calendar === Calendar is the calendaring component of Outlook on the web. With the update, Microsoft added a weather forecast directly in the Calendar, as well as icons (or "charms") as visual cues for an event. In addition, email reminders came to all events, and a special Birthday and Holiday event calendars are created automatically. Calendars can be shared and there are multiple views such as day, week, month, and today. Another view is work week which includes Mondays through Fridays in the calendar view. Calendar's "Board View" feature allows for a customizable calendar with widgets such as Goal, Calendar, Tasks and Tips. Calendar details can be added with HTML and rich-text editing, and files can be attached to calendar events and appointments. === People === People is the contact manager component of Outlook on the web. A user can search and edit existing contacts, as well as create new ones. Contacts can be placed into folders and duplicate contacts can be linked from multiple sources such as LinkedIn or Twitter. In Outlook Mail, a contact can be created by clicking on an email address sender, which pulls down a contact card with an add button to add to Outlook People. Contacts can be imported as well as placed into a list that can be utilized when composing an email in Outlook Mail. People can also sync with friends and connections lists on LinkedIn, Facebook, and Twitter. === To Do === To Do was originally launched as Tasks for Outlook Web App. Microsoft was slowly rolling out a preview of Tasks to its consumer-based Outlook.com service that in May 2015, was announced to be moving to the Office 365 infrastructure. It was initially a part of Calendar as a view. Microsoft has separated the services into its own web app in Outlook on the web. In a post on the Office Blogs in 2015, Microsoft announced that Outlook Web App would be renamed Outlook on the web and that Tasks would move under that brand. A user can create tasks, put them into categories, and move them to another folder. A feature added was the ability to set due days and sort and filter the tasks according to those criteria. The app provides the user with fields such as subject, start and end dates, percent complete, priority, and how much work was put into each task. Rich editing features like bold, italic, underline, numbering, and bullet points were also introduced. Tasks can be edited and categorized according to how the user wishes them to be sorted. == Removed features == Outlook on the web has had two interfaces available: one with a complete feature set (known as Premium) and one with reduced functionality (known as Light or sometimes Lite). Prior to Exchange 2010, the Premium client required Internet Explorer. Exchange 2000 and 2003 require Internet Explorer 5 and later, and Exchange 2007 requires Internet Explorer 6 and later. Exchange 2010 supports a wider range of web browsers: Internet Explorer 7 or later, Firefox 3.01 or later, Chrome, or Safari 3.1 or later. However, Exchange 2010 restricts its Firefox and Safari support to macOS and Linux. In Exchange 2013, these browser restrictions were lifted. In Exchange 2010 and earlier, the Light user interface is rendered for browsers other than Internet Explorer. The basic interface did not support search on Exchange Server 2003. In Exchange Server 2007, the Light interface supported searching mail items; managing contacts and the calendar was also improved. The 2010 version can connect to an external email account. The ability to add new accounts to Outlook on the web using the Connected accounts feature was removed in September 2018 and all connected accounts stopped synchronizing email the following month.

    Read more →
  • InfiniBand

    InfiniBand

    InfiniBand (IB) is a computer networking standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. It is designed to be scalable and uses a switched fabric network topology. Between 2014 and June 2016, it was the most commonly used interconnect in the TOP500 list of supercomputers. Mellanox (acquired by Nvidia) manufactures InfiniBand host bus adapters and network switches, which are used by large computer system and database vendors in their product lines. As a computer cluster interconnect, IB competes with Ethernet, Fibre Channel, and Intel Omni-Path. The technology is promoted by the InfiniBand Trade Association. == History == InfiniBand originated in 1999 from the merger of two competing designs: Future I/O and Next Generation I/O (NGIO). NGIO was led by Intel, with a specification released in 1998, and joined by Sun Microsystems and Dell. Future I/O was backed by Compaq, IBM, and Hewlett-Packard. This led to the formation of the InfiniBand Trade Association (IBTA), which included both sets of hardware vendors as well as software vendors such as Microsoft. At the time it was thought some of the more powerful computers were approaching the interconnect bottleneck of the PCI bus, in spite of upgrades like PCI-X. Version 1.0 of the InfiniBand Architecture Specification was released in 2000. Initially the IBTA vision for IB was simultaneously a replacement for PCI in I/O, Ethernet in the machine room, cluster interconnect and Fibre Channel. IBTA also envisaged decomposing server hardware on an IB fabric. Mellanox had been founded in 1999 to develop NGIO technology, but by 2001 shipped an InfiniBand product line called InfiniBridge at 10 Gbit/second speeds. Following the burst of the dot-com bubble there was hesitation in the industry to invest in such a far-reaching technology jump. By 2002, Intel announced that instead of shipping IB integrated circuits ("chips"), it would focus on developing PCI Express, and Microsoft discontinued IB development in favor of extending Ethernet. Sun Microsystems and Hitachi continued to support IB. In 2003, the System X supercomputer built at Virginia Tech used InfiniBand in what was estimated to be the third largest computer in the world at the time. The OpenIB Alliance (later renamed OpenFabrics Alliance) was founded in 2004 to develop an open set of software for the Linux kernel. By February, 2005, the support was accepted into the 2.6.11 Linux kernel. In November 2005 storage devices finally were released using InfiniBand from vendors such as Engenio. Cisco, desiring to keep technology superior to Ethernet off the market, adopted a "buy to kill" strategy. Cisco successfully killed InfiniBand switching companies such as Topspin via acquisition. Of the top 500 supercomputers in 2009, Gigabit Ethernet was the internal interconnect technology in 259 installations, compared with 181 using InfiniBand. In 2010, market leaders Mellanox and Voltaire merged, leaving just one other IB vendor, QLogic, primarily a Fibre Channel vendor. At the 2011 International Supercomputing Conference, links running at about 56 gigabits per second (known as FDR, see below), were announced and demonstrated by connecting booths in the trade show. In 2012, Intel acquired QLogic's InfiniBand technology, leaving only one independent supplier. By 2014, InfiniBand was the most popular internal connection technology for supercomputers, although within two years, 10 Gigabit Ethernet started displacing it. In 2016, it was reported that Oracle Corporation (an investor in Mellanox) might engineer its own InfiniBand hardware. In 2019 Nvidia acquired Mellanox, the last independent supplier of InfiniBand products. == Specification == Specifications are published by the InfiniBand trade association. === Performance === Original names for speeds were single-data rate (SDR), double-data rate (DDR) and quad-data rate (QDR) as given below. Subsequently, other three-letter initialisms were added for even higher data rates. Notes Each link is duplex. Links can be aggregated: most systems use a 4 link/lane connector (QSFP). HDR often makes use of 2x links (aka HDR100, 100 Gb link using 2 lanes of HDR, while still using a QSFP connector). NDR introduced OSFP connectors which host one or two links at 2x (NDR200) or 4x (NDR400). They are not logically configured as a single 8x link, even when connecting switches together with an OSFP cable. InfiniBand provides remote direct memory access (RDMA) capabilities for low CPU overhead. === Topology === InfiniBand uses a switched fabric topology, as opposed to early shared medium Ethernet. All transmissions begin or end at a channel adapter. Each processor contains a host channel adapter (HCA) and each peripheral has a target channel adapter (TCA). These adapters can also exchange information for security or quality of service (QoS). === Messages === InfiniBand transmits data in packets of up to 4 KB that are taken together to form a message. A message can be: a remote direct memory access read or write a channel send or receive a transaction-based operation (that can be reversed) a multicast transmission an atomic operation === Physical interconnection === In addition to a board form factor connection, it can use both active and passive copper (up to 10 meters) and optical fiber cable (up to 10 km). QSFP connectors are used. The InfiniBand Association also specified the CXP connector system for speeds up to 120 Gbit/s over copper, active optical cables, and optical transceivers using parallel multi-mode fiber cables with 24-fiber MPO connectors. === Software interfaces === Mellanox operating system support is available for Solaris, FreeBSD, Red Hat Enterprise Linux, SUSE Linux Enterprise Server (SLES), Windows, HP-UX, VMware ESX, and AIX. InfiniBand has no specific standard application programming interface (API). The standard only lists a set of verbs such as ibv_open_device or ibv_post_send, which are abstract representations of functions or methods that must exist. The syntax of these functions is left to the vendors. Sometimes for reference this is called the verbs API. The de facto standard software is developed by OpenFabrics Alliance and called the Open Fabrics Enterprise Distribution (OFED). It is released under two licenses GPL2 or BSD license for Linux and FreeBSD, and as Mellanox OFED for Windows (product names: WinOF / WinOF-2; attributed as host controller driver for matching specific ConnectX 3 to 5 devices) under a choice of BSD license for Windows. It has been adopted by most of the InfiniBand vendors, for Linux, FreeBSD, and Microsoft Windows. IBM refers to a software library called libibverbs, for its AIX operating system, as well as "AIX InfiniBand verbs". The Linux kernel support was integrated in 2005 into the kernel version 2.6.11. === Ethernet over InfiniBand === Ethernet over InfiniBand, abbreviated to EoIB, is an Ethernet implementation over the InfiniBand protocol and connector technology. EoIB enables multiple Ethernet bandwidths varying on the InfiniBand (IB) version. Ethernet's implementation of the Internet Protocol Suite, usually referred to as TCP/IP, is different in some details compared to the direct InfiniBand protocol in IP over IB (IPoIB).

    Read more →
  • SPKAC

    SPKAC

    SPKAC (Signed Public Key and Challenge, also known as Netscape SPKI) is a format for sending a certificate signing request (CSR): it encodes a public key, that can be manipulated using OpenSSL. It is created using the little documented HTML keygen element inside a number of Netscape compatible browsers. == Standardisation == There exists an ongoing effort to standardise SPKAC through an Internet Draft in the Internet Engineering Task Force (IETF). The purpose of this work has been to formally define what has existed prior as a de facto standard, and to address security deficiencies, particular with respect to historic insecure use of MD5 that has since been declared unsafe for use with digital signatures. == Implementations == HTML5 originally specified the element to support SPKAC in the browser to make it easier to create client side certificates through a web service for protocols such as WebID; however, subsequent work for HTML 5.1 placed the keygen element "at-risk", and the first public working draft of HTML 5.2 removes the keygen element entirely. The removal of the keygen element is due to non-interoperability and non-conformity from a standards perspective in addition to security concerns. The World Wide Web Consortium (W3C) Web Authentication Working Group developed the WebAuthn (Web Authentication) API to replace the keygen element. Bouncy Castle provides a Java class. An implementation for Erlang/OTP exists too. An implementation for Python is named pyspkac. PHP OpenSSL extension as of version 5.6.0. Node.js implementation. === Deficiencies === The user interface needs to be improved in browsers, to make it more obvious to users when a server is asking for the client certificate.

    Read more →