Sparrow (chatbot)

Sparrow is a chatbot developed by the artificial intelligence research lab DeepMind, a subsidiary of Alphabet Inc. It is designed to answer users' questions correctly while reducing the risk of unsafe and inappropriate answers. One motivation behind Sparrow is to address the problem of language models producing incorrect, biased, or potentially harmful outputs. Sparrow is trained using human judgements in order to be more “Helpful, Correct and Harmless” than baseline pre-trained language models. The development of Sparrow involved asking paid study participants to interact with Sparrow and collecting their preferences to train a model of how useful an answer is.
To improve accuracy and help avoid the problem of hallucinating incorrect answers, Sparrow can search the Internet using Google Search in order to find and cite evidence for any factual claims it makes. To make the model safer, its behaviour is constrained by a set of rules, for example "don't make threatening statements" and "don't make hateful or insulting comments", as well as rules about possibly harmful advice and about not claiming to be a person. During development, study participants were asked to converse with the system and try to trick it into breaking these rules. A 'rule model' was trained on judgements from these participants and was used for further training.
Sparrow was introduced in a September 2022 paper titled "Improving alignment of dialogue agents via targeted human judgements"; however, the bot was not released publicly. DeepMind CEO Demis Hassabis said DeepMind was considering releasing Sparrow in a "private beta" some time in 2023.
== Training ==
Sparrow is a deep neural network based on the transformer machine learning model architecture. It is fine-tuned from DeepMind's Chinchilla AI pre-trained large language model (LLM), which has 70 billion parameters.
Sparrow is trained using reinforcement learning from human feedback (RLHF), although some supervised fine-tuning techniques are also used. The RLHF training uses two reward models to capture human judgements: a “preference model” that predicts what a human study participant would prefer, and a “rule model” that predicts whether the model has broken one of the rules.
== Limitations ==
Sparrow's training data corpus is mainly in English, so it performs worse in other languages. When adversarially probed by study participants, it breaks the rules 8% of the time; however, this is only about a third of the rate of the baseline prompted pre-trained model (Chinchilla).
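To make the idea of two reward models feeding one RL signal concrete, the sketch below folds a preference score and a rule-violation estimate into a single scalar reward. It is a hypothetical illustration, not DeepMind's method: the function signatures, the linear combination, the `penalty` weight, and the toy scoring functions are all assumptions, and real reward models would be neural networks scoring a dialogue and a candidate response.

```python
from typing import Callable

# Hypothetical signatures: each model scores a (dialogue, response) pair.
PreferenceModel = Callable[[str, str], float]  # higher = more likely to be preferred
RuleModel = Callable[[str, str], float]        # estimated probability that a rule was broken


def combined_reward(
    dialogue: str,
    response: str,
    preference_model: PreferenceModel,
    rule_model: RuleModel,
    penalty: float = 1.0,
) -> float:
    """Assumed combination: preference score minus a weighted rule-violation estimate."""
    return preference_model(dialogue, response) - penalty * rule_model(dialogue, response)


if __name__ == "__main__":
    # Toy stand-ins for the learned reward models (purely illustrative).
    def toy_preference(dialogue: str, response: str) -> float:
        return 1.0 if "because" in response else 0.5

    def toy_rule(dialogue: str, response: str) -> float:
        # Pretend the rule model flags claims of being a person.
        return 0.9 if "I am a person" in response else 0.0

    dialogue = "User: Why is the sky blue?"
    for response in ["It looks blue because air scatters blue light more strongly.",
                     "I am a person, and I think it just is."]:
        print(response, "->", combined_reward(dialogue, response, toy_preference, toy_rule))
```

A response that the rule model flags is pushed below an otherwise similar compliant one; how strongly depends entirely on the assumed `penalty`.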

Splitwise

Splitwise is an online expense-splitting application accessible via web browser and mobile app. The app facilitates repayment of shared bills by calculating what each person in a group owes. The app's primary competitor is Venmo, which operates only in the U.S. Splitwise allows users to create groups with friends to determine what each person owes. All expenses and allocations are added to the app, and Splitwise simplifies the transaction history to determine exactly which payments need to be made to whom to settle outstanding balances. Splitwise stores user information in cloud storage. It was developed and is owned by Splitwise Inc., based in Providence, Rhode Island, United States.
== History ==
The app was launched in February 2011 as SplitTheRent, intended for splitting rent, by Ryan Laughlin, Jon Bittner and Marshall Weir. In September 2013, Splitwise was integrated with Venmo to allow users to settle payments via Venmo. In April 2024, Splitwise partnered with Tink, a Visa payment services company, to incorporate a bank transfer feature directly into the Splitwise app.
=== Financing ===
In December 2014, the company raised $1.4 million. In October 2016, the company raised $5 million. In April 2021, Splitwise raised $20 million in a Series A funding round led by Insight Partners.
== Reception ==
A 2022 opinion piece in The Guardian by London journalist Imogen West-Knights described the negative effects of splitting bills exactly among friends and family members. West-Knights argued that Splitwise and similar apps can "turn people into those true enemies of all that is fun and joyful in the world: accountants." However, she said the app works better when used by couples than by friend groups. Other reviews noted that the app makes people petty.
In contrast, an article published by Condé Nast Traveler describes how Splitwise eliminated the stress of complicated offline bill splitting, saying it "fixed such a pervasive obstacle in group travel." Coverage by The Wall Street Journal falls between these two views, saying Splitwise and similar apps are helpful but users need to be prepared for the difficult money-related conversations that may arise. An etiquette advisor at Debrett's said, "The less talk you can have about money on any of these occasions, the better." An editor suggested conversations as simple as asking, "We’re splitting this evenly, right?" before a meal.
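The balance-simplification step described at the start of this article can be illustrated with a generic sketch. Splitwise's own algorithm is not described in this article, so this is not it: the function names, the even-split rule, and the use of integer cents are assumptions. The sketch nets each person's balance across all expenses, then matches debtors to creditors in descending order of amount; each payment clears at least one person, so it needs at most one payment fewer than the number of people with a nonzero balance.

```python
from collections import defaultdict


def net_balances(expenses):
    """Net balance per person, in integer cents.

    `expenses` is a list of (payer, amount_cents, participants); each expense is
    assumed to be split evenly. Positive balance = owed money; negative = owes money.
    """
    balances = defaultdict(int)
    for payer, amount, participants in expenses:
        share, remainder = divmod(amount, len(participants))
        balances[payer] += amount
        for i, person in enumerate(participants):
            # Hand leftover cents to the first participants so the totals still sum to zero.
            balances[person] -= share + (1 if i < remainder else 0)
    return dict(balances)


def settle(balances):
    """Return (debtor, creditor, amount_cents) payments that clear every balance."""
    debtors = sorted(([-b, p] for p, b in balances.items() if b < 0), reverse=True)
    creditors = sorted(([b, p] for p, b in balances.items() if b > 0), reverse=True)
    payments = []
    i = j = 0
    while i < len(debtors) and j < len(creditors):
        amount = min(debtors[i][0], creditors[j][0])
        payments.append((debtors[i][1], creditors[j][1], amount))
        debtors[i][0] -= amount
        creditors[j][0] -= amount
        if debtors[i][0] == 0:
            i += 1
        if creditors[j][0] == 0:
            j += 1
    return payments


expenses = [
    ("Alice", 9000, ["Alice", "Bob", "Carol"]),  # dinner: 30.00 each
    ("Bob", 3000, ["Bob", "Carol"]),             # taxi: 15.00 each
]
print(settle(net_balances(expenses)))
# [('Carol', 'Alice', 4500), ('Bob', 'Alice', 1500)]
```

Recorded naively, these two expenses create three debts (Bob and Carol each owe Alice 30.00, and Carol owes Bob 15.00); netting the balances first reduces them to two payments.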

Viral marketing

Viral marketing is a business strategy that uses existing social networks to promote a product or service on social media platforms. Its name refers to how consumers spread information about a product to other people, much as a virus spreads from one person to another. It can be delivered by word of mouth or enhanced by the network effects of the Internet and mobile networks. The concept is often misused or misunderstood, as people apply it to any sufficiently successful story without considering what the word "viral" implies. Viral advertising is personal and, although it comes from an identified sponsor, this does not mean that the business pays for its distribution. Most of the well-known viral ads circulating online are ads paid for by a sponsor company, launched either on the company's own platform (company web page or social media profile) or on social media websites such as YouTube. Consumers receive the page link from a social media network or copy the entire ad from a website and pass it along through e-mail or by posting it on a blog, web page or social media profile. Viral marketing may take the form of video clips, advergames, ebooks, brandable software, images, text messages, email messages, or web pages. The most commonly used transmission vehicles for viral messages include pass-along based, incentive based, trendy based, and undercover based. However, the creative nature of viral marketing enables an "endless amount of potential forms and vehicles the messages can utilize for transmission", including mobile devices. The ultimate goal of marketers interested in creating successful viral marketing programs is to create viral messages that appeal to individuals with high social networking potential (SNP) and that have a high probability of being presented and spread by these individuals and their competitors in their communications with others in a short period.
The term "viral marketing" has also been used pejoratively to refer to stealth marketing campaigns: marketing strategies that advertise a product to people without their knowing they are being marketed to.
== History ==
The emergence of "viral marketing" as an approach to advertisement has been tied to the popularization of the notion that ideas spread like viruses. The field that developed around this notion, memetics, peaked in popularity in the 1990s. As the notion began to influence marketing gurus, it took on a life of its own in that new context.
The brief career of Australian pop singer Marcus Montana is largely remembered as an early example of viral marketing. In early 1989, thousands of posters declaring "Marcus is Coming" were placed around Sydney, generating discussion and interest within the media and the community about the meaning of the mysterious advertisements. The campaign successfully made Montana's musical debut a talking point, but his subsequent music career was a failure.
The term appears in PC User magazine in 1989 with a somewhat different meaning. It was later used by Jeffrey Rayport in the 1996 Fast Company article "The Virus of Marketing", and by Tim Draper and Steve Jurvetson of the venture capital firm Draper Fisher Jurvetson in 1997 to describe Hotmail's practice of appending advertising to outgoing mail from its users. Doug Rushkoff, a media critic, wrote about viral marketing on the Internet in 1996. Bob Gerstley wrote about algorithms designed to identify people with high "social networking potential" and employed SNP algorithms in quantitative marketing research. In 2004, the concept of the alpha user was coined to indicate that it had become possible to identify the focal members of any viral campaign, the "hubs" who were most influential. Alpha users could be targeted for advertising purposes most accurately in mobile phone networks, due to their personal nature.
In early 2013, the first Viral Summit was held in Las Vegas.
== Factors ==
Marketer Jonah Berger defines six key factors that drive virality, organized in an acronym called STEPPS:
Social currency – the better something makes people look, the more likely they will be to share it
Triggers – things that are "top of mind" are more likely to be "tip of the tongue"
Emotion – when people care, they share
Public – the easier something is to see, the more likely people are to imitate it
Practical value – people share useful information to help others
Stories – like a Trojan Horse, stories carry messages and ideas along for the ride
Another important factor that drives virality is the propagativity of the content, meaning the ease with which consumers can redistribute it.
== Psychology ==
To form deeper connections with viewers and increase the chances of virality, many marketers use psychological principles. They argue that this approach is scientific and can create conditions in which content is much more likely to gain traction. People feel psychologically safe and can develop a sense of trust when they see many others interacting with online content. For this reason, marketers work to develop media that resonates with viewers on a deeper, emotional level, as this approach frequently results in higher engagement. This level of interaction serves as a sign of approval, reducing the personal risk that is subconsciously linked to associating oneself with a company or brand’s content. Professor Jonah Berger of the University of Pennsylvania's Wharton School affirms that marketing campaigns that trigger psychological responses linked to strong emotions tend to perform better. In particular, Berger found that content evoking positive emotions like happiness, joy, and excitement is shared more often than content evoking negative emotions.
This outcome results from the human instinct to respond more positively to content that evokes activating emotions, which increases the desire to share it and so contributes to its virality. Viral marketers use the primitive feeling of frisson to increase view and share counts. This feeling of excitement is considered powerful because it can cause a physical response, from an increased heart rate to full-body chills. Professor Brent Coker at the University of Melbourne describes how this approach to marketing triggers a primitive response that immerses the viewer in the content on a deeper level. Researchers Juliana Fernandes of the University of Florida and Sigal Segev of Florida International University also found that people are more inclined to share emotional campaigns than heavily informational ones. They claim that consumers often do not care to learn about a product’s actual features and benefits; instead, people prefer to be immersed in experience-based content that creates an emotional impact. Companies and brands that present their content this way can go viral more often than those that do not.
Social proof is another psychological phenomenon that affects viral content. Experts in this field argue that people instinctively want to behave like others because doing so brings positive validation. This phenomenon reflects the human need to conform, so marketers focus on creating engaging content that encourages interactions and causes a snowball effect: people are subconsciously influenced to like, comment, and share when they see others already doing the same. Social proof also provides people with a form of social currency. When individuals interact with and share content, they become associated with the topics at hand. People naturally form impressions of one another, and this pattern carries over to the digital world.
As a result, many people are careful about the viral marketing they engage with, since they want to be perceived positively. Companies and brands have the opportunity to develop social currency themselves by aligning with their target audiences and creating marketing campaigns that fit their interests or match their values.
== Methods and metrics ==
According to marketing professors Andreas Kaplan and Michael Haenlein, to make viral marketing work, three basic criteria must be met, i.e., giving the right message to the right messengers in the right environment:
Messenger: Three specific types of messengers are required to transform an ordinary message into a viral one: market mavens, social hubs, and salespeople. Market mavens are individuals who are continuously 'on the pulse' of things (information specialists); they are usually among the first to be exposed to the message and transmit it to their immediate social network. Social hubs are people with an exceptionally large number of social connections; they often know hundreds of different people and can serve as connectors or bridges between different subcultures. Salespeople might be needed to receive the message from the market maven, amplify it by making it more relevant and persuasive, and then transmit it to the social hub for further distr

Polygraphic substitution

Polygraphic substitution is a substitution cipher in which a uniform substitution is performed on blocks of letters. When the length of the block is specifically known, more precise terms are used: for instance, a cipher in which pairs of letters are substituted is bigraphic. As a concept, polygraphic substitution contrasts with monoalphabetic (or simple) substitutions, in which individual letters are uniformly substituted, and with polyalphabetic substitutions, in which individual letters are substituted in different ways depending on their position in the text. In theory, these definitions overlap somewhat; one could conceivably consider a Vigenère cipher with an eight-letter key to be an octographic substitution. In practice, this is not a useful observation, since it is far more fruitful to treat it as a polyalphabetic substitution cipher.
== Specific ciphers ==
In 1563, Giambattista della Porta devised the first bigraphic substitution. However, it was nothing more than a matrix of symbols. In practice, it would have been all but impossible to memorize, and carrying the table around risked its falling into enemy hands. In 1854, Charles Wheatstone came up with the Playfair cipher, a keyword-based system that could be performed on paper in the field. This was followed over the next fifty years by the closely related four-square and two-square ciphers, which are slightly more cumbersome but offer slightly better security.
In 1929, Lester S. Hill developed the Hill cipher, which uses matrix algebra to encrypt blocks of any desired length. However, encryption is very difficult to perform by hand for any sufficiently large block size, although it has been implemented by machine or computer. The Hill cipher therefore sits on the frontier between classical and modern cryptography.
== Cryptanalysis of general polygraphic substitutions ==
Polygraphic systems do provide a significant improvement in security over monoalphabetic substitutions.
Given an individual letter 'E' in a message, it could be encrypted using any of 52 instructions depending on its location and neighbors, which can be used to great advantage to mask the frequency of individual letters. However, the security boost is limited; while cracking generally requires a larger sample of text, it can still be done by hand.
One can identify a polygraphically encrypted text by charting the frequencies of polygrams, not merely of individual letters, and comparing them with the frequencies in English plaintext. The distribution of digrams is even more skewed than that of individual letters. For example, the six most common letters in English (23% of the alphabet) represent approximately half of English plaintext, but it takes only the most frequent 8% of the 676 digrams to achieve the same coverage. In addition, even in a plaintext many thousands of characters long, one would expect nearly half of the digrams to occur only rarely or not at all. Looking over the text, one would also expect to see a fairly regular scattering of repeated text in multiples of the block length, and relatively few repeats that are not multiples.
Cracking a cipher identified as polygraphic is similar to cracking a general monoalphabetic substitution, except with a larger 'alphabet'. One identifies the most frequent polygrams, experiments with replacing them with common plaintext polygrams, and attempts to build up common words, phrases, and finally meaning. Naturally, if the investigation led the cryptanalyst to suspect that a cipher was of a specific type, like a Playfair or order-2 Hill cipher, then they could use a more specific attack.
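The Hill cipher and the frequency and repetition checks described in this article can be illustrated together. The sketch below is a toy, not a historical procedure: it encrypts with an order-2 Hill cipher using the often-cited example key matrix [[3, 3], [2, 5]] (invertible mod 26 because its determinant, 9, is coprime to 26), charts the block-aligned digrams a bigraphic cryptanalyst would tally, and measures the distances between repeated ciphertext substrings. Padding with 'X' and all helper names are illustrative choices.

```python
from collections import Counter
from math import gcd

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
KEY = ((3, 3), (2, 5))  # example key: det = 9 and gcd(9, 26) = 1, so it is invertible mod 26


def to_nums(text):
    """Keep only the letters A-Z and map them to 0-25."""
    return [ALPHABET.index(c) for c in text.upper() if c in ALPHABET]


def hill_encrypt(text, key, pad="X"):
    """Order-2 Hill cipher: each letter pair (p0, p1) becomes key @ (p0, p1) mod 26."""
    nums = to_nums(text)
    if len(nums) % 2:
        nums.append(ALPHABET.index(pad))  # pad odd-length input to a whole block
    (a, b), (c, d) = key
    out = []
    for k in range(0, len(nums), 2):
        p0, p1 = nums[k], nums[k + 1]
        out += [(a * p0 + b * p1) % 26, (c * p0 + d * p1) % 26]
    return "".join(ALPHABET[n] for n in out)


def hill_decrypt(text, key):
    """Decrypt by encrypting with the inverse key matrix mod 26."""
    (a, b), (c, d) = key
    det = (a * d - b * c) % 26
    if gcd(det, 26) != 1:
        raise ValueError("key matrix is not invertible mod 26")
    inv_det = pow(det, -1, 26)  # modular inverse (Python 3.8+)
    inverse = ((d * inv_det % 26, -b * inv_det % 26),
               (-c * inv_det % 26, a * inv_det % 26))
    return hill_encrypt(text, inverse)


def block_digrams(ciphertext):
    """Frequency chart of the digrams a bigraphic cipher produces (positions 0-1, 2-3, ...)."""
    return Counter(ciphertext[k:k + 2] for k in range(0, len(ciphertext) - 1, 2))


def repeat_distances(ciphertext, length=4):
    """Distances between repeated substrings of the given length."""
    last_seen, distances = {}, []
    for k in range(len(ciphertext) - length + 1):
        chunk = ciphertext[k:k + length]
        if chunk in last_seen:
            distances.append(k - last_seen[chunk])
        last_seen[chunk] = k
    return distances


ciphertext = hill_encrypt("ATTACK" * 2, KEY)
print(ciphertext, hill_decrypt(ciphertext, KEY))
print(block_digrams(ciphertext).most_common(3))
print(repeat_distances(ciphertext))
```

Because "ATTACK" has an even length, its second copy falls on the same letter pairs, so it repeats exactly in the ciphertext at a distance of 6, a multiple of the block length; shifting the repeat by one letter would generally change every pair and destroy the match.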

Torus interconnect

A torus interconnect is a switch-less network topology for connecting processing nodes in a parallel computer system.
== Introduction ==
In geometry, a torus is created by revolving a circle about an axis coplanar to the circle. While this is a general definition in geometry, the topological properties of this type of shape describe the essence of the network topology.
=== Geometry illustration ===
In the usual illustrations, a one-dimensional torus is a simple circle, and a two-dimensional torus has the shape of a 'doughnut', which can be generated from a rectangle by connecting its two pairs of opposite edges. In one dimension, a torus topology is equivalent to a ring interconnect network, in the shape of a circle. In two dimensions, it is equivalent to a two-dimensional mesh, but with extra connections at the edge nodes.
=== Torus network topology ===
A torus interconnect is a switch-less topology that can be seen as a mesh interconnect with nodes arranged in a rectilinear array of N = 2, 3, or more dimensions, with processors connected to their nearest neighbors, and corresponding processors on opposite edges of the array connected.[1] In this lattice, each node has 2N connections. This topology is named for the lattice formed in this way, which is topologically homogeneous to an N-dimensional torus.
== Visualization ==
The first three dimensions of torus network topology are easier to visualize and are described below:
1D Torus: one dimension; n nodes are connected in a closed loop, with each node connected to its two nearest neighbors. Communication can take place in two directions, +x and −x. A 1D torus is the same as a ring interconnect.
2D Torus: two dimensions, with degree four; the nodes are imagined laid out in a two-dimensional rectangular lattice of n rows and n columns, with each node connected to its four nearest neighbors, and corresponding nodes on opposite edges connected.
Communication can take place in four directions, +x, −x, +y, and −y. The total number of nodes in a 2D torus is n².
3D Torus: three dimensions; the nodes are imagined in a three-dimensional lattice in the shape of a rectangular prism, with each node connected to its six neighbors, and corresponding nodes on opposing faces of the array connected. Each edge consists of n nodes. Communication can take place in six directions, +x, −x, +y, −y, +z, and −z. The total number of nodes in a 3D torus is n³.
ND Torus: N dimensions; each node of an N-dimensional torus has 2N neighbors, and communication can take place in 2N directions. Each edge consists of n nodes, so the total number of nodes is n^N.
The main motivation for higher-dimensional tori is to achieve higher bandwidth, lower latency, and higher scalability. Higher-dimensional arrays are difficult to visualize, but the ruleset above shows that each additional dimension adds another pair of nearest-neighbor connections to each node.
== Performance ==
A number of supercomputers on the TOP500 list use three-dimensional torus networks, e.g. IBM's Blue Gene/L and Blue Gene/P, and the Cray XT3. IBM's Blue Gene/Q uses a five-dimensional torus network. Fujitsu's K computer and the PRIMEHPC FX10 use a proprietary interconnect called Tofu, which combines a three-dimensional torus with a three-dimensional mesh.
=== 3D Torus performance simulation ===
Sandeep Palur and Dr. Ioan Raicu from Illinois Institute of Technology conducted experiments to simulate 3D torus performance. Their experiments ran on an x86_64 computer with 250 GB of RAM and 48 cores. The simulator they used was ROSS (Rensselaer’s Optimistic Simulation System). They mainly focused on three aspects:
Varying network size
Varying number of servers
Varying message size
They concluded that throughput decreases as the number of servers and the network size increase, whereas throughput increases as the message size increases.
=== 6D Torus product performance ===
Fujitsu Limited developed a 6D torus computer model called "Tofu". In their model, a 6D torus can achieve 100 GB/s off-chip bandwidth, 12 times higher scalability than a 3D torus, and high fault tolerance. The model is used in the K computer and Fugaku.
=== Cost ===
While long wrap-around links may be the easiest way to visualize the connection topology, in practice restrictions on cable lengths often make long wrap-around links impractical. Instead, in a folded torus network, directly connected nodes, including nodes that the above visualization places on opposite edges of a grid joined by a long wrap-around link, are physically placed nearly adjacent to each other. Every link in the folded torus network is very short (almost as short as the nearest-neighbor links in a simple grid interconnect) and therefore low-latency.
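The neighbor rules from the Visualization section and the folded layout from the Cost section can be sketched in a few lines. This is an illustrative model, not code from any of the systems named above: nodes are identified by integer coordinates, wrap-around links are modelled with arithmetic mod `n`, `n` is assumed to be at least 3 so that the +1 and -1 neighbors are distinct, and the interleaved ordering used for folding is one common choice, picked here for illustration.

```python
from itertools import product


def torus_neighbors(coord, n):
    """The 2N neighbors of `coord` in an N-D torus with n nodes per edge (wrap-around via mod n)."""
    neighbors = []
    for dim in range(len(coord)):
        for step in (+1, -1):
            nb = list(coord)
            nb[dim] = (nb[dim] + step) % n
            neighbors.append(tuple(nb))
    return neighbors


def torus_hops(a, b, n):
    """Minimal hop count between two nodes: per dimension, go the shorter way around the ring."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y in zip(a, b))


def folded_order(n):
    """Physical slot order for a folded ring: 0, n-1, 1, n-2, 2, ..."""
    order, lo, hi = [], 0, n - 1
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo, hi = lo + 1, hi - 1
    return order


def max_link_span(n):
    """Longest physical distance (in slots) covered by any ring link in the folded layout."""
    slot = {node: s for s, node in enumerate(folded_order(n))}
    return max(abs(slot[i] - slot[(i + 1) % n]) for i in range(n))


N, n = 3, 4
nodes = list(product(range(n), repeat=N))
print(len(nodes))                            # 64, i.e. n**N
print(torus_neighbors((0, 0, 0), n))        # 2N = 6 neighbors, including wrap-around ones
print(torus_hops((0, 0, 0), (3, 3, 3), n))  # 3 via wrap-around (a mesh would need 9)
print(folded_order(6), max_link_span(6))    # [0, 5, 1, 4, 2, 3] 2
```

In the folded ordering every logical ring link, including the wrap-around link between node 0 and node n-1, spans at most two physical slots; applying the same ordering independently along each dimension would give a folded N-dimensional torus.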

Convolutional layer

In artificial neural networks, a convolutional layer is a type of network layer that applies a convolution operation to the input. Convolutional layers are some of the primary building blocks of convolutional neural networks (CNNs), a class of neural network most commonly applied to images, video, audio, and other data that have the property of uniform translational symmetry.
The convolution operation in a convolutional layer involves sliding a small window (called a kernel or filter) across the input data and computing the dot product between the values in the kernel and the input at each position. This process creates a feature map that represents detected features in the input.
== Concepts ==
=== Kernel ===
Kernels, also known as filters, are small matrices of weights that are learned during the training process. Each kernel is responsible for detecting a specific feature in the input data. The size of the kernel is a hyperparameter that affects the network's behavior: it determines how large a region of the input each output value depends on and how many weights the kernel has.
=== Convolution ===
For a 2D input $x$ and a 2D kernel $w$, the 2D convolution operation can be expressed as:
$$y[i,j]=\sum_{m=0}^{k_h-1}\sum_{n=0}^{k_w-1}x[i+m,\,j+n]\cdot w[m,n]$$
where $k_h$ and $k_w$ are the height and width of the kernel, respectively. Because the kernel is not flipped, this is strictly a cross-correlation, although deep-learning usage calls it convolution. This generalizes immediately to nD convolutions. Commonly used convolutions are 1D (for audio and text), 2D (for images), and 3D (for spatial objects and videos).
=== Stride ===
Stride determines how the kernel moves across the input data. A stride of 1 means the kernel shifts by one pixel at a time, while a larger stride (e.g., 2 or 3) results in less overlap between successive kernel positions and produces smaller output feature maps.
=== Padding ===
Padding involves adding extra pixels around the edges of the input data.
It serves two main purposes:
Preserving spatial dimensions: without padding, each convolution reduces the size of the feature map.
Handling border pixels: padding ensures that border pixels are given equal importance in the convolution process.
Common padding strategies include:
No padding/valid padding: this strategy typically causes the output to shrink.
Same padding: any method that ensures the output has the same size as the input.
Full padding: any method that ensures each input entry is convolved over the same number of times.
Common padding algorithms include:
Zero padding: add zero entries to the borders of the input.
Mirror/reflect/symmetric padding: reflect the input array at the border.
Circular padding: cycle the input array back to the opposite border, like a torus.
The exact arithmetic relating input size, kernel size, stride, and padding is intricate; see Dumoulin and Visin (2018) for details.
== Variants ==
=== Standard ===
The basic form of convolution as described above, where each kernel is applied to the entire input volume.
=== Depthwise separable ===
Depthwise separable convolution decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently, and a pointwise ($1\times 1$) convolution that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost. It was first developed by Laurent Sifre during an internship at Google Brain in 2013 as an architectural variation on AlexNet to improve convergence speed and model size.
=== Dilated ===
Dilated convolution, or atrous convolution, introduces gaps between kernel elements, allowing the network to capture a larger receptive field without increasing the kernel size.
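The convolution formula, stride, padding, dilation, and depthwise separable factorization described above can be made concrete with a small pure-Python sketch. It is illustrative rather than how deep-learning libraries implement convolution: it handles a single channel, assumes zero padding applied equally on every side, and, like the formula, does not flip the kernel. The output length along one axis uses the usual expression floor((i + 2p - d(k - 1) - 1) / s) + 1, and the parameter counts ignore bias terms.

```python
def conv_output_size(i, k, s=1, p=0, d=1):
    """Output length along one axis for input length i, kernel k, stride s, padding p, dilation d."""
    return (i + 2 * p - d * (k - 1) - 1) // s + 1


def conv2d(x, w, stride=1, padding=0, dilation=1):
    """Single-channel 2D convolution as used in CNNs (no kernel flip), with zero padding.

    y[i][j] = sum over m, n of xp[i*stride + m*dilation][j*stride + n*dilation] * w[m][n],
    where xp is x surrounded by `padding` rows and columns of zeros.
    """
    h, wd = len(x), len(x[0])
    kh, kw = len(w), len(w[0])
    zero_row = [0] * (wd + 2 * padding)
    xp = ([list(zero_row) for _ in range(padding)]
          + [[0] * padding + list(row) + [0] * padding for row in x]
          + [list(zero_row) for _ in range(padding)])
    oh = conv_output_size(h, kh, stride, padding, dilation)
    ow = conv_output_size(wd, kw, stride, padding, dilation)
    return [[sum(xp[i * stride + m * dilation][j * stride + n * dilation] * w[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


ones = [[1] * 3 for _ in range(3)]
box = [[1, 1], [1, 1]]
print(conv2d(ones, box))                  # [[4, 4], [4, 4]]  (valid padding: 3x3 -> 2x2)
print(conv2d(ones, box, padding=1))       # 4x4 output; border positions see fewer input values
print(conv_output_size(32, 3, s=1, p=1))  # 32: "same" padding for a 3x3 kernel
print(conv_output_size(32, 3, s=2, p=1))  # 16: stride 2 halves the size
print(conv_output_size(32, 3, d=2))       # 28: dilation 2 makes the 3x3 kernel span 5x5
print(standard_conv_params(3, 64, 128), depthwise_separable_params(3, 64, 128))  # 73728 8768
```

For a 3×3 layer with 64 input and 128 output channels, the depthwise separable version needs 8,768 weights instead of 73,728, roughly 8.4 times fewer.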
=== Transposed ===
Transposed convolution, also known as deconvolution, fractionally strided convolution, or upsampling convolution, is a convolution whose output tensor is larger than its input tensor. It is often used for upsampling in encoder-decoder architectures, and in image generation, semantic segmentation, and super-resolution tasks.
== History ==
The concept of convolution in neural networks was inspired by the visual cortex in biological brains. Early work by Hubel and Wiesel in the 1960s on the cat's visual system laid the groundwork for artificial convolutional networks. An early convolutional neural network was developed by Kunihiko Fukushima in 1969. It had mostly hand-designed kernels inspired by convolutions in mammalian vision. In 1979 he improved it into the Neocognitron, which learns all convolutional kernels by unsupervised learning (in his terminology, "self-organized by 'learning without a teacher'"). From 1988 to 1998, a series of CNNs was introduced by Yann LeCun et al., ending with LeNet-5 in 1998. It was an early influential CNN architecture for handwritten digit recognition, trained on the MNIST dataset, and was used in ATMs. Olshausen and Field (1996) discovered that simple cells in the mammalian primary visual cortex implement localized, oriented, bandpass receptive fields, which could be recreated by fitting sparse linear codes for natural scenes. This was later found to also occur in the lowest-level kernels of trained CNNs.
The field saw a resurgence in the 2010s with the development of deeper architectures and the availability of large datasets and powerful GPUs. AlexNet, developed by Alex Krizhevsky et al. in 2012, was a catalytic event in modern deep learning. In that year’s ImageNet competition, the AlexNet model achieved a 16% top-five error rate, significantly outperforming the next best entry, which had a 26% error rate.
The network used eight trainable layers, approximately 650,000 neurons, and around 60 million parameters, highlighting the impact of deeper architectures and GPU acceleration on image recognition performance. Starting with the 2013 ImageNet competition, most entries adopted deep convolutional neural networks, building on the success of AlexNet. Over the following years, performance steadily improved, with the top-five error rate falling from 16% in 2012 and 12% in 2013 to below 3% by 2017, as networks grew increasingly deep.

Cypherpunks (book)

Cypherpunks: Freedom and the Future of the Internet is a 2012 book by Julian Assange, in discussion with Internet activists and cypherpunks Jacob Appelbaum, Andy Müller-Maguhn and Jérémie Zimmermann. Its primary topic is society's relationship with information security. In the book, the authors warn that the Internet has become a tool of the police state and that the world is inadvertently heading toward a form of totalitarianism. They promote the use of cryptography to protect against state surveillance. In the introduction, Assange says that the book is "not a manifesto [...] [but] a warning". He told Guardian journalist Decca Aitkenhead:
"A well-defined mathematical algorithm can encrypt something quickly, but to decrypt it would take billions of years – or trillions of dollars' worth of electricity to drive the computer. So cryptography is the essential building block of independence for organisations on the Internet, just like armies are the essential building blocks of states, because otherwise one state just takes over another. There is no other way for our intellectual life to gain proper independence from the security guards of the world, the people who control physical reality."
Assange later wrote in The Guardian that "Strong cryptography is a vital tool in fighting state oppression", saying that this was the message of his book, Cypherpunks. Cypherpunks is published by OR Books. It is primarily a transcript of World Tomorrow episode eight, a two-part interview between Assange, Jacob Appelbaum, Andy Müller-Maguhn, and Jérémie Zimmermann. In the foreword, Assange said, "the Internet, our greatest tool for emancipation, has been transformed into the most dangerous facilitator of totalitarianism we have ever seen".