Negobot

Negobot

Negobot also referred to as Lolita or Lolita chatbot is a chatterbot that was introduced to the public in 2013, designed by researchers from the University of Deusto and Optenet to catch online pedophiles. It is a conversational agent that utilizes natural language processing (NLP), information retrieval (IR) and Automatic Learning. Because the bot poses as a young female in order to entice and track potential predators, it became known in media as the "virtual Lolita", in reference to Vladimir Nabokov's novel. == Background == In 2013, the University of Deusto researchers published a paper on their work with Negobot and disclosed the text online. In their abstract, the researchers addressed the issue that an increasing number of children are using the internet and that these young users are more susceptible to existing internet risks. Their main objective was to create a chatterbot with the ability to trap online predators that posed a threat to children. They intended to deploy the bot into sites frequented by predators such as social networks and chatrooms. The university researchers used information provided by anti-pedophilia activist organization Perverted-Justice, including examples of online encounters and conversations with sexual predators, to supplement the program's artificial intelligence system. == Features == === Programmed persona === The chatterbot takes the guise of a naive and vulnerable 14-year-old girl. The bot's programmers used methods of artificial intelligence and natural language processing to create a conversational agent fluent in typical teenage slang, misspellings, and knowledge of pop culture. Through these linguistic features, the bot is able to mimic the conversational style of young teenagers. It also features split personalities and seven different patterns of conversation. Negobot's primary creator, Dr. Carlos Laorden, expressed the significance of the bot's distinguishable style of communication, stating that normally, "chatbots tend to be very predictable. Their behavior and interest in a conversation are flat, which is a problem when attempting to detect untrustworthy targets like paedophiles." What makes Negobot different is its game theory feature, which makes it able to "maintain a much more realistic conversation." Apart from being able to imitate a stereotypical teenager, the program is also able to translate messages into different languages. === Game theory === Negobot's designers programmed it with the ability to treat conversations with potential predators as if it were a game, the objective being to collect as much information on the suspect as possible that could provide evidence of pedophilic characteristics and motives. The use of game theory shapes the decisions the bot makes and the overall direction of the conversation. The bot initiates its undercover operations by entering a chat as a passive participant, waiting to be chatted by a user. Once a user elicits conversation, the bot will frame the conversation in such a way that keeps the target engaged, extracting personal information and discouraging it from leaving the chat. The information is then recorded to be potentially sent to the police. If the target seems to lose interest, the bot attempts to make it feel guilty by expressing sentiments of loneliness and emotional need through strategic, formulated responses, ultimately prolonging interaction. In addition, the bot may provide fake information about itself in attempt to lure the target into physical meetings. === Limitations === Despite being able to carry out a realistic conversation, Negobot is still unable to detect linguistic subtleties in the messages of others, including sarcasm. == Controversy == John Carr, a specialist in online child safety, expressed his concern to BBC over the legality of this undercover investigation. He claimed that using the bot on unsuspecting internet users could be considered a form of entrapment or harassment. The type of information that Negobot collects from potential online predators, he said, is unlikely to be upheld in court. Furthermore, he warned that relying on only software without any real-world policing risks enticing individuals to do or say things that they would not have if real-world policing were a factor.

Butler in a Box

Butler in a Box was an early voice-controlled home automation device developed in 1983 by magician Gus Searcy and programmer Franz Kavan. The device allowed users to control various home electronics, such as lights and phones, using voice commands. It predated modern smart speakers and virtual assistants by several decades. == History == The idea for the Butler in a Box originated in 1983 when Searcy was asked by friends why he couldn't simply command lights to turn on and off if he could pull rabbits out of hats, given his background as a professional magician. Searcy partnered with former IBM programmer Kavan to develop the device, with their first prototype being named "Sidney". The Butler in a Box combined remote control technology with voice recognition to enable control of home devices. However, it faced challenges due to the technological limitations of the era and its high price point of nearly $1,500 (equivalent to around $3,700 in 2021). == Features and functionality == Users could activate the Butler in a Box by speaking a wake word, typically a traditional butler name, and the device would address the user as "boss". It was capable of performing tasks such as: Turning lights on and off, controlling individual zones if lights were connected to remote control modules Making and receiving phone calls Setting timers Pairing with sensors to function as a security alarm system However, the device required extensive voice training for each user, a time-consuming process compared to modern voice recognition. Additionally, settings and trained commands would be lost if power was out for over 3 hours due to the volatile memory technology used at the time. == Reception and legacy == While innovative for its time, the Butler in a Box did not achieve widespread commercial success due to its high price and the technical limitations of the 1980s. Nevertheless, it served as an important early step in the development of home automation and showcased the potential for voice-controlled technology to enhance accessibility and convenience in the home. Decades later, products like Amazon Alexa, Google Home, and Apple's Siri would make voice-controlled smart home devices commonplace and affordable, building on the groundwork laid by early attempts like the Butler in a Box.

Distributed concurrency control

Distributed concurrency control is the concurrency control of a system distributed over a computer network (Bernstein et al. 1987, Weikum and Vossen 2001). In database systems and transaction processing (transaction management) distributed concurrency control refers primarily to the concurrency control of a distributed database. It also refers to the concurrency control in a multidatabase (and other multi-transactional object) environment (e.g., federated database, grid computing, and cloud computing environments. A major goal for distributed concurrency control is distributed serializability (or global serializability for multidatabase systems). Distributed concurrency control poses special challenges beyond centralized one, primarily due to communication and computer latency. It often requires special techniques, like distributed lock manager over fast computer networks with low latency, like switched fabric (e.g., InfiniBand). The most common distributed concurrency control technique is strong strict two-phase locking (SS2PL, also named rigorousness), which is also a common centralized concurrency control technique. SS2PL provides both the serializability and strictness. Strictness, a special case of recoverability, is utilized for effective recovery from failure. For large-scale distribution and complex transactions, distributed locking's typical heavy performance penalty (due to delays, latency) can be saved by using the atomic commitment protocol, which is needed in a distributed database for (distributed) transactions' atomicity.

Ray tracing (graphics)

In 3D computer graphics, ray tracing is a technique for modeling light transport for use in a wide variety of rendering algorithms for generating digital images. On a spectrum of computational cost and visual fidelity, ray tracing-based rendering techniques, such as ray casting, recursive ray tracing, distribution ray tracing, photon mapping and path tracing, are generally slower and higher fidelity than scanline rendering methods. Thus, ray tracing was first deployed in applications where taking a relatively long time to render could be tolerated, such as still CGI images, and film and television visual effects (VFX), but was less suited to real-time applications such as video games, where speed is critical in rendering each frame. Since 2018, however, hardware acceleration for real-time ray tracing has become standard on new commercial graphics cards, and graphics APIs have followed suit, allowing developers to use hybrid ray tracing and rasterization-based rendering in games and other real-time applications with a lesser hit to frame render times. Ray tracing is capable of simulating a variety of optical effects, such as reflection, refraction, soft shadows, scattering, depth of field, motion blur, caustics, ambient occlusion and dispersion phenomena (such as chromatic aberration). It can also be used to trace the path of sound waves in a similar fashion to light waves, making it a viable option for more immersive sound design in video games by rendering realistic reverberation and echoes. In fact, any physical wave or particle phenomenon with approximately linear motion can be simulated with ray tracing. Ray tracing–based rendering techniques that sample light over a domain typically generate multiple rays and often rely on denoising to reduce the resulting noise. == History == The idea of ray tracing comes from as early as the 16th century, when it was described by Albrecht Dürer, who is credited for its invention. Dürer described multiple techniques for projecting 3-D scenes onto an image plane. Some of these project chosen geometry onto the image plane, as is done with rasterization today. Others determine what geometry is visible along a given ray, as is done with ray tracing. Using a computer for ray tracing to generate shaded pictures was first accomplished by Arthur Appel in 1968. Appel used ray tracing for primary visibility (determining the closest surface to the camera at each image point) by tracing a ray through each point to be shaded into the scene to identify the visible surface. The closest surface intersected by the ray was the visible one. This non-recursive ray tracing-based rendering algorithm is today called "ray casting". His algorithm then traced secondary rays to the light source from each point being shaded to determine whether the point was in shadow or not. Later, in 1971, Goldstein and Nagel of MAGI (Mathematical Applications Group, Inc.) published "3-D Visual Simulation", wherein ray tracing was used to make shaded pictures of solids. At the ray-surface intersection point found, they computed the surface normal and, knowing the position of the light source, computed the brightness of the pixel on the screen. Their publication describes a short (30-second) film "made using the University of Maryland's display hardware outfitted with a 16mm camera. The film showed the helicopter and a simple ground-level gun emplacement. The helicopter was programmed to undergo a series of maneuvers including turns, take-offs, and landings, etc., until it eventually is shot down and crashed." A CDC 6600 computer was used. MAGI produced an animation video called MAGI/SynthaVision Sampler in 1974. Another early instance of ray casting came in 1976, when Scott Roth created a flip book animation in Bob Sproull's computer graphics course at Caltech. The scanned pages are shown as a video in the accompanying image. Roth's computer program noted an edge point at a pixel location if the ray intersected a bounded plane different from that of its neighbors. Of course, a ray could intersect multiple planes in space, but only the surface point closest to the camera was noted as visible. The platform was a DEC PDP-10, a Tektronix storage-tube display, and a printer which would create an image of the display on rolling thermal paper. Roth extended the framework, introduced the term ray casting in the context of computer graphics and solid modeling, and in 1982 published his work while at GM Research Labs. Turner Whitted was the first to show recursive ray tracing for mirror reflection and for refraction through translucent objects, with an angle determined by the solid's index of refraction, and to use ray tracing for anti-aliasing. Whitted also showed ray traced shadows. He produced a recursive ray traced film called The Compleat Angler in 1979 while an engineer at Bell Labs. Whitted's deeply recursive ray tracing algorithm reframed rendering from being primarily a matter of surface visibility determination to being a matter of light transport. His paper inspired a series of subsequent work by others that included distribution ray tracing and finally unbiased path tracing, which provides the rendering equation framework that has allowed computer-generated imagery to be faithful to reality. For decades, global illumination in major films using computer-generated imagery was approximated with additional lights. Ray tracing-based rendering eventually changed that by enabling physically based light transport. Early feature films rendered entirely using path tracing include Monster House (2006), Cloudy with a Chance of Meatballs (2009), and Monsters University (2013). == Algorithm overview == Optical ray tracing describes a method for producing visual images constructed in 3D computer graphics environments, with more photorealism than either ray casting or scanline rendering techniques. It works by tracing a path from an imaginary eye through each pixel in a virtual screen, and calculating the color of the object visible through it. Scenes in ray tracing are described mathematically by a programmer or by a visual artist (normally using intermediary tools). Scenes may also incorporate data from images and models captured by means such as digital photography. Typically, each ray must be tested for intersection with some subset of all the objects in the scene. Once the nearest object has been identified, the algorithm will estimate the incoming light at the point of intersection, examine the material properties of the object, and combine this information to calculate the final color of the pixel. Certain illumination algorithms and reflective or translucent materials may require more rays to be re-cast into the scene. It may at first seem counterintuitive or "backward" to send rays away from the camera, rather than into it (as actual light does in reality), but doing so is many orders of magnitude more efficient. Since the overwhelming majority of light rays from a given light source do not make it directly into the viewer's eye, a "forward" simulation could potentially waste a tremendous amount of computation on light paths that are never recorded. Therefore, the shortcut taken in ray tracing is to presuppose that a given ray intersects the view frame. After either a maximum number of reflections or a ray traveling a certain distance without intersection, the ray ceases to travel and the pixel's value is updated. === Calculate rays for rectangular viewport === On input we have (in calculation we use vector normalization and cross product): E ∈ R 3 {\displaystyle E\in \mathbb {R^{3}} } eye position T ∈ R 3 {\displaystyle T\in \mathbb {R^{3}} } target position θ ∈ [ 0 , π ] {\displaystyle \theta \in [0,\pi ]} field of view - for humans, we can assume ≈ π / 2 rad = 90 ∘ {\displaystyle \approx \pi /2{\text{ rad}}=90^{\circ }} m , k ∈ N {\displaystyle m,k\in \mathbb {N} } numbers of square pixels on viewport vertical and horizontal direction i , j ∈ N , 1 ≤ i ≤ k ∧ 1 ≤ j ≤ m {\displaystyle i,j\in \mathbb {N} ,1\leq i\leq k\land 1\leq j\leq m} numbers of actual pixel v → ∈ R 3 {\displaystyle {\vec {v}}\in \mathbb {R^{3}} } vertical vector which indicates where is up and down, usually v → = [ 0 , 1 , 0 ] {\displaystyle {\vec {v}}=[0,1,0]} - roll component which determine viewport rotation around point C (where the axis of rotation is the ET section) The idea is to find the position of each viewport pixel center P i j {\displaystyle P_{ij}} which allows us to find the line going from eye E {\displaystyle E} through that pixel and finally get the ray described by point E {\displaystyle E} and vector R → i j = P i j − E {\displaystyle {\vec {R}}_{ij}=P_{ij}-E} (or its normalization r → i j {\displaystyle {\vec {r}}_{ij}} ). First we need to find the coordinates of the bottom left viewport pixel P 1 m {\displaystyle P_{1m}} and find the next pixel by making a shift along directions parallel to viewport (vectors b → n {\displaystyle {\vec {b}}_{n

Cowrie (honeypot)

Cowrie is a medium interaction SSH and Telnet honeypot designed to log brute force attacks and shell interaction performed by an attacker. Cowrie also functions as an SSH and telnet proxy to observe attacker behavior to another system. Cowrie was developed from Kippo. == Reception == Cowrie has been referenced in published papers. The Book "Hands-On Ethical Hacking and Network Defense" includes Cowrie in a list of 5 commercial honeypots. === Prior uses === Discussing a honeypot effort called the Project Heisenberg Cloud by Rapid7, Bob Rudis, the company's chief data scientist, told eWEEK, "There are custom Rapid7-developed low- and medium-interaction honeypots used within the framework, along with open-source ones, such as Cowrie." Doug Rickert has experimented with the open-source Cowrie SSH honeypot and wrote about it on Medium. Putting up a simple honeypot isn't difficult, and there are many open-source products besides Cowrie, including the original Honeyd to MongoDB and NoSQL honeypots, to ones that emulate web servers. Some appear to be SCADA or other more advanced applications. === Best practices === Researchers at the SysAdmin, Audit, Network and Security (SANS) institute urged administrators and security researchers to run the latest version of Cowrie on a honeypot to monitor shifts in the type of passwords being scanned for and pattern of attacks on IoT devices. === Discussion and further resources === Attack Detection and Forensics Using Honeypot in an IoT Environment calls Cowrie a "medium interaction honeypot" and describes results from using it for 40 days to capture "all communicated sessions in log files." The book Advances on Data Science also devotes chapter two to "Cowrie Honeypot Dataset and Logging." ICCWS 2018 13th International Conference on Cyber Warfare and Security describes using Cowrie. On the Move to Meaningful Internet Systems: OTM 2019 Conferences includes details of using Cowrie. Splunk, a security tool that can receive information from honeypots, outlines how to set up a honeypot using the open-source Cowrie package.

Representation collapse

Representation collapse is a phenomenon in machine learning and representation learning where a model maps different inputs to the same or very similar embeddings, which means it loses important information about how the data is spread out. It is frequently encountered in self-supervised learning, especially within contrastive and non-contrastive frameworks, when training objectives or model architectures do not maintain variance across representations. Collapse results in degenerate solutions characterized by uninformative learned features, significantly impairing downstream task performance. Various techniques have been proposed to mitigate representation collapse, including the use of negative samples, architectural asymmetry, stop-gradient operations, variance regularization, and redundancy reduction objectives, as seen in methods such as SimCLR, BYOL, and VICReg. Comprehending and averting representation collapse is regarded as a fundamental challenge in the advancement of stable and efficient self-supervised learning systems.

Variable data publishing

Variable-data publishing (VDP) (also known as database publishing) is a term referring to the output of a variable composition system. While these systems can produce both electronically viewable and hard-copy (print) output, the "variable-data publishing" term today often distinguishes output destined for electronic viewing, rather than that which is destined for hard-copy print (e.g. variable data printing). Essentially the same techniques are employed to perform variable-data publishing, as those utilized with variable data printing. The difference is in the interpretation for output. While variable-data printing may be interpreted to produce various print streams or page-description files (e.g. AFP/IPDS, PostScript, PCL), variable-data publishing produces electronically viewable files, most commonly seen in the forms of PDF, HTML, or XML. Variable-data composition involves the use of data to conditionally: exhibit text (static blocks and/or variable content) exhibit images select fonts select colors format page layouts & flows Variable-data may be as simple as an address block or salutation. However, it can be any or all of the document's textual content—including words, sentences, paragraphs, pages, or the entire document. In other words, it can make up as little or as much of the document as the composer desires. Variable data may also be used to exhibit various images, such as logos, products, or membership photos. Further, variable-data can be used to build rule-based design schemes, including fonts, colors, and page formats. The possibilities are vast. The variable-data tools available today, make it possible to perform variable-data composition at nearly every stage of document production. However, the level of control that can be achieved varies, based upon how far into the document production process a variable-data tool is deployed. For example, if variable-data insertion occurs just prior to output...it's not likely that the text flow or layout can be altered with nearly as much control as would be available at the time of initial document composition. Many organizations will produce multiple forms of output (aka: multi-channel output), for the same document. This ensures that the published content is available to recipients via any form of access method they might require. When multi-channel output is utilized, integrity between those output channels often becomes important. Variable-data publishing may be performed on everything from a personal computer to a mainframe system. However, the speed and practical output volumes which can be achieved are directly affected by the computer power utilized. == Origin of the concept == The term variable-data publishing was likely an offshoot of the term "variable-data printing", first introduced to the printing industry by Frank Romano, Professor Emeritus, School of Print Media, at the College of Imaging Arts and Sciences at Rochester Institute of Technology. However, the concept of merging static document elements and variable document elements predates the term and has seen various implementations ranging from simple desktop 'mail merge', to complex mainframe applications in the financial and banking industry. In the past, the term VDP has been most closely associated with digital printing machines. However, in the past 3 years the application of this technology has spread to web pages, emails, and mobile messaging.