AI Content Editor

AI Content Editor — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Machine learning in video games

    Artificial intelligence and machine learning techniques are used in video games for a wide variety of applications such as non-player character (NPC) control, procedural content generation (PCG) and deep learning-based content generation. Machine learning is a subset of artificial intelligence that uses historical data to build predictive and analytical models. This is in sharp contrast to traditional methods of artificial intelligence such as search trees and expert systems. Information on machine learning techniques in the field of games is mostly known to the public through research projects, as most gaming companies choose not to publish specific information about their intellectual property. The most publicly known application of machine learning in games is likely the use of deep learning agents that compete with professional human players in complex strategy games. Machine learning has been applied extensively to games such as Atari/ALE, Doom, Minecraft, StarCraft, and car racing. Other games that did not originally exist as video games, such as chess and Go, have also been affected by machine learning.

    == Overview of relevant machine learning techniques ==

    === Deep learning ===

    Deep learning is a subset of machine learning which focuses heavily on the use of artificial neural networks (ANNs) that learn to solve complex tasks. Deep learning uses multiple layers of ANNs and other techniques to progressively extract information from an input. Due to this complex layered approach, deep learning models often require powerful machines to train and run on.

    ==== Convolutional neural networks ====

    Convolutional neural networks (CNNs) are specialized ANNs that are often used to analyze image data. These types of networks are able to learn translation-invariant patterns, which are patterns that are not dependent on location.
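    As a rough illustration of why a convolutional filter does not depend on location, the sketch below slides one fixed 2x2 filter over two toy images that contain the same pattern in different places. It is a minimal pure-Python sketch, not a real CNN: nothing is learned, and a global maximum stands in for a pooling layer. The helper names (`correlate2d`, `place`, `peak`) and the 6x6 image size are assumptions made for this example.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def place(pattern, size, top, left):
    """Return a size x size image of zeros with the pattern pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, value in enumerate(row):
            image[top + i][left + j] = value
    return image


def peak(feature_map):
    """Return (strongest response, position of its first occurrence)."""
    best = max(max(row) for row in feature_map)
    for r, row in enumerate(feature_map):
        for c, value in enumerate(row):
            if value == best:
                return best, (r, c)


# A small diagonal pattern, and a filter that matches it exactly.
PATTERN = [[1, 0],
           [0, 1]]
KERNEL = PATTERN

top_left = correlate2d(place(PATTERN, 6, 0, 0), KERNEL)
elsewhere = correlate2d(place(PATTERN, 6, 3, 2), KERNEL)

# Same peak strength in both images; only the reported position differs.
print(peak(top_left))
print(peak(elsewhere))
```

    The same weights produce the same peak response wherever the pattern sits, which is the property the paragraph above describes; a trained network would learn many such filters rather than having one written by hand.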
    CNNs are able to learn these patterns in a hierarchy, meaning that earlier convolutional layers will learn smaller local patterns while later layers will learn larger patterns based on the previous patterns. A CNN's ability to learn visual data has made it a commonly used tool for deep learning in games.

    === Recurrent neural network ===

    Recurrent neural networks (RNNs) are a type of ANN designed to process sequences of data in order, one part at a time rather than all at once. An RNN runs over each part of a sequence, using the current part of the sequence along with memory of previous parts of the current sequence to produce an output. These types of ANN are highly effective at tasks such as speech recognition and other problems that depend heavily on temporal order. There are several types of RNNs with different internal configurations; the basic implementation suffers from a lack of long-term memory due to the vanishing gradient problem, so it is rarely used over newer implementations.

    ==== Long short-term memory ====

    A long short-term memory (LSTM) network is a specific implementation of an RNN that is designed to deal with the vanishing gradient problem seen in simple RNNs, which would lead to them gradually "forgetting" about previous parts of an input sequence when calculating the output of a current part. LSTMs solve this problem with the addition of an elaborate system that uses an additional input/output to keep track of long-term data. LSTMs have achieved very strong results across various fields, and were used by several landmark deep learning agents in games.

    === Reinforcement learning ===

    Reinforcement learning is the process of training an agent using rewards and/or punishments. The way an agent is rewarded or punished depends heavily on the problem; for example, an agent may be given a positive reward for winning a game or a negative one for losing.
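    The win/lose reward scheme can be sketched with tabular Q-learning, one common reinforcement learning method. The example below is a minimal sketch under stated assumptions, not how any particular game agent is trained: the five-state corridor "game", the constants `GAMMA` and `ALPHA`, and the function names are all invented for illustration, and every state-action pair is updated in fixed sweeps (instead of sampled episodes) so the result is deterministic.

```python
GAMMA = 0.9            # discount factor (assumed value)
ALPHA = 0.5            # learning rate (assumed value)
N_STATES = 5           # corridor cells 0..4
ACTIONS = (-1, +1)     # step left or step right

# Reaching cell 0 loses the game (-1); reaching cell 4 wins it (+1).
TERMINAL_REWARD = {0: -1.0, N_STATES - 1: 1.0}


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    nxt = state + action
    return nxt, TERMINAL_REWARD.get(nxt, 0.0), nxt in TERMINAL_REWARD


def train(sweeps=200):
    """Apply the Q-learning update to every non-terminal state-action pair."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(1, N_STATES - 1):
            for a in ACTIONS:
                nxt, reward, done = step(s, a)
                future = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
                target = reward + GAMMA * future
                q[(s, a)] += ALPHA * (target - q[(s, a)])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(1, N_STATES - 1)}


q_table = train()
print(greedy_policy(q_table))  # every state should prefer +1 (toward the win)
```

    Because the only feedback is the reward at the end of the corridor, the preference for moving right has to propagate backwards through the table over repeated updates, which is the core idea behind reward-driven training.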
    Reinforcement learning is used heavily in the field of machine learning and can be seen in methods such as Q-learning, policy search, Deep Q-networks and others. It has seen strong performance in both the field of games and robotics.

    === Neuroevolution ===

    Neuroevolution involves the use of both neural networks and evolutionary algorithms. Instead of using gradient descent like most neural networks, neuroevolution models make use of evolutionary algorithms to update neurons in the network. Researchers claim that this process is less likely to get stuck in a local minimum and is potentially faster than state-of-the-art deep learning techniques.

    == Deep learning agents ==

    Machine learning agents have been used to take the place of a human player rather than function as NPCs, which are deliberately added into video games as part of designed gameplay. Deep learning agents have achieved impressive results when used in competition with both humans and other artificial intelligence agents.

    === Chess ===

    Chess is a turn-based strategy game that is considered a difficult AI problem due to the computational complexity of its board space. Similar strategy games are often solved with some form of minimax tree search. These types of AI agents have been known to beat professional human players, as in the historic 1997 Deep Blue versus Garry Kasparov match. Since then, machine learning agents have shown ever greater success than previous AI agents.

    === Go ===

    Go is another turn-based strategy game which is considered an even more difficult AI problem than chess. The state space of Go is around 10^170 possible board states, compared to the 10^120 board states for chess. Prior to recent deep learning models, AI Go agents were only able to play at the level of a human amateur.

    ==== AlphaGo ====

    Google's 2015 AlphaGo was the first AI agent to beat a professional Go player. AlphaGo used a deep learning model to train the weights of a Monte Carlo tree search (MCTS).
    The deep learning model consisted of two ANNs: a policy network to predict the probabilities of potential moves by opponents, and a value network to predict the win chance of a given state. The deep learning model allows the agent to explore potential game states more efficiently than a vanilla MCTS. The networks were initially trained on games of human players and were then further trained through games against itself.

    ==== AlphaGo Zero ====

    AlphaGo Zero, another implementation of AlphaGo, was able to train entirely by playing against itself. It was able to quickly train up to the capabilities of the previous agent.

    === StarCraft series ===

    StarCraft and its sequel StarCraft II are real-time strategy (RTS) video games that have become popular environments for AI research. Blizzard and DeepMind have worked together to release a public StarCraft II environment for AI research. Various deep learning methods have been tested on both games, though most agents have trouble outperforming the default AI with cheats enabled or skilled players of the game.

    ==== AlphaStar ====

    AlphaStar was the first AI agent to beat professional StarCraft II players without any in-game advantages. The deep learning network of the agent initially received input from a simplified, zoomed-out version of the game state, but was later updated to play using a camera as human players do. The developers have not publicly released the code or architecture of their model, but have listed several state-of-the-art machine learning techniques such as relational deep reinforcement learning, long short-term memory, auto-regressive policy heads, pointer networks, and a centralized value baseline. AlphaStar was initially trained with supervised learning: it watched replays of many human games in order to learn basic strategies. It then trained against different versions of itself and was improved through reinforcement learning.
    The final version was hugely successful, but was only trained to play on a specific map in a Protoss mirror matchup.

    === Dota 2 ===

    Dota 2 is a multiplayer online battle arena (MOBA) game. As with other complex games, traditional AI agents have not been able to compete on the same level as professional human players. The only widely published information on AI agents attempted on Dota 2 concerns OpenAI's deep learning agent, OpenAI Five.

    ==== OpenAI Five ====

    OpenAI Five utilized separate long short-term memory networks to learn each hero. It trained using a reinforcement learning technique known as Proximal Policy Optimization, running on a system containing 256 GPUs and 128,000 CPU cores. Five trained for months, accumulating 180 years of game experience each day, before facing off against professional players. It was eventually able to beat the 2018 Dota 2 esports champion team in a 2019 series of games.

    === Planetary Annihilation ===

    Planetary Annihilation is a real-time strategy game which focuses on massive-scale war. The developers use ANNs in their default AI agent.

    === Supreme Commander 2 ===

    Supreme Commander 2 is a real-time strategy (RTS) video game. The game uses multilayer perceptrons (MLPs) to control a platoon's reaction to encountered enemy units. A total of four MLPs are used, one for each platoon type: land, naval

  • Human rights and encryption

    Human rights and encryption refers to the ways in which digital encryption affects human rights. Encryption can be both a detriment and a boon to human rights; for example, encryption can be used to enforce digital rights management for video games. This kind of video game licensing can render software unusable in the long term and represents an erosion of consumer rights. At the same time, encryption is a fundamental part of internet security. Asymmetric encryption is used extensively online for authentication, giving users confidence that their internet traffic is not being misdirected. Encryption is also used to obfuscate information as it travels from end to end over the internet, preventing eavesdropping and tampering. Encryption can also provide anonymity, which is an important consideration for freedom of expression. Despite its drawbacks, encryption is essential for a free, open, and trustworthy internet.

    == Background ==

    === Human rights ===

    Human rights are moral principles or norms for human behaviour that are regularly protected as legal rights in national and international law. They are commonly understood as inalienable, fundamental rights "to which a person is inherently entitled simply because they are a human being". Those rights are "inherent in all human beings" regardless of their nationality, location, language, religion, ethnic origin, or any other status. They are applicable everywhere and at every time and are universal and egalitarian.

    === Cryptography ===

    Cryptography is a long-standing subfield of both mathematics and computer science. It can generally be defined as "the protection of information and computation using mathematical techniques." Encryption and cryptography are closely interlinked, although "cryptography" has a broader meaning. For example, a digital signature is "cryptography", but not technically "encryption".
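    The distinction between a digital signature and encryption can be made concrete with a toy example. The sketch below uses textbook RSA with deliberately tiny primes, chosen here only so the arithmetic is easy to follow; the key values, the reduction of the hash modulo `N`, and the function names are assumptions for illustration, and nothing here is secure or resembles a production signature scheme. Note that the message itself is never hidden: the signature only lets a reader check who produced it and that it was not altered.

```python
import hashlib

# Toy key material (assumption: tiny textbook primes, NOT secure).
P, Q = 61, 53
N = P * Q          # public modulus
E = 17             # public exponent
D = 2753           # private exponent, chosen so E * D = 1 mod (P-1)*(Q-1)


def digest(message: bytes) -> int:
    """Hash the message and squeeze it into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of D can produce this value."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public values (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)


message = b"meet at noon"          # sent in the clear: nothing is encrypted
signature = sign(message)

print(verify(message, signature))              # genuine signature
print(verify(message, (signature + 1) % N))    # altered signature is rejected
```

    Here the cryptographic operation provides authenticity and integrity rather than confidentiality, which is why a signature counts as cryptography without being encryption.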
    == Overview ==

    Under international human rights law, freedom of expression is recognized as a human right under Article 19 of the Universal Declaration of Human Rights (UDHR) and the International Covenant on Civil and Political Rights (ICCPR). Article 19 of the ICCPR states that "everyone shall have the right to hold opinions without interference" and "everyone shall have the right to freedom of expression; this right shall include freedom to seek, receive and impart information and ideas of all kinds, regardless of frontiers, either orally, in writing or in print, in the form of art, or through any other media of his choice".

    Since the 1970s, the availability of digital computing and the invention of public-key cryptography have made encryption more widely available. (Previously, encryption techniques were the domain of nation-state actors.) Cryptographic techniques are also used to protect the anonymity of communicating actors and privacy more generally. The availability and use of encryption continue to lead to complex, important, and highly contentious legal policy debates. Some government agencies have made statements or proposals to lessen such usage and deployment because of the hurdles it presents for government access. The rise of commercial end-to-end encryption services has prompted further debate about the use of encryption and the legal status of cryptography in general.

    Encryption, as defined above, is a set of cryptographic techniques to protect information. The normative value of encryption, however, is not fixed but varies with the type and purpose of the cryptographic methods used. Traditionally, encryption (cipher) techniques were used to ensure the confidentiality of communications and to prevent access to information and communications by anyone other than the intended recipients.
    Cryptography can also ensure the authenticity of communicating parties and the integrity of communications contents, providing a key ingredient for enabling trust in the digital environment.

    There is a growing awareness within human rights organizations that encryption plays an important role in realizing a free, open, and trustworthy Internet. UN Special Rapporteur on the promotion and protection of the right to freedom of opinion and expression David Kaye observed, during the Human Rights Council in June 2015, that encryption and anonymity deserve a protected status under the rights to privacy and freedom of expression: "Encryption and anonymity, today's leading vehicles for online security, provide individuals with a means to protect their privacy, empowering them to browse, read, develop and share opinions and information without interference and enabling journalists, civil society organizations, members of ethnic or religious groups, those persecuted because of their sexual orientation or gender identity, activists, scholars, artists and others to exercise the rights to freedom of opinion and expression."

    == Encryption in media and communication ==

    In the context of media and communication, two types of encryption can be distinguished: encryption used as a result of a service provider's choice, and encryption deployed by Internet users themselves. Client-side encryption tools and technologies are relevant for marginalized communities, journalists and other online media actors practicing journalism as a way of protecting their rights. Encryption deployed by a service provider can prevent unauthorized third-party access, but the provider implementing it still has access to the relevant user data. End-to-end encryption is a technique that also prevents service providers themselves from having access to the user's communications.
    The implementation of these forms of encryption has sparked the most debate since the start of the 21st century.

    === Service provider-deployed techniques to prevent unauthorized third-party access ===

    Among the most widely deployed cryptographic techniques is the securing of the communications channel between internet users and specific service providers against man-in-the-middle attacks and access by unauthorized third parties. Given the breadth of nuances involved, these cryptographic techniques must be run jointly by both the service user and the service provider in order to work properly. They require service providers, including online news publishers and social networks, to actively implement them in their service design. Users cannot deploy these techniques unilaterally; their deployment is contingent on active participation by the service provider.

    The TLS protocol, which becomes visible to the normal internet user through the HTTPS prefix in web addresses, is widely used for securing online commerce, e-government services and health applications as well as devices that make up networked infrastructures, e.g., routers and cameras. However, although the standard has been around since the 1990s, the wider spread and evolution of the technology has been slow. As with other cryptographic methods and protocols, the practical challenges related to proper, secure and (wider) deployment are significant and have to be considered. Many service providers still do not implement TLS or do not implement it well.

    In the context of wireless communications, the use of cryptographic techniques that protect communications from third parties is also important. Different standards have been developed to protect wireless communications: 2G, 3G and 4G standards for communication between mobile phones, base stations and base station controllers; standards to protect communications between mobile devices and wireless routers ('WLAN'); and standards for local computer networks.
    One common weakness in these designs is that the transmission points of the wireless communication, e.g., the telecommunications provider, can access all communications. This vulnerability is exacerbated when wireless protocols only authenticate user devices, but not the wireless access point.

    Whether the data is stored on a device or on a server, as in the cloud, a distinction is also made for data 'at rest'. Given the vulnerability of cellphones to theft, for instance, particular attention may be given to limiting service provided access. This does not exclude the situation in which the service provider discloses this information to third parties such as other commercial entities or governments. The user needs to trust the service provider to act in their interests. The possibility remains that a service provider is legally compelled to hand over user information or to interfere with particular communications with particular users.

    === Privacy-enhancing technologies ===

    There are services that specifically market themselves with claims not to have access to the content of their users' communication. Service providers can also take measures that restrict their ability to access information and communication, further increasing the protection of users against access to their information and communications. The integrity of these privacy-enhancing technologies (PETs) depends on delicate design decisions as well as the

  • Data storage

    Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are considered by some to be data storage. Recording may be accomplished with virtually any form of energy. Electronic data storage requires electrical power to store and retrieve data.

    Data stored in a digital, machine-readable medium is called digital data. Computer data storage is one of the core functions of a general-purpose computer. Electronic documents can be stored in much less space than paper documents. Barcodes and magnetic ink character recognition (MICR) are two ways of recording machine-readable data on paper.

    == Recording media ==

    A recording medium is physical material that holds information. Newly created information is distributed and can be stored in four storage media (print, film, magnetic, and optical) and seen or heard in four information flows (telephone, radio, TV, and the Internet), as well as being observed directly. Digital information is stored on electronic media in many different recording formats.

    With electronic media, the data and the recording media are sometimes referred to as "software" despite the more common use of the word to describe computer software. With (traditional art) static media, art materials such as crayons may be considered both equipment and medium, as the wax, charcoal or chalk material from the equipment becomes part of the surface of the medium.

    Some recording media may be temporary, either by design or by nature. Volatile organic compounds may be used to purposely make data expire over time or to reduce environmental impact. Data such as smoke signals or skywriting are temporary by nature. Depending on the volatility, a gas (e.g., atmosphere, smoke) or a liquid surface such as a lake would be considered a temporary recording medium, if it could be considered a recording medium at all.
    == Global capacity, digitization, and trends ==

    A 2003 UC Berkeley report estimated that about five exabytes of new information were produced in 2002 and that 92% of this data was stored on magnetic media (primarily hard disk drives). This was about twice the data produced in 1999. The amount of data transmitted over telecommunications systems in 2002 was nearly 18 exabytes, three and a half times more than was recorded on non-volatile storage. Telephone calls constituted 98% of the telecommunicated information in 2002. The researchers' highest estimate for the growth rate of newly stored information (uncompressed) was more than 30% per year.

    In a more limited study, the International Data Corporation estimated that the total amount of digital data in 2007 was 281 exabytes and that the total amount of digital data produced exceeded the global storage capacity for the first time.

    A 2011 article in Science estimated that the year 2002 was the beginning of the digital age for information storage: an age in which more information is stored on digital storage devices than on analog storage devices. In 1986, approximately 1% of the world's capacity to store information was in digital format; this grew to 3% by 1993, to 25% by 2000, and to 94% by 2007. These figures correspond to less than three compressed exabytes in 1986 and 295 compressed exabytes in 2007. The quantity of digital storage doubled roughly every three to four years.

    Around 120 zettabytes of data were estimated to be generated in 2023, a 60-fold increase from 2010, with a projected increase to 181 zettabytes generated in 2025.

    == Mass storage ==

  • Data recovery

    In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or overwritten data from secondary storage, removable media or files, when the data stored in them cannot be accessed in the usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).

    Logical failures occur when the hard drive devices are functional but the user or the operating system cannot retrieve or access data stored on them. Logical failures can occur due to corruption of the engineering chip, lost partitions, firmware failure, or failures during formatting or re-installation.

    Data recovery can range from very simple to technically challenging, which is why there are software companies that specialize in this field and help users get their data back.

    == About ==

    The most common data recovery scenarios involve an operating system failure, malfunction of a storage device, logical failure of storage devices, accidental damage or deletion, etc. (typically on a single-drive, single-partition, single-OS system), in which case the ultimate goal is simply to copy all important files from the damaged media to another new drive. This can be accomplished using a Live CD, Live DVD, or USB drive, booting directly from that medium instead of the corrupted drive in question. Many Live CDs or DVDs provide a means to mount the system drive and backup drives or removable media, and to move the files from the system drive to the backup media with a file manager or optical disc authoring software.
    Such cases can often be mitigated by disk partitioning and by consistently storing valuable data files (or copies of them) on a different partition from the replaceable OS system files.

    Another scenario involves a drive-level failure, such as a compromised file system or drive partition, or a hard disk drive failure. In any of these cases, the data is not easily read from the media devices. Depending on the situation, solutions involve repairing the logical file system, partition table, or master boot record, updating the firmware, or applying drive recovery techniques ranging from software-based recovery of corrupted data, to hardware- and software-based recovery of damaged service areas (also known as the hard disk drive's "firmware"), to hardware replacement on a physically damaged drive, which allows for the extraction of data to a new drive. If a drive recovery is necessary, the drive itself has typically failed permanently, and the focus is rather on a one-time recovery, salvaging whatever data can be read.

    In a third scenario, files have been accidentally "deleted" from a storage medium by the users. Typically, the contents of deleted files are not removed immediately from the physical drive; instead, references to them in the directory structure are removed, and thereafter the space the deleted data occupies is made available for later overwriting. From the end user's point of view, deleted files are not discoverable through a standard file manager, but the deleted data still technically exists on the physical drive. In the meantime, the original file contents remain, often in several disconnected fragments, and may be recoverable if not overwritten by other data files.

    The term "data recovery" is also used in the context of forensic applications or espionage, where data which have been encrypted, hidden, or deleted, rather than damaged, are recovered.
    Sometimes data on a computer is encrypted or hidden, for example by a virus attack, and can only be recovered by computer forensic experts.

    == Physical damage ==

    A wide variety of failures can cause physical damage to storage media, which may result from human errors and natural disasters. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer from a multitude of mechanical failures, such as head crashes, PCB failure, and failed motors; tapes can simply break.

    Physical damage to a hard drive, even in cases where a head crash has occurred, does not necessarily mean permanent data loss. However, in extreme cases, such as prolonged exposure to moisture and corrosion (as with the lost Bitcoin hard drive of James Howells, buried in the Newport landfill for over a decade), recovery is usually impossible. In rare cases, forensic techniques such as magnetic force microscopy (MFM) have been explored to detect residual magnetic traces when data holds exceptional value. Other techniques employed by many professional data recovery companies can typically salvage most, if not all, of the data that had been lost when the failure occurred.

    There are exceptions to this, such as cases where severe damage to the hard drive platters has occurred. However, if the hard drive can be repaired and a full image or clone created, then the logical file structure can be rebuilt in most instances.

    Most physical damage cannot be repaired by end users. For example, opening a hard disk drive in a normal environment can allow airborne dust to settle on the platter and become caught between the platter and the read/write head. During normal operation, read/write heads float 3 to 6 nanometers above the platter surface, and the average dust particles found in a normal environment are typically around 30,000 nanometers in diameter.
    When these dust particles get caught between the read/write heads and the platter, they can cause new head crashes that further damage the platter and thus compromise the recovery process. Furthermore, end users generally do not have the hardware or technical expertise required to make these repairs. Consequently, data recovery companies are often employed to salvage important data, with the more reputable ones using class 100 dust- and static-free cleanrooms.

    === Recovery techniques ===

    Recovering data from physically damaged hardware can involve multiple techniques. Some damage can be repaired by replacing parts in the hard disk. This alone may make the disk usable, but there may still be logical damage. A specialized disk-imaging procedure is used to recover every readable bit from the surface. Once this image is acquired and saved on a reliable medium, the image can be safely analyzed for logical damage and will possibly allow much of the original file system to be reconstructed.

    ==== Hardware repair ====

    A common misconception is that a damaged printed circuit board (PCB) may be simply replaced during recovery procedures by an identical PCB from a healthy drive. While this may work in rare circumstances on hard disk drives manufactured before 2003, it will not work on newer drives. Electronics boards of modern drives usually contain drive-specific adaptation data (generally a map of bad sectors and tuning parameters) and other information required to properly access data on the drive. Replacement boards often need this information to effectively recover all of the data. The replacement board may need to be reprogrammed. Some manufacturers (Seagate, for example) store this information on a serial EEPROM chip, which can be removed and transferred to the replacement board.
    Each hard disk drive has what is called a system area or service area; this portion of the drive, which is not directly accessible to the end user, usually contains the drive's firmware and adaptive data that help the drive operate within normal parameters. One function of the system area is to log defective sectors within the drive, essentially telling the drive where it can and cannot write data.

    The sector lists are also stored on various chips attached to the PCB, and they are unique to each hard disk drive. If the data on the PCB do not match what is stored on the platter, then the drive will not calibrate properly. In most cases the drive heads will click because they are unable to find the data matching what is stored on the PCB.

    == Logical damage ==

    The term "logical damage" refers to situations in which the error is not a problem in the hardware and requires software-level solutions.

    === Corrupt partitions and file systems, media errors ===

    In some cases, data on a hard disk drive can be unreadable due to damage to the partition table or file system, or to (intermittent) media errors. In the majority of these cases, at least a portion of the original data can be recovered by repairing the damaged partition table or file system using specialized data recovery software such as TestDisk; software like ddrescue can image media despite intermittent errors, and image raw data when there is partition table or file system damage. This type of data recovery can be performed by people without expertise in drive hardware as it requires no special physica

  • Computers & Graphics

    Computers & Graphics is a peer-reviewed scientific journal that covers computer graphics and related subjects such as data visualization, human-computer interaction, virtual reality, and augmented reality. It was established in 1975 and originally published by Pergamon Press. It is now published by Elsevier, which acquired Pergamon Press in 1991. From 2018 to 2022, Graphics and Visual Computing was an open access sister journal sharing the same editorial team and double-blind peer-review policies. It has since merged into GMOD, the International Journal of Graphical Models.

    == History ==

    The journal was established in 1975 by founding editor-in-chief Robert Schiffman (University of Colorado, Boulder) as Computers & Graphics-UK. Schiffman, who co-organized the first SIGGRAPH conference in 1974, had the conference proceedings published as the first issue of the journal. He was succeeded in 1978 by Larry Feeser (Rensselaer Polytechnic Institute). In 1983, José Luis Encarnação (Technische Hochschule Darmstadt) took over. Joaquim Jorge (University of Lisbon) has been editor-in-chief since 2007.

    == Replicability ==

    The journal is working with the Graphics Replicability Stamp Initiative to promote replicable results in publication.

    == Abstracting and indexing ==

    The journal is abstracted and indexed in: Current Contents/Engineering, Computing & Technology; EBSCO databases; Ei Compendex; Inspec; ProQuest databases; Science Citation Index Expanded; Scopus; and the Chinese Computer Federation/Recommended List of International Conferences and Journals on CAD & Graphics and Multimedia.

    According to the Journal Citation Reports, the journal has a 2022 impact factor of 2.5.

    Read more →
  • SIGINT Activity Designator

    SIGINT Activity Designator

    A SIGINT Activity Designator (or SIGAD) identifies a signals intelligence (SIGINT) line of collection activity associated with a signals collection station, such as a base or a ship. For example, the SIGAD for Menwith Hill in the UK is USD1000. SIGADs are used by the signals intelligence agencies of Australia, Canada, New Zealand, the United Kingdom, and the United States (the Five Eyes). There are several thousand SIGADs including the substation SIGADs denoted with a trailing alpha character. Several dozen of these are significant. The leaked Boundless Informant reporting screenshot showed that it summarized 504 active SIGADs during a 30-day period in March 2013. == General format == A SIGAD consists of five to eight case insensitive alphanumeric characters. It takes the general form of an alphanumeric designator normally composed of a two- or three-letter prefix followed by one to three numbers. Often a dash is used to separate the alphabetic and numeric characters in the primary part of the designator, but less frequently a space is used as a separator or the alphabetic and numeric characters are concatenated together. An additional alphabetic character can be added to denote a sub-designator for a subset of the primary unit, such as a detachment. Lastly, a numeric character can be added after the aforementioned alphabetic to provide for a sub-sub-designator. In the examples below an X represents an alphabetic character and an N represents a numeric character that are part of the primary designator. Likewise, an x represents an alphabetic character and an n represents a numeric character that are part of a sub-designator. Here are valid generalized examples of SIGADs: The first two characters show which country operates the particular SIGINT facility, which can be US for the United States, UK for the United Kingdom, CA for Canada, AU for Australia and NZ for New Zealand. A third letter shows what sort of staff runs the station. 
SIGADs beginning with US without a third letter are used for intercept facilities run by the NSA. == PRISM SIGAD == One prominent SIGAD as of April 2013 is US-984XN, with an unclassified codename of PRISM. It is "the number one source of raw intelligence used for NSA analytic reports" according to National Security Agency sources in a document leaked by Edward Snowden. The President's Daily Brief, an all-source intelligence product, cited SIGAD US-984XN as a source in 1,477 items in 2012. The U.S. government operates the PRISM electronic surveillance collection program through NSA's Special Source Operations, an alliance with trusted telecommunications providers. == SIGADs for spy ships == The declassified SIGAD for the USS Liberty (AGTR-5) was USN-855. The USS Liberty incident occurred on 8 June 1967, during the Six-Day War, when Israeli Air Force jet fighter aircraft and Israeli Navy motor torpedo boats attacked the USS Liberty in international waters. The USS Pueblo (AGER-2) was a technical research ship, which was boarded and captured by North Korean forces on 23 January 1968, in what is known as the Pueblo incident. The declassified SIGAD for the NSA Direct Support Unit (DSU) from the Naval Security Group (NSG) on the USS Pueblo patrol involved in the incident was USN-467Y. The USS Pueblo, which officially remains a commissioned vessel of the United States Navy, is the only ship of the U.S. Navy currently being held captive. == Vietnam War SIGADs == The following are the Vietnam War-era declassified SIGADs from inside South Vietnam during the period of 1969 to 1975: Some locations have multiple SIGADs due to different types of collection activities and/or collection at different times during the period. The SIGADs beginning with USA were operated by the United States Air Force's United States Air Force Security Service (USAFSS). The SIGADs beginning with USM were operated by the United States Army's Army Security Agency (ASA). 
Lastly, the SIGADs beginning with USN were operated by the United States Navy's Naval Security Group (NAVSECGRU). All three of these units have been merged into other units or inactivated. The above list consists of the higher-echelon SIGADs. It does not include the numerous miscellaneous and temporary detachments, or direction finding stations belonging to major units or sites unless that detachment or site was the only one stationed in South Vietnam. Many of the "dets" were short-lived, often formed to support ongoing MACV operations or forward deployments of combat operational or maneuver units. These detachments usually were designated by a letter suffix attached to the higher-echelon SIGAD such as "USM-633J," which was a detachment of the 372d Radio Research Company, USM-633, supporting the United States Army's 25th Infantry Division. === Supporting Southeast Asia SIGADs === The following declassified SIGADs were highly relevant to the Vietnam Campaign, but were located in areas outside of South Vietnam in Southeast Asia. Again, detachments are not listed separately. In the case of the USS Maddox, naval Direct Support Units (DSUs) used the SIGAD USN-467 as a generic designator for their missions. Each specific patrol received a letter suffix for its duration. The subsequent mission would receive the next letter in an alphabetic sequence. Thus, SIGAD USN-467N specifically designates the USS Maddox patrol involved with the Gulf of Tonkin incident. == Joint Base SIGADs == In November 2005, the US Congress performed a fifth round of Base Realignment and Closure. This 2005 law also created twelve joint bases by merging adjacent installations belonging to different services in an effort to reduce costs and improve efficiencies. Joint bases with a primarily SIGINT mission have SIGADs that begin with USJ. A joint base would have a primary SIGAD in the general form of USJ-NNN, where NNN are numeric characters. 
An actual example is not given, since these units are currently active.
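
The general format and prefix conventions described above can be sketched as a small parser. This is only an illustration, assuming the stated general form (a two- or three-letter prefix, an optional dash or space, one to three digits, an optional sub-designator letter and an optional sub-sub-designator digit). The names `parse_sigad`, `COUNTRIES` and `US_OPERATORS` are hypothetical, and designators quoted in the text that do not fit this stated form, such as US-984XN, are rejected by the sketch.

```python
import re

# Prefix meanings as given in the text; other values are reported as None.
COUNTRIES = {
    "US": "United States",
    "UK": "United Kingdom",
    "CA": "Canada",
    "AU": "Australia",
    "NZ": "New Zealand",
}

# Third-letter meanings for US SIGADs mentioned in the text ("" = no third letter).
US_OPERATORS = {
    "": "NSA",
    "A": "USAFSS",
    "M": "ASA",
    "N": "NAVSECGRU",
    "J": "joint base",
}

# Assumed pattern for the general form only: XX[X], optional separator,
# N[N][N], optional sub-designator letter, optional sub-sub-designator digit.
SIGAD_RE = re.compile(
    r"^(?P<prefix>[A-Z]{2,3})[- ]?(?P<number>\d{1,3})"
    r"(?:(?P<sub>[A-Z])(?P<subsub>\d)?)?$"
)


def parse_sigad(text):
    """Split a SIGAD into its parts, or return None if it does not fit the form."""
    match = SIGAD_RE.match(text.strip().upper())  # SIGADs are case insensitive
    if match is None:
        return None
    prefix = match.group("prefix")
    country_code, staff_letter = prefix[:2], prefix[2:]
    operator = US_OPERATORS.get(staff_letter) if country_code == "US" else None
    subsub = match.group("subsub")
    return {
        "prefix": prefix,
        "country": COUNTRIES.get(country_code),
        "operator": operator,
        "number": int(match.group("number")),
        "sub": match.group("sub"),
        "subsub": int(subsub) if subsub is not None else None,
    }
```

For example, `parse_sigad("USN-467N")` yields the prefix `USN`, the operator `NAVSECGRU`, the number 467 and the sub-designator `N`, which the text identifies as the USS Maddox patrol.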

    Read more →
  • Social search

    Social search

    Social search is the practice of searching with a social search engine, which mainly retrieves user-generated content such as news, videos and images related to search queries on social media like Facebook, LinkedIn, Twitter, Instagram and Flickr. It is an enhanced version of web search that combines traditional ranking algorithms with information about social relationships. The idea behind social search is that instead of ranking search results purely based on semantic relevance between a query and the results, a social search system also takes into account social relationships between the results and the searcher. The social relationships can take various forms. For example, in LinkedIn's people search engine, the social relationships include the social connections between the searcher and each result, and whether or not they are in the same industry, work for the same company, belong to the same social groups, or went to the same schools. Social search may not be demonstrably better than algorithm-driven search. In the algorithmic ranking model that search engines used in the past, the relevance of a site is determined by analyzing the text and content on the page and the link structure of the document. In contrast, search results with social search highlight content that was created or touched by other users who are in the social graph of the person conducting the search. It is a personalized search technology that uses online community filtering to produce highly personalized results. Social search takes many forms, ranging from simple shared bookmarks or tagging of content with descriptive labels to more sophisticated approaches that combine human intelligence with computer algorithms. Depending on the feature set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. 
The principle behind social search is that results oriented toward a user's human network are more meaningful and relevant for that user than results chosen for specific queries by computer algorithms alone. == Research and implementations == Over the years, there have been various studies and some implementations of social search. In 2008, there were a few startup companies that focused on ranking search results according to one's social graph on social networks. Companies in the social search space include Sproose, Mahalo, Jumper 2.0, Scour, Wink, Eurekster, and Delver. Former efforts include Wikia Search. In 2008, a story on TechCrunch showed Google potentially adding a voting mechanism to search results, similar to Digg's methodology. This suggests growing interest in how social groups can influence and potentially enhance the ability of algorithms to find meaningful data for end users. There are also other services, like Sentiment, that personalize search by searching within the user's social circles. In 2009, a startup project called HeyStaks (www.heystaks.com) developed a web browser plugin, "HeyStaks". HeyStaks applies social search through collaboration in web search as a way to reach better search results. The main motivation for HeyStaks was to provide the user with features that search engines did not provide at that time. For instance, studies have indicated that about 70% of the time when a user searches for something, a friend or a coworker has already found it. Studies have also shown that approximately 30% of people who use online search are searching for something that they have found before. The startup believes that it helps avoid these kinds of issues by providing a shared and rich search experience through a list of recommendations generated from search results. In October 2009, Google rolled out its "Social Search"; after a time in beta, the feature was expanded to multiple languages in May 2011. 
Before the expansion, however, Bing and Google were already taking re-tweets and Likes into account in 2010 when providing search results. After a search deal with Twitter ended without renewal, Google began to retool its Social Search. In January 2012, Google released "Search plus Your World", a further development of Social Search. The feature, which is integrated into Google's regular search as an opt-out feature, pulls references to results from Google+ profiles. The goal was to deliver better, more relevant and personalized search results with this integration. The integration had problems, however, because Google+ was not widely adopted and saw little usage among many users. Later on, Twitter criticized Google for the perceived potential impact of "Search plus Your World" upon web publishers, describing the feature's release to the public as a "bad day for the web", while Google replied that Twitter refused to allow deep search crawling by Google of Twitter's content. By integrating Google+, the company was encouraging users to switch to Google's social networking site in order to improve search results. One famous example occurred when Google showed a link to Mark Zuckerberg's dormant Google+ account rather than the active Facebook profile. In November 2014 these accusations started to die down because Google's Knowledge Graph finally started to show links to Facebook, Twitter, and other social media sites. In December 2008, Twitter re-introduced its people search feature. While the interface has since changed significantly, it allows users to search by either full name or username in a straightforward search engine. In January 2013, Facebook announced a new search engine called Graph Search, which was then still in beta. The goal was to allow users to prioritize results that were popular with their social circle over the general internet. Facebook's Graph Search utilized Facebook's user-generated content to target users. 
Although there have been various studies of social search, social media networks have not shown enough interest in working with search engines. LinkedIn, for example, has taken steps to improve its own search functions in order to steer users away from external search engines. Even Microsoft started working with Twitter in order to integrate some tweets into Bing's search results in November 2013. Yet Twitter has its own search engine, which points to how much value its data has and why it would like to keep that data in house. In the end, though, social search will never be truly comprehensive of the subjects that matter to people unless users opt to be completely public with their information. == Social discovery == Social discovery is the use of social preferences and personal information to predict what content will be desirable to the user. Technology is used to discover new people and sometimes new experiences such as shopping, meeting friends or even traveling. The discovery of new people is often in real time, enabled by mobile apps. However, social discovery is not limited to meeting people in real time; it also leads to sales and revenue for companies via social media. An example from retail is the addition of social sharing of music through the iTunes music store. There is a social component to discovering new music. Social discovery is at the basis of Facebook's profitability, generating ad revenue by targeting ads to users and using their social connections to enhance the commercial appeal. == Social search engines == In one aspect, a social search engine can be thought of as a search engine that answers a question by drawing on an existing answer and identifying a person associated with that answer. It does so by receiving a user-submitted query, determining that the query is related to the question, and providing an answer, including a link to the resource, as part of the search results that are responsive to the query. 
Few social search engines depend only on online communities. Depending on the feature set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. Social search engines are considered a part of Web 2.0 because they use the collective filtering of online communities to elevate particularly interesting or relevant content using tagging. These descriptive tags add to the metadata embedded in Web pages, theoretically improving the results for particular keywords over time. A user will generally see suggested tags for a particular search term, indicating tags that have previously been added. One implementation of a social search engine is Aardvark. Aardvark is based on the "village paradigm", which is about connecting a user who has a question with friends or friends of friends who can answer it. In Aardvark, a user can ask a question in different ways, mostly online, such as instant messaging, email or web input, or through other channels such as text message or voice. The Aar

    Read more →
  • Scalable Coherent Interface

    Scalable Coherent Interface

    The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI) is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well and to provide system-wide memory coherence and a simple interface, i.e., a standard that would replace existing buses in multiprocessor systems with an interconnect free of inherent scalability and performance limitations. The IEEE Std 1596-1992, IEEE Standard for Scalable Coherent Interface (SCI) was approved by the IEEE standards board on March 19, 1992. It saw some use during the 1990s, but never became widely used and has been replaced by other systems since the early 2000s. == History == Soon after the Fastbus (IEEE 960) follow-on Futurebus (IEEE 896) project in 1987, some engineers predicted it would already be too slow for the high performance computing marketplace by the time it would be released in the early 1990s. In response, a "Superbus" study group was formed in November 1987. Another working group of the standards association of the Institute of Electrical and Electronics Engineers (IEEE) spun off to form a standard targeted at this market in July 1988. It was essentially a subset of Futurebus features that could be easily implemented at high speed, along with minor additions to make it easier to connect to other systems, such as VMEbus. Most of the developers had a background in high-speed computer buses. Representatives from companies in the computer industry and research community included Amdahl, Apple Computer, BB&N, Hewlett-Packard, CERN, Dolphin Server Technology, Cray Research, Sequent, AT&T, Digital Equipment Corporation, McDonnell Douglas, National Semiconductor, Stanford Linear Accelerator Center, Tektronix, Texas Instruments, Unisys, University of Oslo, and University of Wisconsin. The original intent was a single standard for all buses in the computer. The working group soon came up with the idea of using point-to-point communication in the form of insertion rings. 
This avoided the lumped capacitance, limited physical length/speed of light problems and stub reflections in addition to allowing parallel transactions. The use of insertion rings is credited to Manolis Katevenis who suggested it at one of the early meetings of the working group. The working group for developing the standard was led by David B. Gustavson (chair) and David V. James (Vice Chair). David V. James was a major contributor for writing the specifications including the executable C-code. Stein Gjessing’s group at the University of Oslo used formal methods to verify the coherence protocol and Dolphin Server Technology implemented a node controller chip including the cache coherence logic. Different versions and derivatives of SCI were implemented by companies like Dolphin Interconnect Solutions, Convex, Data General AViiON (using cache controller and link controller chips from Dolphin), Sequent and Cray Research. Dolphin Interconnect Solutions implemented a PCI and PCI-Express connected derivative of SCI that provides non-coherent shared memory access. This implementation was used by Sun Microsystems for its high-end clusters, Thales Group and several others including volume applications for message passing within HPC clustering and medical imaging. SCI was often used to implement non-uniform memory access architectures. It was also used by Sequent Computer Systems as the processor memory bus in their NUMA-Q systems. Numascale developed a derivative to connect with coherent HyperTransport. == The standard == The standard defined two interface levels: The physical level that deals with electrical signals, connectors, mechanical and thermal conditions The logical level that describes the address space, data transfer protocols, cache coherence mechanisms, synchronization primitives, control and status registers, and initialization and error recovery facilities. 
This structure allowed new developments in physical interface technology to be easily adapted without any redesign on the logical level. Scalability for large systems is achieved through a distributed directory-based cache coherence model. (The other popular models for cache coherency are based on system-wide eavesdropping (snooping) of memory transactions – a scheme which is not very scalable.) In SCI each node contains a directory with a pointer to the next node in a linked list that shares a particular cache line. SCI defines a 64-bit flat address space (16 exabytes) where 16 bits are used for identifying a node (65,536 nodes) and 48 bits for address within the node (256 terabytes). A node can contain many processors and/or memory. The SCI standard defines a packet switched network. === Topologies === SCI can be used to build systems with different types of switching topologies from centralized to fully distributed switching: With a central switch, each node is connected to the switch with a ringlet (in this case a two-node ring). In distributed switching systems, each node can be connected to a ring of arbitrary length and either all or some of the nodes can be connected to two or more rings. The most common way to describe these multi-dimensional topologies is k-ary n-cubes (or tori). The SCI standard specification mentions several such topologies as examples. The 2-D torus is a combination of rings in two dimensions. Switching between the two dimensions requires a small switching capability in the node. This can be expanded to three or more dimensions. The concept of folding rings can also be applied to the Torus topologies to avoid any long connection segments. === Transactions === SCI sends information in packets. Each packet consists of an unbroken sequence of 16-bit symbols. The symbol is accompanied by a flag bit. A transition of the flag bit from 0 to 1 indicates the start of a packet. 
A transition from 1 to 0 occurs 1 (for echoes) or 4 symbols before the packet end. A packet contains a header with address, command and status information, a payload (from 0 through optional lengths of data) and a CRC check symbol. The first symbol in the packet header contains the destination node address. If the address is not within the domain handled by the receiving node, the packet is passed to the output through the bypass FIFO. Otherwise, the packet is fed to a receive queue and may be transferred to a ring in another dimension. All packets are marked when they pass the scrubber (a node is established as scrubber when the ring is initialized). Packets without a valid destination address are removed when they pass the scrubber for the second time, so that the ring does not fill with packets that would otherwise circulate indefinitely. === Cache coherence === Cache coherence ensures data consistency in multiprocessor systems. The simplest form applied in earlier systems was based on clearing the cache contents between context switches and disabling the cache for data that were shared between two or more processors. These methods were feasible when the performance difference between the cache and memory was less than one order of magnitude. Modern processors with caches that are more than two orders of magnitude faster than main memory would not perform anywhere near optimally without more sophisticated methods for data consistency. Bus-based systems use eavesdropping (snooping) methods since buses are inherently broadcast. Modern systems with point-to-point links use broadcast methods with snoop filter options to improve performance. Since broadcast and eavesdropping are inherently non-scalable, these are not used in SCI. Instead, SCI uses a distributed directory-based cache coherence protocol with a linked list of nodes containing processors that share a particular cache line. 
Each node holds a directory for the main memory of the node with a tag for each line of memory (same line length as the cache line). The memory tag holds a pointer to the head of the linked list and a state code for the line (three states – home, fresh, gone). Associated with each node is also a cache for holding remote data with a directory containing forward and backward pointers to nodes in the linked list sharing the cache line. The tag for the cache has seven states (invalid, only fresh, head fresh, only dirty, head dirty, mid valid, tail valid). The distributed directory is scalable. The overhead for the directory based cache coherence is a constant percentage of the node’s memory and cache. This percentage is in the order of 4% for the memory and 7% for the cache. == Legacy == SCI is a standard for connecting the different resources within a multiprocessor computer system, and it is not as widely known to the public as for example the Ethernet family for connecting different systems. Different system vendors implemented different variants of SCI for their internal system infrastructure. These different implementations interface to very intricate mechanisms in processors and memory systems and each vendor has to preserve some degrees of
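
The 64-bit address layout described under "The standard" above (16 bits identifying a node, 48 bits addressing within the node) can be checked with a short worked example. This is a sketch only: the text does not say which end of the address holds the node identifier, so placing it in the high-order 16 bits is an assumption, and the names `split_address` and `join_address` are hypothetical.

```python
NODE_BITS = 16      # bits identifying a node
OFFSET_BITS = 48    # bits addressing memory within a node
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split_address(address):
    """Split a 64-bit address into (node_id, offset).

    Assumption: the node identifier occupies the high-order 16 bits.
    """
    if not 0 <= address < (1 << (NODE_BITS + OFFSET_BITS)):
        raise ValueError("address must fit in 64 bits")
    return address >> OFFSET_BITS, address & OFFSET_MASK


def join_address(node_id, offset):
    """Combine a node identifier and an in-node offset into one 64-bit address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id must fit in 16 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset must fit in 48 bits")
    return (node_id << OFFSET_BITS) | offset


# The sizes quoted in the text follow directly from the bit widths.
NODE_COUNT = 1 << NODE_BITS                        # 65,536 nodes
TERABYTES_PER_NODE = (1 << OFFSET_BITS) // 2**40   # 256 (binary terabytes)
EXABYTES_TOTAL = (1 << 64) // 2**60                # 16 (binary exabytes)
```

Because the split is a shift and a mask, `split_address(join_address(n, o))` returns `(n, o)` for any valid pair.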

    Read more →
  • Lexical choice

    Lexical choice

    Lexical choice is the subtask of natural language generation (NLG) that involves choosing the content words (nouns, non-auxiliary verbs, adjectives, and adverbs) in a generated text. Function words (determiners, for example) are usually chosen during realisation. == Examples == The simplest type of lexical choice involves mapping a domain concept (perhaps represented in an ontology) to a word. For example, the concept Finger might be mapped to the word finger. A more complex situation is when a domain concept is expressed using different words in different situations. For example, the domain concept Value-Change can be expressed in many ways: The temperature rose: the verb rose is used for a Value-Change in temperature which increases the value. The temperature fell: the verb fell is used for a Value-Change in temperature which decreases the value. The rain got heavier: the phrase got heavier is used for a Value-Change in precipitation amount when the precipitation is rain. Sometimes words can communicate additional contextual information, for example: The temperature plummeted: the verb plummeted is used for a Value-Change in temperature which decreases the value, when the change is rapid and large. Contextual information is especially significant for vague terms such as tall. For example, a 2 m tall man is tall, but a 2 m tall horse is small. == Linguistic perspective == Lexical choice modules must be informed by linguistic knowledge of how the system's input data maps onto words. This is a question of semantics, but it is also influenced by syntactic factors (such as collocation effects) and pragmatic factors (such as context). Hence NLG systems need linguistic models of how meaning is mapped to words in the target domain (genre) of the NLG system. 
Genre tends to be very important; for example the verb veer has a very specific meaning in weather forecasts (wind direction is changing in a clockwise direction) which it does not have in general English, and a weather-forecast generator must be aware of this genre-specific meaning. In some cases there are major differences in how different people use the same word; for example, some people use by evening to mean 6PM and others use it to mean midnight. Psycholinguists have shown that when people speak to each other, they agree on a common interpretation via lexical alignment; this is not something which NLG systems can yet do. Ultimately, lexical choice must deal with the fundamental issue of how language relates to the non-linguistic world. For example, a system which chose colour terms such as red to describe objects in a digital image would need to know which RGB pixel values could generally be described as red; how this was influenced by visual (lighting, other objects in the scene) and linguistic (other objects being discussed) context; what pragmatic connotations were associated with red (for example, when an apple is called red, it is assumed to be ripe as well as have the colour red); and so forth. == Algorithms and models == A number of algorithms and models have been developed for lexical choice in the research community, for example Edmonds developed a model for choosing between near-synonyms (words with similar core meanings but different connotations). However such algorithms and models have not been widely used in applied NLG systems; such systems have instead often used quite simple computational models, and invested development effort in linguistic analysis instead of algorithm development.
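
The Value-Change examples above can be sketched as a simple rule-based chooser of the kind applied systems often use. This is a minimal illustration, not a published algorithm: the function names, the `rapid_and_large` flag and the phrase "got lighter" for decreasing rain are assumptions that do not come from the text.

```python
def choose_phrase(quantity, old_value, new_value, rapid_and_large=False):
    """Choose a verb phrase for a Value-Change concept.

    The rules mirror the examples in the text: rose, fell, plummeted
    and got heavier.
    """
    if new_value == old_value:
        raise ValueError("no Value-Change to describe")
    increasing = new_value > old_value
    if quantity == "rain":
        # "got lighter" is an assumed counterpart; the text only gives "got heavier".
        return "got heavier" if increasing else "got lighter"
    if increasing:
        return "rose"
    # "plummeted" carries extra context: the decrease is rapid and large.
    return "plummeted" if rapid_and_large else "fell"


def realise(quantity, old_value, new_value, rapid_and_large=False):
    """Build a short sentence around the chosen phrase."""
    phrase = choose_phrase(quantity, old_value, new_value, rapid_and_large)
    return f"The {quantity} {phrase}"
```

Calling `realise("temperature", 20, 5, rapid_and_large=True)` returns "The temperature plummeted", while the same change without the flag returns "The temperature fell".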

    Read more →
  • Kerckhoffs's principle

    Kerckhoffs's principle

    Kerckhoffs's principle (also called Kerckhoffs's desideratum, assumption, axiom, doctrine or law) of cryptography was stated by the Dutch cryptographer Auguste Kerckhoffs in the 19th century. The principle holds that a cryptosystem should be secure, even if everything about the system, except the key, is public knowledge. This concept is widely embraced by cryptographers, in contrast to security through obscurity, which is not. Kerckhoffs's principle was phrased by the American mathematician Claude Shannon as "the enemy knows the system", i.e., "one ought to design systems under the assumption that the enemy will immediately gain full familiarity with them". In that form, it is called Shannon's maxim. Another formulation by American researcher and professor Steven M. Bellovin is: In other words—design your system assuming that your opponents know it in detail. (A former official at NSA's National Computer Security Center told me that the standard assumption there was that serial number 1 of any new device was delivered to the Kremlin.) == Origins == The invention of telegraphy radically changed military communications and increased the number of messages that needed to be protected from the enemy dramatically, leading to the development of field ciphers which had to be easy to use without large confidential codebooks prone to capture on the battlefield. It was this environment which led to the development of Kerckhoffs's requirements. Auguste Kerckhoffs was a professor of German language at Ecole des Hautes Etudes Commerciales (HEC) in Paris. In early 1883, Kerckhoffs's article, La Cryptographie Militaire, was published in two parts in the Journal of Military Science, in which he stated six design rules for military ciphers. 
Translated from French, they are: The system must be practically, if not mathematically, indecipherable; It should not require secrecy, and it should not be a problem if it falls into enemy hands; It must be possible to communicate and remember the key without using written notes, and correspondents must be able to change or modify it at will; It must be applicable to telegraph communications; It must be portable, and should not require several persons to handle or operate; Lastly, given the circumstances in which it is to be used, the system must be easy to use and should not be stressful to use or require its users to know and comply with a long list of rules. Some are no longer relevant given the ability of computers to perform complex encryption. The second rule, now known as Kerckhoffs's principle, is still critically important. == Explanation of the principle == Kerckhoffs viewed cryptography as a rival to, and a better alternative than, steganographic encoding, which was common in the nineteenth century for hiding the meaning of military messages. One problem with encoding schemes is that they rely on humanly-held secrets such as "dictionaries" which disclose for example, the secret meaning of words. Steganographic-like dictionaries, once revealed, permanently compromise a corresponding encoding system. Another problem is that the risk of exposure increases as the number of users holding the secrets increases. Nineteenth century cryptography, in contrast, used simple tables which provided for the transposition of alphanumeric characters, generally given row-column intersections which could be modified by keys which were generally short, numeric, and could be committed to human memory. The system was considered "indecipherable" because tables and keys do not convey meaning by themselves. Secret messages can be compromised only if a matching set of table, key, and message falls into enemy hands in a relevant time frame. 
Kerckhoffs viewed tactical messages as only having a few hours of relevance. Systems are not necessarily compromised, because their components (i.e. alphanumeric character tables and keys) can be easily changed. === Advantage of secret keys === Using secure cryptography is supposed to replace the difficult problem of keeping messages secure with a much more manageable one, keeping relatively small keys secure. A system that requires long-term secrecy for something as large and complex as the whole design of a cryptographic system obviously cannot achieve that goal. It only replaces one hard problem with another. However, if a system is secure even when the enemy knows everything except the key, then all that is needed is to manage keeping the keys secret. There are a large number of ways the internal details of a widely used system could be discovered. The most obvious is that someone could bribe, blackmail, or otherwise threaten staff or customers into explaining the system. In war, for example, one side will probably capture some equipment and people from the other side. Each side will also use spies to gather information. If a method involves software, someone could do memory dumps or run the software under the control of a debugger in order to understand the method. If hardware is being used, someone could buy or steal some of the hardware and build whatever programs or gadgets needed to test it. Hardware can also be dismantled so that the chip details can be examined under the microscope. === Maintaining security === A generalization some make from Kerckhoffs's principle is: "The fewer and simpler the secrets that one must keep to ensure system security, the easier it is to maintain system security." Bruce Schneier ties it in with a belief that all security systems must be designed to fail as gracefully as possible: Kerckhoffs's principle applies beyond codes and ciphers to security systems in general: every secret creates a potential failure point. 
Secrecy, in other words, is a prime cause of brittleness—and therefore something likely to make a system prone to catastrophic collapse. Conversely, openness provides ductility. Any security system depends crucially on keeping some things secret. However, Kerckhoffs's principle points out that the things kept secret ought to be those least costly to change if inadvertently disclosed. For example, a cryptographic algorithm may be implemented by hardware and software that is widely distributed among users. If security depends on keeping that secret, then disclosure leads to major logistic difficulties in developing, testing, and distributing implementations of a new algorithm – it is "brittle". On the other hand, if keeping the algorithm secret is not important, but only the keys used with the algorithm must be secret, then disclosure of the keys simply requires the simpler, less costly process of generating and distributing new keys. == Applications == In accordance with Kerckhoffs's principle, the majority of civilian cryptography makes use of publicly known algorithms. By contrast, ciphers used to protect classified government or military information are often kept secret (see Type 1 encryption). However, it should not be assumed that government/military ciphers must be kept secret to maintain security. It is possible that they are intended to be as cryptographically sound as public algorithms, and the decision to keep them secret is in keeping with a layered security posture. == Security through obscurity == It is moderately common for companies to keep the inner workings of a system secret. Some argue this "security by obscurity" makes the product safer and less vulnerable to attack. A counter-argument is that keeping the innards secret may improve security in the short term, but in the long run, only systems that have been published and analyzed should be trusted. 
Steven Bellovin and Randy Bush commented: Security Through Obscurity Considered Dangerous: Hiding security vulnerabilities in algorithms, software, and/or hardware decreases the likelihood they will be repaired and increases the likelihood that they can and will be exploited. Discouraging or outlawing discussion of weaknesses and vulnerabilities is extremely dangerous and deleterious to the security of computer systems, the network, and its citizens. Open Discussion Encourages Better Security: The long history of cryptography and cryptoanalysis has shown time and time again that open discussion and analysis of algorithms exposes weaknesses not thought of by the original authors, and thereby leads to better and more secure algorithms. As Kerckhoffs noted about cipher systems in 1883 [Kerc83], "Il faut qu'il n'exige pas le secret, et qu'il puisse sans inconvénient tomber entre les mains de l'ennemi." (Roughly, "the system must not require secrecy and must be able to be stolen by the enemy without causing trouble.")
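
As an illustration of the principle described above, here is a minimal Python sketch that is not taken from the source. The algorithm (HMAC-SHA256 from the standard library) is fully public, and the only secret is a short random key. The names `new_key`, `tag` and `verify` and the sample message are hypothetical, chosen only for this example.

```python
import hashlib
import hmac
import secrets

# Public by design: anyone may know which algorithm is in use.
ALGORITHM = hashlib.sha256


def new_key() -> bytes:
    """Generate a fresh 256-bit key, the only secret in this sketch."""
    return secrets.token_bytes(32)


def tag(key: bytes, message: bytes) -> bytes:
    """Authenticate a message with HMAC-SHA256 under the given key."""
    return hmac.new(key, message, ALGORITHM).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(tag(key, message), received_tag)


key = new_key()
message = b"attack at dawn"  # hypothetical message
message_tag = tag(key, message)
assert verify(key, message, message_tag)

# Knowing the algorithm without the key is not enough to validate or forge tags.
assert not verify(new_key(), message, message_tag)

# If the key is disclosed, recovery is cheap: generate and distribute a new key.
key = new_key()
assert not verify(key, message, message_tag)   # tags under the old key stop validating
assert verify(key, message, tag(key, message))
```

Rotating the key is a single call here, whereas a system whose security depended on a secret algorithm would need a new design, new testing and a new rollout after disclosure.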

  • Completeness (cryptography)

    Completeness (cryptography)

    In cryptography, a Boolean function is said to be complete if the value of each output bit depends on all input bits. This is a desirable property in an encryption cipher: if one bit of the input (plaintext) is changed, every bit of the output (ciphertext) should change with a probability of 50% on average. The easiest way to see why this is good is to consider the opposite case: suppose that changing the last byte of an 8-byte plaintext affected only the 8th byte of the ciphertext. An attacker who learned 256 different plaintext-ciphertext pairs would then always know the last byte of every 8-byte sequence we send (effectively 12.5% of all our data). Finding 256 plaintext-ciphertext pairs is not hard on the internet, because standard protocols are used, and standard protocols have standard headers and commands (e.g. "get", "put", "mail from:", etc.) that the attacker can safely guess. On the other hand, if our cipher is complete (and is generally secure in other ways, too), the attacker would need to collect 2^64 (about 1.8 × 10^19) plaintext-ciphertext pairs to crack the cipher in this way.
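
The definition above can be checked by brute force on a toy function. The following Python sketch is an illustration under stated assumptions, not part of the source: `dependency_matrix`, `is_complete`, `nibblewise` and `hashed` are hypothetical names, the 4-bit table `SBOX4` is an arbitrary substitution table, and truncated SHA-256 is used only as a stand-in for a well-mixing cipher. The block is scaled down to 8 bits so that every input can be tried.

```python
import hashlib
from typing import Callable, List


def dependency_matrix(f: Callable[[int], int], n_in: int, n_out: int) -> List[List[bool]]:
    """dep[i][j] is True if flipping input bit i changes output bit j for some input."""
    dep = [[False] * n_out for _ in range(n_in)]
    for x in range(1 << n_in):
        y = f(x)
        for i in range(n_in):
            diff = y ^ f(x ^ (1 << i))  # output bits that changed
            for j in range(n_out):
                if (diff >> j) & 1:
                    dep[i][j] = True
    return dep


def is_complete(f: Callable[[int], int], n_in: int, n_out: int) -> bool:
    """A function is complete if every output bit depends on every input bit."""
    return all(all(row) for row in dependency_matrix(f, n_in, n_out))


# Arbitrary 4-bit substitution table, used only for illustration.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def nibblewise(x: int) -> int:
    """Substitute each 4-bit half independently: the halves never mix."""
    return (SBOX4[x >> 4] << 4) | SBOX4[x & 0xF]


def hashed(x: int) -> int:
    """Stand-in for a well-mixing cipher: first byte of SHA-256 of the input byte."""
    return hashlib.sha256(bytes([x])).digest()[0]


print(is_complete(nibblewise, 8, 8))  # False: the low half never sees the high half
print(is_complete(hashed, 8, 8))

# Work for the attacker in the text's example with 8-byte blocks:
print(2 ** 8)    # pairs needed per byte position when bytes are independent
print(2 ** 64)   # pairs needed when the whole 64-bit block must be covered
```

The `nibblewise` function mirrors the weak cipher in the text at a smaller scale: each half of the block can be attacked on its own, so the work grows with the size of a half, not the size of the block.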

  • BREACH

    BREACH

    BREACH (a backronym: Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) is a security vulnerability against HTTPS when using HTTP compression. BREACH builds on the CRIME security exploit. It was announced at the August 2013 Black Hat USA conference by security researchers Angelo Prado, Neal Harris and Yoel Gluck. == Details == While the CRIME attack was presented as a general attack that could work effectively against a large number of protocols, only exploits against SPDY request compression and TLS compression were demonstrated, and these were largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME warned that this vulnerability might be even more widespread than SPDY and TLS compression combined. BREACH is an instance of the CRIME attack against HTTP compression, that is, the use of the gzip or DEFLATE data compression algorithms via the Content-Encoding option within HTTP by many web browsers and servers. Given this compression oracle, the rest of the BREACH attack follows the same general lines as the CRIME exploit: an initial blind brute-force search guesses a few bytes, and a divide-and-conquer search then expands a correct guess to an arbitrarily large amount of content. == Mitigation == BREACH exploits the compression in the underlying HTTP protocol. Therefore, turning off TLS compression makes no difference to BREACH, which can still perform a chosen-plaintext attack against the HTTP payload. As a result, clients and servers are forced either to disable HTTP compression completely (thus reducing performance) or to adopt workarounds that try to foil BREACH in individual attack scenarios, such as using cross-site request forgery (CSRF) protection. Another suggested approach is to disable HTTP compression whenever the referrer header indicates a cross-site request, or when the header is not present. 
This approach allows effective mitigation of the attack without losing functionality, incurring a performance penalty only on affected requests. Another approach is to add padding at the TLS, HTTP header, or payload level. Around 2013–2014, there was an IETF draft proposal for a TLS extension for length-hiding padding that, in theory, could be used as a mitigation against this attack. It allows the actual length of the TLS payload to be disguised by the insertion of padding to round it up to a fixed set of lengths, or to randomize the external length, thereby decreasing the likelihood of detecting the small changes in compression ratio that are the basis for the BREACH attack. However, this draft has since expired without further action. A further mitigation is HTB (Heal-the-BREACH), which adds random-sized padding to compressed data, providing some variance in the size of the output. This randomness slows BREACH's guessing of the correct characters in the secret token by a factor of 500 (10-byte max) to 500,000 (100-byte max). HTB protects all websites and pages on the server with minimal CPU usage and minimal bandwidth increase.
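
The compression oracle and two of the mitigations described above can be sketched in Python with the standard `zlib` module. This is a simplified model under stated assumptions, not the published attack or the HTB implementation: the page template, the token value and the names `response_body`, `compressed_length`, `padded_length` and `should_compress` are hypothetical, and the padding is modelled only as extra bytes added to the observed length.

```python
import secrets
import zlib
from typing import Optional
from urllib.parse import urlsplit

SECRET_TOKEN = "9f8b2c4d7a1e6035"  # hypothetical secret embedded in the page


def response_body(reflected: str) -> bytes:
    """Build a page that reflects attacker input next to a secret (hypothetical template)."""
    page = (
        f"<html><body><p>You searched for: {reflected}</p>"
        f"<input type='hidden' name='csrf_token' value='{SECRET_TOKEN}'>"
        "</body></html>"
    )
    return page.encode("utf-8")


def compressed_length(reflected: str) -> int:
    """What an eavesdropper can observe: the size of the DEFLATE-compressed body."""
    return len(zlib.compress(response_body(reflected), 9))


def padded_length(reflected: str, max_pad: int = 10) -> int:
    """Model random-sized padding: add 0..max_pad bytes to the observed size."""
    return compressed_length(reflected) + secrets.randbelow(max_pad + 1)


def should_compress(referer: Optional[str], own_host: str) -> bool:
    """Compress only when a Referer header is present and points at our own host."""
    if not referer:
        return False
    return urlsplit(referer).hostname == own_host


prefix = "csrf_token' value='"
right = prefix + SECRET_TOKEN          # guess that matches the secret
wrong = prefix + "0123456789abcdef"    # same length, no match

# The matching guess lets DEFLATE replace the secret with a back-reference,
# so the response is shorter; that difference is the oracle.
print(compressed_length(right), compressed_length(wrong))

# With padding, a single observation no longer has a fixed size.
print([padded_length(right) for _ in range(5)])
```

A real attack guesses one character at a time and relies on differences of roughly a byte, so a few bytes of random padding force the attacker to repeat each measurement many times. The whole-token comparison here only makes the effect easy to see.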

  • Reciprocal human machine learning

    Reciprocal human machine learning

    Reciprocal Human Machine Learning (RHML) is an interdisciplinary approach to designing human-AI interaction systems. RHML aims to enable continual learning between humans and machine learning models by having them learn from each other. This approach keeps the human expert "in the loop" to oversee and enhance machine learning performance while simultaneously supporting the expert's own continued learning. == Background == RHML emerged in the context of the rise of big data analytics and artificial intelligence for intelligent tasks like sense-making and decision-making. As machine learning advanced to take on more roles, researchers realized that fully autonomous systems had limitations and needed human guidance. RHML extends the concept of human-in-the-loop systems by promoting reciprocal learning. Humans learn from their interactions with machine learning models, staying up-to-date on evolving technology. The models also learn from human feedback and oversight. This amplification of learning on both sides is a key focus of RHML. The approach draws on theories of learning in dyads from education and psychology. It also builds on human-computer interaction and human-centered design principles. Implementing RHML requires developing specialized tools and interfaces tailored to the application. == Applications == RHML has been explored across diverse domains, including: Cybersecurity - software to enable reciprocal learning between experts and AI models for social media threat detection. Organizational decision-making - RHML to structure collaboration between humans and AI systems. Workplace training - using RHML for workers to learn from AI technologies on the job. Open science - using human and AI collaboration to promote open science. Production and logistics - turning workers and intelligent machines into teammates. RHML maintains human oversight and control over AI systems, while enabling cutting-edge machine learning performance. 
This collaborative approach highlights the importance of keeping the human expert involved in the loop. An example of RHML in application is Free Spirit (AFSFCV), an open-source architecture first published in early 2025 as a whitepaper, proposing a visually structured approach to intent-based human–AI interaction.

  • Content repository

    Content repository

    A content repository or content store is a database of digital content with an associated set of data management, search and access methods allowing application-independent access to the content, rather like a digital library, but with the ability to store and modify content in addition to searching and retrieving. The content repository acts as the storage engine for a larger application such as a content management system or a document management system, which adds a user interface on top of the repository's application programming interface. == Advantages provided by repositories == Common rules for data access allow many applications to work with the same content without interrupting the data. They give out signals when changes happen, letting other applications using the repository know that something has been modified, which enables collaborative data management. Developers can deal with data using programs that are more compatible with the desktop programming environment. The data model is scriptable when users use a content repository. == Content repository features == A content repository may provide functionality such as: add/edit/delete content; hierarchy and sort order management; query / search; versioning; access control; import / export; locking; life-cycle management; retention and holding / records management. == Examples == Apache Jackrabbit; ModeShape. == Applications == Content management; document management; digital asset management; records management; revision control; social collaboration; web content management. == Standards and specification == Content repository API for Java; WebDAV; Content Management Interoperability Services.
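
A few of the features listed above (versioning, locking, search and change signals) can be sketched as a small in-memory store. This Python example is a hypothetical illustration, not the API of Apache Jackrabbit, ModeShape, JCR or CMIS; the class `ContentRepository`, its methods and the paths used are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Listener = Callable[[str, str], None]  # called with (event, path)


@dataclass
class Node:
    versions: List[str] = field(default_factory=list)  # full history, oldest first
    locked_by: Optional[str] = None


class ContentRepository:
    """Toy in-memory repository: versioned content, locks and change signals."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register a callback that is told whenever content changes."""
        self._listeners.append(listener)

    def save(self, path: str, content: str, user: str) -> int:
        """Store a new version and return its 1-based version number."""
        node = self._nodes.setdefault(path, Node())
        if node.locked_by not in (None, user):
            raise PermissionError(f"{path} is locked by {node.locked_by}")
        node.versions.append(content)
        for listener in self._listeners:
            listener("saved", path)
        return len(node.versions)

    def read(self, path: str, version: Optional[int] = None) -> str:
        """Return the latest version, or a specific 1-based version."""
        versions = self._nodes[path].versions
        return versions[-1] if version is None else versions[version - 1]

    def lock(self, path: str, user: str) -> None:
        node = self._nodes[path]
        if node.locked_by not in (None, user):
            raise PermissionError(f"{path} is locked by {node.locked_by}")
        node.locked_by = user

    def unlock(self, path: str, user: str) -> None:
        node = self._nodes[path]
        if node.locked_by == user:
            node.locked_by = None

    def search(self, term: str) -> List[str]:
        """Return paths whose latest version contains the term."""
        return sorted(p for p, n in self._nodes.items()
                      if n.versions and term in n.versions[-1])


repo = ContentRepository()
repo.subscribe(lambda event, path: print(f"{event}: {path}"))
repo.save("/docs/intro", "first draft", user="alice")
repo.lock("/docs/intro", user="alice")
repo.save("/docs/intro", "final text", user="alice")
print(repo.read("/docs/intro", version=1))
print(repo.search("final"))
```

Because every application goes through the same `save` and `read` calls, the access rules and change signals apply uniformly, which is the application-independent access the text describes.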

  • Rassd News Network

    Rassd News Network

    Rassd News Network, also known by its initials RNN (Arabic: شبكة رصد الاخبارية), is an alternative media network based in Cairo, Egypt. RNN was launched as a Facebook-based news source on January 25, 2011. It quickly advanced to become a primary contributor of Egyptian revolution-related news that year. Applying the motto "From the people to the people," the citizen journalists who created RNN have since added a Twitter feed and launched an independent website dedicated to short news stories favored by an online audience. RNN is an organized citizen news network with four working committees: one for editing the news, another to support the correspondents covering Egypt, a third for managing the multimedia feeds and a fourth for staff functions such as development, training and public relations. RNN's Arabic name, Rassd, is an acronym that stands for Rakeb (observe), Sawwer (record) and Dawwen (blog). RNN created a Ustream channel on January 27, 2011, and a YouTube account a month later. The success of RNN and its new social media model is evidenced by its recent local network expansion into Libya, Morocco, Syria, Jerusalem and Turkey. Even so, one media scholar in the US (commenting in 2011) called the accuracy of RNN's reporting "fairly mediocre". RNN has endured closures of its Facebook profile and YouTube account as part of attacks from private media that attempted to thwart its work and influence its content. == Use of RNN's news by international media == RNN has been a global source of Egyptian revolution-related news since its launch. During the early days of the citizen uprisings across the Middle East, major networks such as BBC, Reuters, Al Jazeera and Al Arabiya used some of Rassd's news and photos, and followed the network on Twitter. Three days after the online portal went live it was streaming video to MSNBC through its Facebook page. 
Then on February 5, 2011, Louisville's NBC-affiliate cited RNN, Cairo when it reported that President Hosni Mubarak had stepped down as head of Egypt's ruling party.
