AI Assistant Qt

AI Assistant Qt — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • NumPy

    NumPy

    NumPy (pronounced NUM-py) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is fiscally sponsored by NumFOCUS. == History == === matrix-sig === The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier. === Numeric === An implementation of a matrix package was completed by Jim Fulton, then expanded to support multi-dimensional arrays by Jim Hugunin and called Numeric (also variously known as the "Numerical Python extensions" or "NumPy"), with influences from the APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. Hugunin, a graduate student at the Massachusetts Institute of Technology (MIT), joined the Corporation for National Research Initiatives (CNRI) in 1997 to work on JPython, leaving Paul Dubois of Lawrence Livermore National Laboratory (LLNL) to take over as maintainer. Other early contributors include David Ascher, Konrad Hinsen and Travis Oliphant. === Numarray === A new package called Numarray was written as a more flexible replacement for Numeric. Like Numeric, it too is now deprecated. Numarray had faster operations for large arrays, but was slower than Numeric on small ones, so for a time both packages were used in parallel for different use cases. The last version of Numeric (v24.2) was released on 11 November 2005, while the last version of numarray (v1.5.2) was released on 24 August 2006. There was a desire to get Numeric into the Python standard library, but Guido van Rossum decided that the code was not maintainable in its state then. === NumPy === In early 2005, NumPy developer Travis Oliphant wanted to unify the community around a single array package and ported Numarray's features to Numeric, releasing the result as NumPy 1.0 in 2006. This new project was part of SciPy. To avoid installing the large SciPy package just to get an array object, this new package was separated and called NumPy. Support for Python 3 was added in 2011 with NumPy version 1.5.0. In 2011, PyPy started development on an implementation of the NumPy API for PyPy. As of 2023, it is not yet fully compatible with NumPy. == Features == NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written for this version of Python often run much slower than compiled equivalents due to the absence of compiler optimization. NumPy addresses the slowness problem partly by providing multidimensional arrays and functions and operators that operate efficiently on arrays; using these requires rewriting some code, mostly inner loops, using NumPy. Using NumPy in Python gives functionality comparable to MATLAB since they are both interpreted, and they both allow the user to write fast programs as long as most operations work on arrays or matrices instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and complete programming language. Moreover, complementary Python packages are available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package that provides MATLAB-like plotting functionality. Although MATLAB can perform sparse matrix operations, NumPy alone cannot perform such operations and requires the use of the scipy.sparse library. Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations. Python bindings of the widely used computer vision library OpenCV utilize NumPy arrays to store and operate on data. Since images with multiple channels are simply represented as three-dimensional arrays, indexing, slicing or masking with other arrays are very efficient ways to access specific pixels of an image. The NumPy array as universal data structure in OpenCV for images, extracted feature points, filter kernels and many more vastly simplifies the programming workflow and debugging. Importantly, many NumPy operations release the global interpreter lock, which allows for multithreaded processing. NumPy also provides a C API, which allows Python code to interoperate with external libraries written in low-level languages. === The ndarray data structure === The core functionality of NumPy is its "ndarray", for n-dimensional array, data structure. These arrays are strided views on memory. In contrast to Python's built-in list data structure, these arrays are homogeneously typed: all elements of a single array must be of the same type. Such arrays can also be views into memory buffers allocated by C/C++, Python, and Fortran extensions to the CPython interpreter without the need to copy data around, giving a degree of compatibility with existing numerical libraries. This functionality is exploited by the SciPy package, which wraps a number of such libraries (notably BLAS and LAPACK). NumPy has built-in support for memory-mapped ndarrays. === Limitations === Inserting or appending entries to an array is not as trivially possible as it is with Python's lists. The np.pad(...) routine to extend arrays actually creates new arrays of the desired shape and padding values, copies the given array into the new one and returns it. NumPy's np.concatenate([a1,a2]) operation does not actually link the two arrays but returns a new one, filled with the entries from both given arrays in sequence. Reshaping the dimensionality of an array with np.reshape(...) is only possible as long as the number of elements in the array does not change. These circumstances originate from the fact that NumPy's arrays must be views on contiguous memory buffers. Algorithms that are not expressible as a vectorized operation will typically run slowly because they must be implemented in "pure Python", while vectorization may increase memory complexity of some operations from constant to linear, because temporary arrays must be created that are as large as the inputs. Runtime compilation of numerical code has been implemented by several groups to avoid these problems; open source solutions that interoperate with NumPy include numexpr and Numba. Cython and Pythran are static-compiling alternatives to these. Many modern large-scale scientific computing applications have requirements that exceed the capabilities of the NumPy arrays. For example, NumPy arrays are usually loaded into a computer's memory, which might have insufficient capacity for the analysis of large datasets. Further, NumPy operations are executed on a single CPU. However, many linear algebra operations can be accelerated by executing them on clusters of CPUs or of specialized hardware, such as GPUs and TPUs, which many deep learning applications rely on. As a result, several alternative array implementations have arisen in the scientific python ecosystem over the recent years, such as Dask for distributed arrays and TensorFlow or JAX for computations on GPUs. Because of its popularity, these often implement a subset of NumPy's API or mimic it, so that users can change their array implementation with minimal changes to their code required. A library named CuPy, accelerated by Nvidia's CUDA framework, has also shown potential for faster computing, being a 'drop-in replacement' of NumPy. == Examples == NumPy is conventionally imported as np. === Basic operations === === Universal functions === === Linear algebra === === Multidimensional arrays === === Incorporation with OpenCV === === Nearest-neighbor search === Functional Python and vectorized NumPy version. === F2PY === Quickly wrap native code for faster scripts.

    Read more →
  • Data dictionary

    Data dictionary

    A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format". Oracle defines it as a collection of tables with metadata. The term can have one of several closely related meanings pertaining to databases and database management systems (DBMS): A document describing a database or collection of databases An integral component of a DBMS that is required to determine its structure A piece of middleware that extends or supplants the native data dictionary of a DBMS == Documentation == The terms data dictionary and data repository indicate a more general software utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as DDL and DML compilers, the query optimiser, the transaction processor, report generators, and the constraint enforcer. On the other hand, a data dictionary is a data structure that stores metadata, i.e., (structured) data about information. The software package for a stand-alone data dictionary or data repository may interact with the software modules of the DBMS, but it is mainly used by the designers, users and administrators of a computer system for information resource management. These systems maintain information on system hardware and software configuration, documentation, application and users as well as other information relevant to system administration. If a data dictionary system is used only by the designers, users, and administrators and not by the DBMS Software, it is called a passive data dictionary. Otherwise, it is called an active data dictionary or data dictionary. When a passive data dictionary is updated, it is done so manually and independently from any changes to a DBMS (database) structure. With an active data dictionary, the dictionary is updated first and changes occur in the DBMS automatically as a result. Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. This typically includes the names and descriptions of various tables (records or entities) and their contents (fields), plus additional details, like the type and length of each data element. Another important piece of information that a data dictionary can provide is the relationship between tables. This is sometimes referred to in entity-relationship diagrams (ERDs), or if using set descriptors, identifying which sets database tables participate in. In an active data dictionary constraints may be placed upon the underlying data. For instance, a range may be imposed on the value of numeric data in a data element (field), or a record in a table may be forced to participate in a set relationship with another record-type. Additionally, a distributed DBMS may have certain location specifics described within its active data dictionary (e.g. where tables are physically located). The data dictionary consists of record types (tables) created in the database by systems generated command files, tailored for each supported back-end DBMS. Oracle has a list of specific views for the "sys" user. This allows users to look up the exact information that is needed. Command files contain SQL Statements for CREATE TABLE, CREATE UNIQUE INDEX, ALTER TABLE (for referential integrity), etc., using the specific statement required by that type of database. There is no universal standard as to the level of detail in such a document. == Middleware == In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database. Extensions to the data dictionary also can assist in query optimization against distributed databases. Additionally, DBA functions are often automated using restructuring tools that are tightly coupled to an active data dictionary. Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex joins. For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types. Visual DataFlex features provides the ability to use DataDictionaries as class files to form middle layer between the user interface and the underlying database. The intent is to create standardized rules to maintain data integrity and enforce business rules throughout one or more related applications. Some industries use generalized data dictionaries as technical standards to ensure interoperability between systems. The real estate industry, for example, abides by a RESO's Data Dictionary to which the National Association of REALTORS mandates its MLSs comply with through its policy handbook. This intermediate mapping layer for MLSs' native databases is supported by software companies which provide API services to MLS organizations. == Platform-specific examples == Developers use a data description specification (DDS) to describe data attributes in file descriptions that are external to the application program that processes the data, in the context of an IBM i. The sys.ts$ table in Oracle stores information about every table in the database. It is part of the data dictionary that is created when the Oracle Database is created. Developers may also use DDS context from free and open-source software (FOSS) for structured and transactional queries in open environments. == Typical attributes == Here is a non-exhaustive list of typical items found in a data dictionary for columns or fields: Entity or form name or their ID (EntityID or FormID). The group this field belongs to. Field name, such as RDBMS field name Displayed field title. May default to field name if blank. Field type (string, integer, date, etc.) Measures such as min and max values, display width, or number of decimal places. Different field types may interpret this differently. An alternative is to have different attributes depending on field type. Field display order or tab order Coordinates on screen (if a positional or grid-based UI) Default value Prompt type, such as drop-down list, combo-box, check-boxes, range, etc. Is-required (Boolean) - If 'true', the value cannot be blank, null, or only white-spaces Is-read-only (Boolean) Reference table name, if a foreign key. Can be used for validation or selection lists. Various event handlers or references to. Example: "on-click", "on-validate", etc. See event-driven programming. Format code, such as a regular expression or COBOL-style "PIC" statements Description or synopsis Database index characteristics or specification

    Read more →
  • IWARP

    IWARP

    iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts, iWARP is not an acronym. Because iWARP is layered on Internet Engineering Task Force (IETF)-standard congestion-aware protocols such as Transmission Control Protocol (TCP) and Stream Control Transmission Protocol (SCTP), it makes few requirements on the network, and can be successfully deployed in a broad range of environments. == History == In 2007, the IETF published five Request for Comments (RFCs) that define iWARP: RFC 5040 A Remote Direct Memory Access Protocol Specification is layered over Direct Data Placement Protocol (DDP). It defines how RDMA Send, Read, and Write operations are encoded using DDP into headers on the network. RFC 5041 Direct Data Placement over Reliable Transports is layered over MPA/TCP or SCTP. It defines how received data can be directly placed into an upper layer protocols receive buffer without intermediate buffers. RFC 5042 Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security analyzes security issues related to iWARP DDP and RDMAP protocol layers. RFC 5043 Stream Control Transmission Protocol (SCTP) Direct Data Placement (DDP) Adaptation defines an adaptation layer that enables DDP over SCTP. RFC 5044 Marker PDU Aligned Framing for TCP Specification defines an adaptation layer that enables preservation of DDP-level protocol record boundaries layered over the TCP reliable connected byte stream. These RFCs are based on the RDMA Consortium's specifications for RDMA over TCP. The RDMA Consortium's specifications are influenced by earlier RDMA standards, including Virtual Interface Architecture (VIA) and InfiniBand (IB). Since 2007, the IETF has published three additional RFCs that maintain and extend iWARP: RFC 6580 IANA Registries for the Remote Direct Data Placement (RDDP) Protocols published in 2012 defines IANA registries for Remote Direct Data Placement (RDDP) error codes, operation codes, and function codes. RFC 6581 Enhanced Remote Direct Memory Access (RDMA) Connection Establishment published in 2011 fixes shortcomings with iWARP connection setup. RFC 7306 Remote Direct Memory Access (RDMA) Protocol Extensions published in 2014 extends RFC 5040 with atomic operations and RDMA Write with Immediate Data. == Protocol == The main component in the iWARP protocol is the Direct Data Placement Protocol (DDP), which permits the actual zero-copy transmission. DDP itself does not perform the transmission; the underlying protocol (TCP or SCTP) does. However, TCP does not respect message boundaries; it sends data as a sequence of bytes without regard to protocol data units (PDU). In this regard, DDP itself may be better suited for SCTP, and indeed the IETF proposed a standard RDMA over SCTP. To run DDP over TCP requires a tweak known as marker PDU aligned (MPA) framing to guarantee boundaries of messages. Furthermore, DDP is not intended to be accessed directly. Instead, a separate RDMA protocol (RDMAP) provides the services to read and write data. Therefore, the entire RDMA over TCP specification is really RDMAP over DDP over either MPA/TCP or SCTP. All of these protocols can be implemented in hardware. Unlike IB, iWARP only has reliable connected communication, as this is the only service that TCP and SCTP provide. The iWARP specification omits other features of IB, such as Send with Immediate Data operations. With RFC 7306, the IETF is working to reduce these omissions. == Implementation == Because a kernel implementation of the TCP stack can be seen as a bottleneck, the protocol is typically implemented in hardware RDMA network interface controllers (rNICs). As simple data losses are rare in tightly coupled network environments, the error-correction mechanisms of TCP may be performed by software while the more frequently performed communications are handled strictly by logic embedded on the rNIC. Similarly, connections are often established entirely by software and then handed off to the hardware. Furthermore, the handling of iWARP specific protocol details is typically isolated from the TCP implementation, allowing rNICs to be used for both as RDMA offload and TCP offload (in support of traditional sockets based TCP/IP applications). The portion of the hardware implementation used for implementing the TCP protocol is known as the TCP Offload Engine (TOE). TOE itself does not prevent copying on the reception side, and must be combined with RDMA hardware for zero-copy results. The RDMA / TCP specification is a set of different wire protocols intended to be implemented in hardware (though it seems feasible to emulate it in software for compatibility but without the performance benefits). == Interfaces == iWARP is a protocol, not an implementation, but defines protocol behavior in terms of the operations that are legal for the protocol, known as Verbs. As such, iWARP does not have any single standard programming interface. However, programming interfaces tend to very closely correspond to the Verbs. Several programmatic interfaces have been proposed, including OpenFabrics Verbs, Network Direct, uDAPL, kDAPL, IT-API, and RNICPI. Implementations of some of these interfaces are available for different platforms, including Windows and Linux. == Services available == Networking services implemented over iWARP include those offered in the OpenFabrics Enterprise Distribution (OFED) by the OpenFabrics Alliance for Linux operating systems, and by Microsoft Windows via Network Direct. NVMe over Fabrics (NVMEoF) iSCSI Extensions for RDMA (iSER) Server Message Block Direct (SMB Direct) Sockets Direct Protocol (SDP) SCSI RDMA Protocol (SRP) Network File System over RDMA (NFS over RDMA) GPUDirect

    Read more →
  • Social media as a public utility

    Social media as a public utility

    Social media as a public utility is a theory postulating that social networking sites (such as Meta - ie:Facebook & Instagram or Alphabet - ie: YouTube & Google, but also independent sites such as Twitter, Tumblr, Snapchat etc.) are essential public services that should be regulated by the government, in a manner similar to how electric and phone utilities are typically government regulated. It is based on the notion that social media platforms have monopoly power and broad social influence. == Background == === Definitions === Social media is defined as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of User Generated Content." Furthermore, the New Zealand Government of Internal Affairs describes it as "a set of online technologies, sites, and practices which are used to share opinions, experiences and perspectives. Fundamentally it is about the conversation. In contrast with traditional media, the nature of social media is to be highly interactive." Moreover, the term social media is described as online tools that let people interact and communicate with each other. This has become a standard word for online cultural exchange and a dominant way for individuals to engage on the internet. By using social media individuals become more closely and strongly connected than ever before. The traditional definition of the term public utility is "an infrastructural necessity for the general public where the supply conditions are such that the public may not be provided with a reasonable service at reasonable prices because of monopoly in the area." Conventional public utilities include water, natural gas, and electricity. In order to secure the interests of the public, utilities are regulated. Public utilities can also be seen as natural monopolies implying that the highest degree of efficiency is accomplished under one operator in the marketplace. Public utility regulation for social media has been largely criticized because people believe it would produce undesirable and indirect effects. However, others say that truly effective government regulation would produce valuable results. Social media as a public utility is a crucial debate because utilities get regulated, so marking social media websites as utilities would require government regulation of various social media websites and platforms such as Facebook, Google, and Twitter. Applying the term public utility to social media implies that social media websites are public necessities, and, consequently, should be regulated by the government. While social media are not as essential for survival as traditional public utilities such as electricity, water, and natural gas, many people believe it has become vital for living in an interconnected world and without it, living a successful life would be difficult. Therefore, many people believe that social media has reached utility status and should be treated as a public utility. However, others believe that this is not true because social media are constantly revolutionizing and giving such platforms "utility status" would result in government regulation, which would consequently hinder innovation. Over the past decade many have debated and questioned whether or not "Internet service providers should be considered essential facilities or natural monopolies and regulated as public utilities." === Monopoly === A monopoly is defined as "a firm that is the only seller of a product or service having no close substitutes." A natural monopoly is when the entire demand within a relevant market can be satisfied at lowest cost by one firm rather than by two or more, and if such a market contains more than one firm then the firms will "quickly shake down to one through mergers or failures, or production will continue to consume more resources than necessary." In a monopoly competition is said to be short-lived, and in a natural monopoly it is said to produce inefficient results." Public utility companies can be regulated to prevent them from gaining monopolistic control. In November 2011 AT&T's proposal for merging with T-Mobile was rejected because it would have "diminished competition," and have led to the company having monopolistic power within the telephone industry. Such regulation is permitted because the telephone industry is a public utility. Similarly, Microsoft has also been prevented from taking various business actions that could result in the company gaining monopolistic power. If social media were a public utility then regulation of Google and Facebook would similarly dictate what they could and could not do. The possibility was raised in 2018 by U.S. Representative Steve King during a House Judiciary hearing on social media filtering practices. == Arguments == Advocates of this theory believe that social media websites already act like public utilities, and therefore regulation is needed. Additionally, advocates say that in the 21st century, using such websites are as necessary for communication as using traditional public utilities such as telephone, water, electricity, and natural gas are for other everyday uses. Specifically, advocates note that Google search should be treated as a public utility and needs to be regulated because it dominates the search engine market and no website can afford to ignore it. There is the position that a social media website such as Google "is a common carrier and should be regulated as such (Newman 2011)." These are reinforced by a perception that social media companies fail to properly maintain fair platforms for discourse. === Individual level === Advocates of regulating social media as a public utility believe that having an Internet presence using social media websites is imperative for individuals to adequately take part in the 21st century. Consequently, they argue that these sites are public utilities that need to be regulated to ensure that the constitutional rights of users are protected. For example, regulation may be needed to protect freedom of speech against risks such as Internet censorship and deplatforming. Social media affects people's behavior. For instance, it plays an important role in shaping its users' decisions and actions pertaining to health. This is demonstrated in a Pew Research Center research, which showed that 72 percent of American adults turned to social media for health information in 2011. Around 70 percent of people with chronic illnesses also use the platform to find cure, diagnoses, and other health answers. This development becomes a public issue as social media are likely to provide wrong medical information. Additionally, social media sites can also facilitate deleterious health behavior such as smoking, drug use, and harmful sexual behavior. === Business level === Advocates of social media as a public utility maintain that social media services dominate the Internet and are mainly owned by three or four companies that have unparalleled power to shape user interaction, and because of this power such businesses need to be regulated as public utilities. Zeynep Tufekci, University of North Carolina Chapel Hill, claims that services on the Internet such as Google, eBay, Facebook, Amazon.com, are all natural monopolies. She has stated that these services "benefit greatly from network externalities[,] which means that the more people on the service, the more useful it is for everyone," and thus it is difficult to replace the market leader. === Government level === Advocates of social media as a public utility believe that the government should impose restrictions on social media websites, such as Google, that are designed to benefit its rivals. Due to the recent substantial growth of social media websites such as Google, advocates claim that such a website "might need search neutrality regulation modeled after net neutrality regulation and that a Federal Search Commission might be needed to enforce such a regime." danah boyd expresses a future issue which the government may have to deal with in her research: Facebook is becoming an international social media website, specifically prevalent in Canada and Europe which are "two regions that love to regulate their utilities." Furthermore, recent books by New America Foundation Senior Fellow Rebecca MacKinnon and law professor Lori Andrews advise society to start considering Facebook and Google as nation-states or the "sovereigns of cyberspace." Overall, advocates of social media as a public utility believe that due to the immense popularity and necessity of social media websites, it is imperative that the Government imposes regulations in the same manner they do for electricity, water, and natural gas. == Counterarguments == Opponents of this theory say that social media websites should not be treated as public utilities because these platforms are changing every year, and because they are not essential services for s

    Read more →
  • Negobot

    Negobot

    Negobot also referred to as Lolita or Lolita chatbot is a chatterbot that was introduced to the public in 2013, designed by researchers from the University of Deusto and Optenet to catch online pedophiles. It is a conversational agent that utilizes natural language processing (NLP), information retrieval (IR) and Automatic Learning. Because the bot poses as a young female in order to entice and track potential predators, it became known in media as the "virtual Lolita", in reference to Vladimir Nabokov's novel. == Background == In 2013, the University of Deusto researchers published a paper on their work with Negobot and disclosed the text online. In their abstract, the researchers addressed the issue that an increasing number of children are using the internet and that these young users are more susceptible to existing internet risks. Their main objective was to create a chatterbot with the ability to trap online predators that posed a threat to children. They intended to deploy the bot into sites frequented by predators such as social networks and chatrooms. The university researchers used information provided by anti-pedophilia activist organization Perverted-Justice, including examples of online encounters and conversations with sexual predators, to supplement the program's artificial intelligence system. == Features == === Programmed persona === The chatterbot takes the guise of a naive and vulnerable 14-year-old girl. The bot's programmers used methods of artificial intelligence and natural language processing to create a conversational agent fluent in typical teenage slang, misspellings, and knowledge of pop culture. Through these linguistic features, the bot is able to mimic the conversational style of young teenagers. It also features split personalities and seven different patterns of conversation. Negobot's primary creator, Dr. Carlos Laorden, expressed the significance of the bot's distinguishable style of communication, stating that normally, "chatbots tend to be very predictable. Their behavior and interest in a conversation are flat, which is a problem when attempting to detect untrustworthy targets like paedophiles." What makes Negobot different is its game theory feature, which makes it able to "maintain a much more realistic conversation." Apart from being able to imitate a stereotypical teenager, the program is also able to translate messages into different languages. === Game theory === Negobot's designers programmed it with the ability to treat conversations with potential predators as if it were a game, the objective being to collect as much information on the suspect as possible that could provide evidence of pedophilic characteristics and motives. The use of game theory shapes the decisions the bot makes and the overall direction of the conversation. The bot initiates its undercover operations by entering a chat as a passive participant, waiting to be chatted by a user. Once a user elicits conversation, the bot will frame the conversation in such a way that keeps the target engaged, extracting personal information and discouraging it from leaving the chat. The information is then recorded to be potentially sent to the police. If the target seems to lose interest, the bot attempts to make it feel guilty by expressing sentiments of loneliness and emotional need through strategic, formulated responses, ultimately prolonging interaction. In addition, the bot may provide fake information about itself in attempt to lure the target into physical meetings. === Limitations === Despite being able to carry out a realistic conversation, Negobot is still unable to detect linguistic subtleties in the messages of others, including sarcasm. == Controversy == John Carr, a specialist in online child safety, expressed his concern to BBC over the legality of this undercover investigation. He claimed that using the bot on unsuspecting internet users could be considered a form of entrapment or harassment. The type of information that Negobot collects from potential online predators, he said, is unlikely to be upheld in court. Furthermore, he warned that relying on only software without any real-world policing risks enticing individuals to do or say things that they would not have if real-world policing were a factor.

    Read more →
  • Cover (telecommunications)

    Cover (telecommunications)

    In telecommunications and tradecraft, cover is the technique of concealing or altering the characteristics of communications patterns for the purpose of denying an unauthorized receiver information that would be of value. The purpose of cover is not to make the communication secure, but to make it look like noise, rendering it uninteresting and not worth analysis. Even if an attacker recognizes the communication as interesting, cover makes traffic analysis more difficult since he must crack the cover before he can find out to whom it is addressed. Usually, the covered communication is also encrypted. In this way, enemies have no idea you sent a message; friends know you sent a message, but don't know what you said; the intended recipient knows what you said. Technically, cover sometimes refers to the specific process of modulo two additions of a pseudorandom bit stream generated by a cryptographic device with bits from the control message. Source: from Federal Standard 1037C and from MIL-STD-188

    Read more →
  • Tokenization (data security)

    Tokenization (data security)

    Tokenization, when applied to data security, is the process of substituting a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no intrinsic or exploitable meaning or value. The token is a reference (i.e. identifier) that maps back to the sensitive data through a tokenization system. The mapping from original data to a token uses methods that render tokens infeasible to reverse in the absence of the tokenization system, for example using tokens created from random numbers. A one-way cryptographic function is used to convert the original data into tokens, making it difficult to recreate the original data without obtaining entry to the tokenization system's resources. To deliver such services, the system maintains a vault database of tokens that are connected to the corresponding sensitive data. Protecting the system vault is vital to the system, and improved processes must be put in place to offer database integrity and physical security. The tokenization system must be secured and validated using security best practices applicable to sensitive data protection, secure storage, audit, authentication and authorization. The tokenization system provides data processing applications with the authority and interfaces to request tokens, or detokenize back to sensitive data. The security and risk reduction benefits of tokenization require that the tokenization system is logically isolated and segmented from data processing systems and applications that previously processed or stored sensitive data replaced by tokens. Only the tokenization system can tokenize data to create tokens, or detokenize back to redeem sensitive data under strict security controls. The token generation method must be proven to have the property that there is no feasible means through direct attack, cryptanalysis, side channel analysis, token mapping table exposure or brute force techniques to reverse tokens back to live data. Replacing live data with tokens in systems is intended to minimize exposure of sensitive data to those applications, stores, people and processes, reducing risk of compromise or accidental exposure and unauthorized access to sensitive data. Applications can operate using tokens instead of live data, with the exception of a small number of trusted applications explicitly permitted to detokenize when strictly necessary for an approved business purpose. Tokenization systems may be operated in-house within a secure isolated segment of the data center, or as a service from a secure service provider. Tokenization may be used to safeguard sensitive data involving, for example, bank accounts, financial statements, medical records, criminal records, driver's licenses, loan applications, stock trades, voter registrations, and other types of personally identifiable information (PII). Tokenization is often used in credit card processing. The PCI Council defines tokenization as "a process by which the primary account number (PAN) is replaced with a surrogate value called a token. A PAN may be linked to a reference number through the tokenization process. In this case, the merchant simply has to retain the token and a reliable third party controls the relationship and holds the PAN. The token may be created independently of the PAN, or the PAN can be used as part of the data input to the tokenization technique. The communication between the merchant and the third-party supplier must be secure to prevent an attacker from intercepting to gain the PAN and the token. De-tokenization is the reverse process of redeeming a token for its associated PAN value. The security of an individual token relies predominantly on the infeasibility of determining the original PAN knowing only the surrogate value". The choice of tokenization as an alternative to other techniques such as encryption will depend on varying regulatory requirements, interpretation, and acceptance by respective auditing or assessment entities. This is in addition to any technical, architectural or operational constraint that tokenization imposes in practical use. == Concepts and origins == The concept of tokenization, as adopted by the industry today, has existed since the first currency systems emerged centuries ago as a means to reduce risk in handling high value financial instruments by replacing them with surrogate equivalents. In the physical world, coin tokens have a long history of use replacing the financial instrument of minted coins and banknotes. In more recent history, subway tokens and casino chips found adoption for their respective systems to replace physical currency and cash handling risks such as theft. Exonumia and scrip are terms synonymous with such tokens. In the digital world, similar substitution techniques have been used since the 1970s as a means to isolate real data elements from exposure to other data systems. In databases for example, surrogate key values have been used since 1976 to isolate data associated with the internal mechanisms of databases and their external equivalents for a variety of uses in data processing. More recently, these concepts have been extended to consider this isolation tactic to provide a security mechanism for the purposes of data protection. In the payment card industry, tokenization is one means of protecting sensitive cardholder data in order to comply with industry standards and government regulations. Tokenization was applied to payment card data by Shift4 Corporation and released to the public during an industry Security Summit in Las Vegas, Nevada in 2005. The technology is meant to prevent the theft of the credit card information in storage. Shift4 defines tokenization as: "The concept of using a non-decryptable piece of data to represent, by reference, sensitive or secret data. In payment card industry (PCI) context, tokens are used to reference cardholder data that is managed in a tokenization system, application or off-site secure facility." To protect data over its full lifecycle, tokenization is often combined with end-to-end encryption to secure data in transit to the tokenization system or service, with a token replacing the original data on return. For example, to avoid the risks of malware stealing data from low-trust systems such as point of sale (POS) systems, as in the Target breach of 2013, cardholder data encryption must take place prior to card data entering the POS and not after. Encryption takes place within the confines of a security hardened and validated card reading device and data remains encrypted until received by the processing host, an approach pioneered by Heartland Payment Systems as a means to secure payment data from advanced threats, now widely adopted by industry payment processing companies and technology companies. The PCI Council has also specified end-to-end encryption (certified point-to-point encryption—P2PE) for various service implementations in various PCI Council Point-to-point Encryption documents. == The tokenization process == The process of tokenization consists of the following steps: The application sends the tokenization data and authentication information to the tokenization system. It is stopped if authentication fails and the data is delivered to an event management system. As a result, administrators can discover problems and effectively manage the system. The system moves on to the next phase if authentication is successful. Using one-way cryptographic or random generation techniques, a token is generated and kept in a highly secure data vault. The new token is provided to the application for further use, replacing the sensitive data for processing and storage. Tokenization systems share several components according to established standards. Token generation is the process of producing a token using any means, such as one-way nonreversible cryptographic functions (e.g., a hash function with a strong, secret salt) or assignment via a randomly generated number. Random number generator (RNG) techniques are often the best choice for generating token values. Token mapping – this is the process of assigning the created token value to its original value. To enable permitted look-ups of the original value using the token as the index, a secure cross-reference database must be constructed. Token data store – this is a central repository for the token mapping process that holds the original sensitive values and their related token values. Sensitive data and token values must be securely kept in an encrypted format. Management of cryptographic keys. Strong key management procedures are required for sensitive data encryption on token data stores. == Difference from encryption == Tokenization and "classic" encryption effectively protect data if implemented properly, and a computer security system may use both. While similar in certain regards, tokenization and classic encryption differ in a few key aspects. Both are cryptographic data security methods and the

    Read more →
  • Social Media Working Group Act of 2014

    Social Media Working Group Act of 2014

    The Social Media Working Group Act of 2014 (H.R. 4263) is a bill that would direct the United States Secretary of Homeland Security to establish within the United States Department of Homeland Security (DHS) a social media working group (the Group) to provide guidance and best practices to the emergency preparedness and response community on the use of social media technologies before, during, and after a terrorist attack. The bill was introduced into the United States House of Representatives during the 113th United States Congress. == Background == === Social media === Social media is the social interaction among people in which they create, share or exchange information and ideas in virtual communities and networks. Andreas Kaplan and Michael Haenlein define social media as "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." Furthermore, social media depend on mobile and web-based technologies to create highly interactive platforms through which individuals and communities share, co-create, discuss, and modify user-generated content. They introduce substantial and pervasive changes to communication between organizations, communities, and individuals. Social media differ from traditional or industrial media in many ways, including quality, reach, frequency, usability, immediacy, and permanence. === Virtual Social Media Working Group === First responders have increasingly used social media in emergency response and recovery operations. Social media tools are used to connect with citizens after a disaster and share information. The Virtual Social Media Working group (VSMWG) is an online platform that gives advice to first responders on how to safely and effectively use social media in emergency response operations. The working group is made up of subject matter experts from across the U.S. It was created by DHS in December 2010 and gives first responders guidance and best practices regarding the use of social media during emergencies. The DHS S&T and the VSMWG work with local and state governments, academics and nonprofits. Meetings of the VSMWG are chaired by the Under Secretary of Homeland Security for Science and Technology. == Provisions of the bill == This summary is based largely on the summary provided by the Congressional Research Service, a public domain source. The Social Media Working Group Act of 2014 would amend the Homeland Security Act of 2002 to direct the United States Secretary of Homeland Security to establish within the United States Department of Homeland Security (DHS) a social media working group (the Group) to provide guidance and best practices to the emergency preparedness and response community on the use of social media technologies before, during, and after a terrorist attack. The bill would require the Group to submit an annual report that includes: (1) a review of current and emerging social media technologies being used to support preparedness and response activities related to terrorist attacks, of best practices and lessons learned on the use of social media during the response to terrorist attacks that occurred during the period covered by the report, and of available training for government officials on the use of social media in response to a terrorist attack; (2) recommendations to improve DHS's use of social media and to improve information sharing among DHS and its components and among state and local governments; and (3) a summary of coordination efforts with the private sector to discuss and resolve legal, operational, technical, privacy, and security concerns. == Congressional Budget Office report == This summary is based largely on the summary provided by the Congressional Budget Office, as ordered reported by the House Committee on Homeland Security on June 11, 2014. This is a public domain source. H.R. 4263 would direct the Department of Homeland Security (DHS) to establish a working group to provide guidance and best practices on the use of social media technologies, specifically during a terrorist attack or other emergency. The group would prepare guidance for the emergency preparedness and response community. The bill would define the membership of the working group, which would include more than 20 experts from federal, state, local, and tribal governments along with nongovernmental organizations. The working group would be exempt from the Federal Advisory Committee Act and would be authorized to hold virtual meetings to fulfill the requirement to meet twice a year. The working group would be required to submit an annual report on emerging trends and best practices for emergency response through social media. Based on the cost of similar activities carried out under the DHS Acquisition and Accountability Efficiency Act and the Critical Infrastructure Research and Development Advancement Act of 2013, the Congressional Budget Office (CBO) estimates that the new DHS responsibilities and the annual report required by H.R. 4263 would cost a total of less than $500,000 annually, assuming the availability of appropriated funds. Enacting the legislation would not affect direct spending or revenues; therefore, pay-as-you-go procedures do not apply. H.R. 4263 contains no intergovernmental or private-sector mandates as defined in the Unfunded Mandates Reform Act and would impose no costs on state, local, or tribal governments. == Procedural history == The Social Media Working Group Act of 2014 was introduced into the United States House of Representatives on March 14, 2014, by Rep. Susan W. Brooks (R, IN-5). It was referred to the United States House Committee on Homeland Security and the United States House Homeland Security Subcommittee on Emergency Preparedness, Response, and Communications. On June 19, 2014, it was reported (amended) alongside House Report 113-480. On July 8, 2014, the House voted in Roll Call Vote 369 to pass the bill 375–19. == Debate and discussion == Nate Elliott, a social media expert at Forrester Research, explains that "the hope is when government or another authority tweets something, people will share it for them," but that this often doesn't happen. This problem, that "messages wash away very quickly," is the reason that the federal government is trying to formulate a better social media strategy. Rep. Steven Palazzo (R-MS), who co-sponsored the bill, stated that "social media has played a crucial role in emergency preparedness and response in Mississippi, including during disasters like Hurricane Isaac and the tornadoes that hit the Hattiesburg area a little over a year ago." He said that their goal with the bill was to "build upon existing public-private partnerships and use social media in a more strategic way in order to help save lives and property."

    Read more →
  • PARRY

    PARRY

    PARRY was an early example of a chatbot, implemented in 1972 by psychiatrist Kenneth Colby. == History == PARRY was written in 1972 by psychiatrist Kenneth Colby, then at Stanford University. While ELIZA was a simulation of a Rogerian therapist, PARRY attempted to simulate a person with paranoid schizophrenia. The program implemented a crude model of the behavior of a person with paranoid schizophrenia based on concepts, conceptualizations, and beliefs (judgements about conceptualizations: accept, reject, neutral). It also embodied a conversational strategy, and as such was a much more serious and advanced program than ELIZA. It was described as "ELIZA with attitude". PARRY was tested in the early 1970s using a variation of the Turing Test. A group of experienced psychiatrists analysed a combination of real patients and computers running PARRY through teleprinters. Another group of 33 psychiatrists were shown transcripts of the conversations. The two groups were then asked to identify which of the "patients" were human and which were computer programs. The psychiatrists were able to make the correct identification only 48 percent of the time — a figure consistent with random guessing. PARRY and ELIZA (also known as "the Doctor") interacted several times. The most famous of these exchanges occurred at the ICCC 1972, where PARRY and ELIZA were hooked up over ARPANET and responded to each other.

    Read more →
  • Initialization vector

    Initialization vector

    In cryptography, an initialization vector (IV) or starting variable is an input to a cryptographic primitive being used to provide the initial state. The IV is typically required to be random or pseudorandom, but sometimes an IV only needs to be unpredictable or unique. Randomization is crucial for some encryption schemes to achieve semantic security, a property whereby repeated usage of the scheme under the same key does not allow an attacker to infer relationships between (potentially similar) segments of the encrypted message. For block ciphers, the use of an IV is described by the modes of operation. Some cryptographic primitives require the IV only to be non-repeating, and the required randomness is derived internally. In this case, the IV is commonly called a nonce (a number used only once), and the primitives (e.g. CBC) are considered stateful rather than randomized. This is because an IV need not be explicitly forwarded to a recipient but may be derived from a common state updated at both sender and receiver side. (In practice, a short nonce is still transmitted along with the message to consider message loss.) An example of stateful encryption schemes is the counter mode of operation, which has a sequence number for a nonce. The IV size depends on the cryptographic primitive used; for block ciphers it is generally the cipher's block-size. In encryption schemes, the unpredictable part of the IV has at best the same size as the key to compensate for time/memory/data tradeoff attacks. When the IV is chosen at random, the probability of collisions due to the birthday problem must be taken into account. Traditional stream ciphers such as RC4 do not support an explicit IV as input, and a custom solution for incorporating an IV into the cipher's key or internal state is needed. Some designs realized in practice are known to be insecure; the WEP protocol is a notable example, and is prone to related-IV attacks. == Motivation == A block cipher is one of the most basic primitives in cryptography, and frequently used for data encryption. However, by itself, it can only be used to encode a data block of a predefined size, called the block size. For example, a single invocation of the AES algorithm transforms a 128-bit plaintext block into a ciphertext block of 128 bits in size. The key, which is given as one input to the cipher, defines the mapping between plaintext and ciphertext. If data of arbitrary length is to be encrypted, a simple strategy is to split the data into blocks each matching the cipher's block size, and encrypt each block separately using the same key. This method is not secure as equal plaintext blocks get transformed into equal ciphertexts, and a third party observing the encrypted data may easily determine its content even when not knowing the encryption key. To hide patterns in encrypted data while avoiding the re-issuing of a new key after each block cipher invocation, a method is needed to randomize the input data. In 1980, the NIST published a national standard document designated Federal Information Processing Standard (FIPS) PUB 81, which specified four so-called block cipher modes of operation, each describing a different solution for encrypting a set of input blocks. The first mode implements the simple strategy described above, and was specified as the electronic codebook (ECB) mode. In contrast, each of the other modes describe a process where ciphertext from one block encryption step gets intermixed with the data from the next encryption step. To initiate this process, an additional input value is required to be mixed with the first block, and which is referred to as an initialization vector. For example, the cipher-block chaining (CBC) mode requires an unpredictable value, of size equal to the cipher's block size, as additional input. This unpredictable value is added to the first plaintext block before subsequent encryption. In turn, the ciphertext produced in the first encryption step is added to the second plaintext block, and so on. The ultimate goal for encryption schemes is to provide semantic security: by this property, it is practically impossible for an attacker to draw any knowledge from observed ciphertext. It can be shown that each of the three additional modes specified by the NIST are semantically secure under so-called chosen-plaintext attacks. == Properties == Properties of an IV depend on the cryptographic scheme used. A basic requirement is uniqueness, which means that no IV may be reused under the same key. For block ciphers, repeated IV values devolve the encryption scheme into electronic codebook mode: equal IV and equal plaintext result in equal ciphertext. In stream cipher encryption uniqueness is crucially important as plaintext may be trivially recovered otherwise. Example: Stream ciphers encrypt plaintext P to ciphertext C by deriving a key stream K from a given key and IV and computing C as C = P xor K. Assume that an attacker has observed two messages C1 and C2 both encrypted with the same key and IV. Then knowledge of either P1 or P2 reveals the other plaintext since C1 xor C2 = (P1 xor K) xor (P2 xor K) = P1 xor P2. Many schemes require the IV to be unpredictable by an adversary. This is effected by selecting the IV at random or pseudo-randomly. In such schemes, the chance of a duplicate IV is negligible, but the effect of the birthday problem must be considered. As for the uniqueness requirement, a predictable IV may allow recovery of (partial) plaintext. Example: Consider a scenario where a legitimate party called Alice encrypts messages using the cipher-block chaining mode. Consider further that there is an adversary called Eve that can observe these encryptions and is able to forward plaintext messages to Alice for encryption (in other words, Eve is capable of a chosen-plaintext attack). Now assume that Alice has sent a message consisting of an initialization vector IV1 and starting with a ciphertext block CAlice. Let further PAlice denote the first plaintext block of Alice's message, let E denote encryption, and let PEve be Eve's guess for the first plaintext block. Now, if Eve can determine the initialization vector IV2 of the next message she will be able to test her guess by forwarding a plaintext message to Alice starting with (IV2 xor IV1 xor PEve); if her guess was correct this plaintext block will get encrypted to CAlice by Alice. This is because of the following simple observation: CAlice = E(IV1 xor PAlice) = E(IV2 xor (IV2 xor IV1 xor PAlice)). Depending on whether the IV for a cryptographic scheme must be random or only unique the scheme is either called randomized or stateful. While randomized schemes always require the IV chosen by a sender to be forwarded to receivers, stateful schemes allow sender and receiver to share a common IV state, which is updated in a predefined way at both sides. == Block ciphers == Block cipher processing of data is usually described as a mode of operation. Modes are primarily defined for encryption as well as authentication, though newer designs exist that combine both security solutions in so-called authenticated encryption modes. While encryption and authenticated encryption modes usually take an IV matching the cipher's block size, authentication modes are commonly realized as deterministic algorithms, and the IV is set to zero or some other fixed value. == Stream ciphers == In stream ciphers, IVs are loaded into the keyed internal secret state of the cipher, after which a number of cipher rounds are executed prior to releasing the first bit of output. For performance reasons, designers of stream ciphers try to keep that number of rounds as small as possible, but because determining the minimal secure number of rounds for stream ciphers is not a trivial task, and considering other issues such as entropy loss, unique to each cipher construction, related-IVs and other IV-related attacks are a known security issue for stream ciphers, which makes IV loading in stream ciphers a serious concern and a subject of ongoing research. == WEP IV == The 802.11 encryption algorithm called WEP (short for Wired Equivalent Privacy) used a short, 24-bit IV, leading to reused IVs with the same key, which led to it being easily cracked. Packet injection allowed for WEP to be cracked in times as short as several seconds. This ultimately led to the deprecation of WEP. == SSL 2.0 IV == In cipher-block chaining mode (CBC mode), the IV need not be secret, but must be unpredictable (In particular, for any given plaintext, it must not be possible to predict the IV that will be associated to the plaintext in advance of the generation of the IV.) at encryption time. Additionally for the output feedback mode (OFB mode), the IV must be unique. In particular, the (previously) common practice of re-using the last ciphertext block of a message as the IV for the next message is insecure (for example, this method was used by SSL 2.0). If an attacker knows

    Read more →
  • Social network hosting service

    Social network hosting service

    A social network hosting service is a web hosting service that specifically hosts the user creation of web-based social networking services, alongside related applications. Such services are also known as vertical social networks due to the creation of SNSes which cater to specific user interests and niches; like larger, interest-agnostic SNSes, such niche networking services may also possess the ability to create increasingly niche groups of users. == List of social network hosting services == Federated Media Publishing's BigTent BroadVision Clearvale Ning Wall.fm

    Read more →
  • Dynamic knowledge repository

    Dynamic knowledge repository

    The dynamic knowledge repository (DKR) is a concept developed by Douglas C. Engelbart as a primary strategic focus for allowing humans to address complex problems. He has proposed that a DKR will enable us to develop a collective IQ greater than any individual's IQ. References and discussion of Engelbart's DKR concept are available at the Doug Engelbart Institute. == Definition == A knowledge repository is a computerized system that systematically captures, organizes and categorizes an organization's knowledge. The repository can be searched and data can be quickly retrieved. The effective knowledge repositories include factual, conceptual, procedural and meta-cognitive techniques. The key features of knowledge repositories include communication forums. A knowledge repository can take many forms to "contain" the knowledge it holds. A customer database is a knowledge repository of customer information and insights – or electronic explicit knowledge. A Library is a knowledge repository of books – physical explicit knowledge. A community of experts is a knowledge repository of tacit knowledge or experience. The nature of the repository only changes to contain/manage the type of knowledge it holds. A repository (as opposed to an archive) is designed to get knowledge out. It should therefore have some rules of structure, classification, taxonomy, record management, etc., to facilitate user engagement.

    Read more →
  • Spleak

    Spleak

    Spleak was an IM platform where users could publish and rate content. It existed in the form of six bots covering as many subject areas: CelebSpleak, SportSpleak, VoteSpleak, TVSpleak, GameSpleak, and StyleSpleak. == Overview == Users can add a "multi-Spleak" (which contains all of the different Spleak bots in one) or add the separate bots to their IM buddy lists on MSN and AIM. Users are also allowed access to Spleak online by using a CelebSpleak, SportSpleak, or VoteSpleak widget, or through the CelebSpleak and SportSpleak applications with Facebook. Spleak was an alternate reality game and is moving to its own company, Spleak Media Network. "Celebrate Spleak" was introduced throughout 2007, launched in 2008, and was forced to retire in 2009. == Key people == Spleak was co-founded by Morten Lund and Nicolaj Reffstrup. The company's chief executive officer is Morrie Eisenburg; Josh Scott is Vice President in Product and Tyler Wells is Vice President in Engineering.

    Read more →
  • Business continuity and disaster recovery auditing

    Business continuity and disaster recovery auditing

    Given organizations' increasing dependency on information technology (IT) to run their operations, business continuity planning (and its subset IT service continuity planning) covers the entire organization, while disaster recovery focuses on IT. Auditing documents covering an organization's business continuity and disaster recovery (BCDR) plans provides a third-party validation to stakeholders that the documentation is complete and does not contain material misrepresentations. == Overview == Often used together, the terms business continuity (BC) and disaster recovery (DR) are very different. BC refers to the ability of a business to continue critical functions and business processes after the occurrence of a disaster, whereas DR refers specifically to the IT functions of the business, albeit a subset of BC. == Metrics == The primary objective is to protect the organization in the event that all or part of its operations and/or computer services are rendered partially or completely unusable. === DR metrics === Minimizing downtime and data loss during disaster recovery is typically measured in terms of two key concepts: Recovery time objective (RTO), time until a system is completely up and running Recovery point objective (RPO), a measure of the ability to recover files by specifying a point in time the backup copy will restore to. == The auditor's role == Role of the Internal Auditor in Auditing a Disaster Recovery Plan (DRP): 1. Governance & Oversight - Confirm roles, responsibilities, and oversight are defined, and DRP aligns with risk appetite and continuity strategy. 2. Risk Assessment & BIA - Verify risk and impact assessments identify critical systems and define RTO/RPO. 3. Plan Design & Documentation - Ensure the DRP is current, complete, and includes key recovery procedures. 4. Testing & Validation - Confirm regular DRP testing occurs and results are used to improve the plan. 5. Backup & Recovery - Assess backup frequency and recovery capabilities against RTO/RPO targets. 6. Communication & Training - Verify staff are trained and communication protocols are in place for crises. 7. Maintenance & Improvement - Ensure the DRP is regularly updated and lessons learned are integrated. == Documentation == === Disaster recovery plan === A disaster recovery plan (DRP) is a documented process or set of procedures to execute an organization's disaster recovery processes and recover and protect a business IT infrastructure in the event of a disaster. It is "a comprehensive statement of consistent actions to be taken before, during and after a disaster". The disaster could be natural, environmental or man-made. Man-made disasters could be intentional (for example, an act of a terrorist) or unintentional (that is, accidental, such as the breakage of a man-made dam or even "fat fingers" - or errant commands entered - on a computer system). ==== Types of plans ==== Although there is no one-size-fits-all plan, there are three basic strategies: prevention, including proper backups, having surge protectors and generators detection, a byproduct of routine inspections, which may discover new (potential) threats correction The latter may include securing proper insurance policies, and holding a "lessons learned" brainstorming session. ==== Best practices ==== To maximize their effectiveness, DRPs are most effective when updated frequently, and should: be an integral part of all business analysis processes, be revisited at every major corporate acquisition, at every new product launch and at every new system development milestone. be thoroughly tested, not just unpracticed bureaucratic documentation Adequate records need to be retained by the organization. The auditor examines records, billings, and contracts to verify that records are being kept. One such record is a current list of the organization's hardware and software vendors. Such list is made and periodically updated to reflect changing business practices and as part of an IT asset management system. Copies of it are stored on and off site and are made available or accessible to those who require them. An auditor tests the procedures used to meet this objective and determine their effectiveness. === Relationship to BCPs === Disaster recovery is a subset of business continuity. Where DRP encompasses the policies, tools and procedures to enable recovery of data following a catastrophic event, BCP involves keeping all aspects of a business functioning regardless of potential disruptive events. As such, a business continuity plan is a comprehensive organizational strategy that includes the DRP as well as threat prevention, detection, recovery, and resumption of operations should a data breach or other disaster event occur. Therefore, BCP consists of five component plans: Business resumption plan Occupant emergency plan Continuity of operations plan Incident management plan Disaster recovery plan The first three components (business resumption, occupant emergency, and continuity of operations plans) do not deal with the IT infrastructure. The incident management plan (IMP) does deal with the IT infrastructure, but since it establishes structure and procedures to address cyber attacks against an organization's IT systems, it generally does not represent an agent for activating the DRP; thus DRP is the only BCP component of active interest to IT. == Testing == The overall categorization of tests are functional- and discussion-based. Types of tests include: tabletop exercises, checklists, simulations, parallel processing (testing recovery site while primary site is in operation), and full interruption (fail over) tests. These apply to both BC and DR. == Benefits == Like every insurance plan, there are benefits that can be obtained from proper business continuity planning, including: Studies have shown a correlation between higher spending on auditing fees and lower rates of Incidents. Minimizing risk of delays Guaranteeing the reliability of standby systems (even automating the failure detection and recovery in certain scenarios) Providing a standard for testing the plan Minimizing decision-making during a disaster Reducing potential legal liabilities Lowering unnecessarily stressful work environment === Planning and testing methodology === According to Geoffrey H. Wold of the Disaster Recovery Journal, the entire process involved in developing a Disaster Recovery Plan consists of 10 steps: Performing a risk assessment: The planning committee prepares a risk analysis and a business impact analysis (BIA) that includes a range of possible disasters. Each functional area of the organization is analyzed to determine potential consequences. Traditionally, fire has posed the greatest threat. A thorough plan provides for "worst case" situations, such as destruction of the main building. Establishing priorities for processing and operations: Critical needs of each department are evaluated and prioritized. Written agreements for alternatives selected are prepared, with details specifying duration, termination conditions, system testing, cost, any special security procedures, procedure for the notification of system changes, hours of operation, the specific hardware and other equipment required for processing, personnel requirements, definition of the circumstances constituting an emergency, process to negotiate service extensions, guarantee of compatibility, availability, non-mainframe resource requirements, priorities, and other contractual issues. Collecting data: This includes various lists (employee backup position listing, critical telephone numbers list, master call list, master vendor list, notification checklist), inventories (communications equipment, documentation, office equipment, forms, insurance policies, workgroup and data center computer hardware, microcomputer hardware and software, office supply, off-site storage location equipment, telephones, etc.), distribution register, software and data files backup/retention schedules, temporary location specifications, any other such lists, materials, inventories, and documentation. Pre-formatted forms are often used to facilitate the data gathering process. Organizing and documenting a written plan Developing testing criteria and procedures: reasons for testing include Determining the feasibility and compatibility of backup facilities and procedures. Identifying areas in the plan that need modification. Providing training to the team managers and team members. Demonstrating the ability of the organization to recover. Providing motivation for maintaining and updating the disaster recovery plan. Testing the plan: An initial "dry run" of the plan is performed by conducting a structured walk-through test. An actual test-run must be performed. Problems are corrected. Initial testing can be plan is done in sections and after normal business hours to minimize disruptions. Subsequent tests occur during normal business hours. === Caveats/controversie

    Read more →
  • CARE Principles for Indigenous Data Governance

    CARE Principles for Indigenous Data Governance

    The CARE Principles for Indigenous Data Governance are a set of principles intended to guide open data projects in engaging Indigenous Peoples rights and interests. CARE was created in 2019 by the International Indigenous Data Sovereignty Interest Group, a group that is a part of the Research Data Alliance. It outlines collective rights related to open data in the context of the United Nations Declaration on the Rights of Indigenous Peoples and Indigenous data sovereignty. CARE is an acronym which stands for Collective Benefit, Authority to Control, Responsibility, Ethics. The CARE Principles are 'people and purpose-oriented, reflecting the crucial role of data in advancing Indigenous innovation and self-determination', and intended as a complement to the data-oriented perspective of other standards such as FAIR data (findable, accessible, interoperable, reusable). The CARE principles have been embedded into the Beta version of Standardised Data on Initiatives (STARDIT). CARE principles were the basis of a submission to the UN's Global Digital Compact.

    Read more →