AI Content Paraphrasing Tool

AI Content Paraphrasing Tool — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • FloodAlerts

    FloodAlerts

    FloodAlerts is a software application, developed by software specialists Shoothill, which takes real-time flooding information, and displays the data on an interactive Bing map, updating and warning its users when they, their premises or the routes they need to travel could be at risk of flooding. == History == FloodAlerts was launched in 2012, originally as the world's first Facebook flood warning app. == Operation == FloodAlerts is made available free of charge to individuals. Users are able to set up their own monitored locations and receive alerts via the application or their Facebook wall if the locations they are monitoring are at imminent risk of flooding. Hosted in the Cloud, using the Microsoft Windows Azure platform, the FloodAlerts application processes the data received from the Environment Agency, automatically creates the required map tiles, pins and alerts and displays them on an interactive Bing map, updating the content every 15 minutes. Users are able to see the latest information on the map without having to refresh their browser. FloodAlerts can also be provided as a customised risk management solution to businesses that require infrastructure or asset safety monitoring in areas where water levels are rising or receding. == Awards and recognition == FloodAlerts has received The Guardian and Virgin Media Business's 2012 Innovation Nation Awards and was shortlisted as a finalist for a further two national awards: the UK IT Industry Awards for Innovation and Entrepreneurship and The Institution of Engineering and Technology Innovation Awards for Information Technology. == In the press == The FloodAlerts application was reviewed on the BBC website. It was also reviewed on BBC Click.

    Read more →
  • Data quality

    Data quality

    Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning". Data is deemed of high quality if it correctly represents the real-world construct to which it refers. Apart from these definitions, as the number of data sources increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose. People's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. When this is the case, businesses may adopt recognised international standards for data quality (See #International Standards for Data Quality below). Data governance can also be used to form agreed upon definitions and standards, including international standards, for data quality. In such cases, data cleansing, including standardization, may be required in order to ensure data quality. == Definitions == Defining data quality is difficult due to the many contexts data are used in, as well as the varying perspectives among end users, producers, and custodians of data. From a consumer perspective, data quality is: "data that are fit for use by data consumers" data "meeting or exceeding consumer expectations" data that "satisfies the requirements of its intended use" From a business perspective, data quality is: data that are "'fit for use' in their intended operational, decision-making and other roles" or that exhibits "'conformance to standards' that have been set, so that fitness for use is achieved" data that "are fit for their intended uses in operations, decision making and planning" "the capability of data to satisfy the stated business, system, and technical requirements of an enterprise" From a standards-based perspective, data quality is: the "degree to which a set of inherent characteristics (quality dimensions) of an object (data) fulfills requirements" "the usefulness, accuracy, and correctness of data for its application" Arguably, in all these cases, "data quality" is a comparison of the actual state of a particular set of data to a desired state, with the desired state being typically referred to as "fit for use," "to specification," "meeting consumer expectations," "free of defect," or "meeting requirements." These expectations, specifications, and requirements are usually defined by one or more individuals or groups, standards organizations, laws and regulations, business policies, or software development policies. == Dimensions of data quality == Drilling down further, those expectations, specifications, and requirements are stated in terms of characteristics or dimensions of the data, such as: accessibility or availability accuracy or correctness comparability completeness or comprehensiveness consistency, coherence, or clarity credibility, reliability, or reputation flexibility plausibility relevance, pertinence, or usefulness timeliness or latency uniqueness validity or reasonableness A systematic scoping review of the literature suggests that data quality dimensions and methods with real world data are not consistent in the literature, and as a result quality assessments are challenging due to the complex and heterogeneous nature of these data. == International standards for data quality == ISO 8000 is an international standard for data quality. Managed by the International Organization for Standardization, the ISO 8000 standards address and describe general aspects of data quality including principles, vocabulary and measurement data governance data quality management data quality assessment quality of master data, including exchange of characteristic data and identifiers quality of industrial data == History == Before the rise of the inexpensive computer data storage, massive mainframe computers were used to maintain name and address data for delivery services. This was so that mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars in comparison to manual correction of customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available. Companies with an emphasis on marketing often focused their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) avoiding false stock-out; 3) improving the understanding of vendor purchases to negotiate volume discounts; and 4) avoiding logistics costs in stocking and shipping parts across a large organization. For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of data, cross tabulation, modeling and outlier detection, verifying data integrity, etc. == Overview == There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991) adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996). A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Nearly 200 such terms have been identified and there is little agreement in their nature (are these concepts, goals or criteria?), their definitions or measures (Wang et al., 1993). Software engineers may recognize this as a similar problem to "ilities". MIT has an Information Quality (MITIQ) Program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). This program grew out of the work done by Hansen on the "Zero Defect Data" framework (Hansen, 1991). In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost to the U.S. economy of data quality problems at over U.S. $600 billion per annum (Eckerson, 2002). Incorrect data – which includes invalid and outdated information – can originate from different data sources – through data entry, or data migration and conversion projects. In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed. One reason contact data becomes stale very quickly in the average database – more than 45 million Americans change their address every year. In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some organizations, this data governance function has been established as part of a larger Regulatory Compliance function - a recognition of the importance of Data/Information Quality to organizations. Problems with data quality don't only arise from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency. En

    Read more →
  • SIGMOD Edgar F. Codd Innovations Award

    SIGMOD Edgar F. Codd Innovations Award

    The ACM SIGMOD Edgar F. Codd Innovations Award is a lifetime research achievement award given by the ACM Special Interest Group on Management of Data, at its yearly flagship conference (also called SIGMOD). According to its homepage, it is given "for innovative and highly significant contributions of enduring value to the development, understanding, or use of database systems and databases". The award has been given since 1992. Until 2003, this award was known as the “SIGMOD Innovations Award.” In 2004, SIGMOD, with the unanimous approval of ACM Council, decided to rename the award to honor Dr. E.F. (Ted) Codd (1923 – 2003) who invented the relational data model and was responsible for the significant development of the database field as a scientific discipline. == Recipients ==

    Read more →
  • Semantic heterogeneity

    Semantic heterogeneity

    Semantic heterogeneity is when database schema or datasets for the same domain are developed by independent parties, resulting in differences in meaning and interpretation of data values. Beyond structured data, the problem of semantic heterogeneity is compounded due to the flexibility of semi-structured data and various tagging methods applied to documents or unstructured data. Semantic heterogeneity is one of the more important sources of differences in heterogeneous datasets. Yet, for multiple data sources to interoperate with one another, it is essential to reconcile these semantic differences. Decomposing the various sources of semantic heterogeneities provides a basis for understanding how to map and transform data to overcome these differences. == Classification == One of the first known classification schemes applied to data semantics is from William Kent in the late 80s. Kent's approach dealt more with structural mapping issues than differences in meaning, which he pointed to data dictionaries as potentially solving. One of the most comprehensive classifications is from Pluempitiwiriyawej and Hammer, "Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources". They classify heterogeneities into three broad classes: Structural conflicts arise when the schema of the sources representing related or overlapping data exhibit discrepancies. Structural conflicts can be detected when comparing the underlying schema. The class of structural conflicts includes generalization conflicts, aggregation conflicts, internal path discrepancy, missing items, element ordering, constraint and type mismatch, and naming conflicts between the element types and attribute names. Domain conflicts arise when the semantics of the data sources that will be integrated exhibit discrepancies. Domain conflicts can be detected by looking at the information contained in the schema and using knowledge about the underlying data domains. The class of domain conflicts includes schematic discrepancy, scale or unit, precision, and data representation conflicts. Data conflicts refer to discrepancies among similar or related data values across multiple sources. Data conflicts can only be detected by comparing the underlying sources. The class of data conflicts includes ID-value, missing data, incorrect spelling, and naming conflicts between the element contents and the attribute values. Moreover, mismatches or conflicts can occur between set elements (a "population" mismatch) or attributes (a "description" mismatch). Michael Bergman expanded upon this schema by adding a fourth major explicit category of language, and also added some examples of each kind of semantic heterogeneity, resulting in about 40 distinct potential categories . This table shows the combined 40 possible sources of semantic heterogeneities across sources: A different approach toward classifying semantics and integration approaches is taken by Sheth et al. Under their concept, they split semantics into three forms: implicit, formal and powerful. Implicit semantics are what is either largely present or can easily be extracted; formal languages, though relatively scarce, occur in the form of ontologies or other description logics; and powerful (soft) semantics are fuzzy and not limited to rigid set-based assignments. Sheth et al.'s main point is that first-order logic (FOL) or description logic is inadequate alone to properly capture the needed semantics. == Relevant applications == Besides data interoperability, relevant areas in information technology that depend on reconciling semantic heterogeneities include data mapping, semantic integration, and enterprise information integration, among many others. From the conceptual to actual data, there are differences in perspective, vocabularies, measures and conventions once any two data sources are brought together. Explicit attention to these semantic heterogeneities is one means to get the information to integrate or interoperate. A mere twenty years ago, information technology systems expressed and stored data in a multitude of formats and systems. The Internet and Web protocols have done much to overcome these sources of differences. While there is a large number of categories of semantic heterogeneity, these categories are also patterned and can be anticipated and corrected. These patterned sources inform what kind of work must be done to overcome semantic differences where they still reside.

    Read more →
  • User-defined function

    User-defined function

    A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. UDFs are usually written for the requirement of its creator. == BASIC language == In some old implementations of the BASIC programming language, user-defined functions are defined using the "DEF FN" syntax. More modern dialects of BASIC are influenced by the structured programming paradigm, where most or all of the code is written as user-defined functions or procedures, and the concept becomes practically redundant. == COBOL language == In the COBOL programming language, a user-defined function is an entity that is defined by the user by specifying a FUNCTION-ID paragraph. A user-defined function must return a value by specifying the RETURNING phrase of the procedure division header and they are invoked using the function-identifier syntax. See the ISO/IEC 1989:2014 Programming Language COBOL standard for details. As of May 2022, the IBM Enterprise COBOL for z/OS 6.4 (IBM COBOL) compiler contains support for user-defined functions. == Databases == In relational database management systems, a user-defined function provides a mechanism for extending the functionality of the database server by adding a function, that can be evaluated in standard query language (usually SQL) statements. The SQL standard distinguishes between scalar and table functions. A scalar function returns only a single value (or NULL), whereas a table function returns a (relational) table comprising zero or more rows, each row with one or more columns. User-defined functions in SQL are declared using the CREATE FUNCTION statement. For example, a user-defined function that converts Celsius to Fahrenheit (a temperature scale used in USA) might be declared like this: Once created, a user-defined function may be used in expressions in SQL statements. For example, it can be invoked where most other intrinsic functions are allowed. This also includes SELECT statements, where the function can be used against data stored in tables in the database. Conceptually, the function is evaluated once per row in such usage. For example, assume a table named Elements, with a row for each known chemical element. The table has a column named BoilingPoint for the boiling point of that element, in Celsius. The query would retrieve the name and the boiling point from each row. It invokes the CtoF user-defined function as declared above in order to convert the value in the column to a value in Fahrenheit. Each user-defined function carries certain properties or characteristics. The SQL standard defines the following properties: Language - defines the programming language in which the user-defined function is implemented; examples include SQL, C, C# and Java. Parameter style - defines the conventions that are used to pass the function parameters and results between the implementation of the function and the database system (only applicable if language is not SQL). Specific name - a name for the function that is unique within the database. Note that the function name does not have to be unique, considering overloaded functions. Some SQL implementations require that function names are unique within a database, and overloaded functions are not allowed. Determinism - specifies whether the function is deterministic or not. The determinism characteristic has an influence on the query optimizer when compiling a SQL statement. SQL-data access - tells the database management system whether the function contains no SQL statements (NO SQL), contains SQL statements but does not access any tables or views (CONTAINS SQL), reads data from tables or views (READS SQL DATA), or actually modifies data in the database (MODIFIES SQL DATA). User-defined functions should not be confused with stored procedures. Stored procedures allow the user to group a set of SQL commands. A procedure can accept parameters and execute its SQL statements depending on those parameters. A procedure is not an expression and, thus, cannot be used like user-defined functions. Some database management systems allow the creation of user defined functions in languages other than SQL. Microsoft SQL Server, for example, allows the user to use .NET languages including C# for this purpose. DB2 and Oracle support user-defined functions written in C or Java programming languages. === SQL Server 2000 === There are three types of UDF in Microsoft SQL Server 2000: scalar functions, inline table-valued functions, and multistatement table-valued functions. Scalar functions return a single data value (not a table) with RETURNS clause. Scalar functions can use all scalar data types, with exception of timestamp and user-defined data types. Inline table-valued functions return the result set of a single SELECT statement. Multistatement table-valued functions return a table, which was built with many TRANSACT-SQL statements. User-defined functions can be invoked from a query like built‑in functions such as OBJECT_ID, LEN, DATEDIFF, or can be executed through an EXECUTE statement like stored procedures. Performance Notes: User-defined functions are subroutines made of one or more Transact-SQL statements that can be used to encapsulate code for reuse. It takes zero or more arguments and evaluates a return value. Has both control-flow and DML statements in its body similar to stored procedures. Does not allow changes to any Global Session State, like modifications to database or external resource, such as a file or network. Does not support output parameter. DEFAULT keyword must be specified to pass the default value of parameter. Errors in UDF cause UDF to abort which, in turn, aborts the statement that invoked the UDF. === Apache Hive === Apache Hive defines, in addition to the regular user-defined functions (UDF), also user-defined aggregate functions (UDAF) and table-generating functions (UDTF). Hive enables developers to create their own custom functions with Java. === Apache Doris === Apache Doris, an open-source real-time analytical database, allows external users to contribute their own UDFs written in C++ to it.

    Read more →
  • Environmental informatics

    Environmental informatics

    Environmental informatics is the science of information applied to environmental science. As such, it provides the information processing and communication infrastructure to the interdisciplinary field of environmental sciences aiming at data, information and knowledge integration, the application of computational intelligence to environmental data as well as the identification of environmental impacts of information technology. Environmental informatics thus acts as a bridge, providing an interdisciplinary means of analysing, describing and understanding the complex interactions between humans, nature and technology. Since each field of applied computer science has its own subject matter, terminology and methods, specialised disciplines, such as environmental, bio- and geoinformatics have emerged, each of which combines computer science with a specific field of application such as environmental, bio- or geosciences. Environmental informatics, bioinformatics and geoinformatics all deal with computer-based processing of environmental phenomena. However, environmental informatics is the only field that pursues normative goals (e.g., political goals of environmental protection, environmental planning, and sustainability). This also influences the choice of methods. This also distinguishes it from application areas such as numerical weather prediction, which is considered an early and important example of computer simulation of environmental phenomena. The UK Natural Environment Research Council defines environmental informatics as the "research and system development focusing on the environmental sciences relating to the creation, collection, storage, processing, modelling, interpretation, display and dissemination of data and information." Kostas Karatzas defined environmental informatics as the "creation of a new 'knowledge-paradigm' towards serving environmental management needs." Karatzas argued further that environmental informatics "is an integrator of science, methods and techniques and not just the result of using information and software technology methods and tools for serving environmental engineering needs." Environmental informatics emerged in early 1990 in Central Europe. Current initiatives to effectively manage, share, and reuse environmental and ecological data are indicative of the increasing importance of fields like environmental informatics and ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE and Data Conservancy. == Subject matter and objectives == The subject of environmental informatics are environmental information systems (EIS). An EIS 'is a computer-based system that integrates and stores data collected about the natural environment and provides powerful methods for accessing and evaluating it.' This allows environmental data to be processed by computers for environmental protection, planning, research and technology. According to Jaeschke and Bossel, environmental informatics has three interrelated objectives: Environmental informatics serves to procure data and information for describing the state and development of the environment. Of particular importance is information that is needed to prevent or limit undesirable changes and to support desirable changes. Based on the evaluation and analysis of data, environmental informatics improves our understanding of the environment and the interactions between nature, technology and society. It thus supports environmentally relevant decisions. This enables the influence of development (system correction), the assessment of the effects and side effects of potential measures, and the creation of tools for the routine planning, implementation and monitoring of measures. == History == The simulation model World3, which formed the basis of the highly acclaimed study The Limits to Growth, is considered the starting point of environmental informatics. It incorporated environmental information, among other things, to calculate scenarios for global development. In the mid-1980s, interest grew in structuring environmental protection as an area of application for computer science. One of the first publications in German was the book Informatik im Umweltschutz. Anwendungen und Perspektiven (Computer science in environmental protection. Applications and perspectives) from 1986. The term 'environmental informatics' did not appear until around 1993, which is why the development of environmental informatics is usually referred to as having taken place in the 1990s. In 1993, the first university chair for environmental informatics was established in Cottbus. In 1994, the anthology Umweltinformatik. Informatikmethoden für Umweltschutz und Umweltforschung (Environmental Informatics: Informatics Methods for Environmental Protection and Environmental Research) was published. The development of environmental informatics was 'primarily initiated by German computer science.' In the English-speaking world, the volume Environmental Informatics was published in 1995, mainly based on the German anthology of 1994. An article in the conference proceedings of the World Computer Congress of the International Federation for Information Processing (IFIP) in Hamburg in 1994 describes the initial situation of environmental informatics as follows: 'On the one hand, we suffer from the huge amount of available data – people sometimes speak of data graveyards – on the other hand, the really relevant data may still be missing.' This statement indicates the need that led to the emergence of environmental informatics as a specialised discipline of applied computer science. Furthermore, the specific characteristics and processing requirements of environmental data necessitated the emergence of environmental informatics. The special features of environmental data include: The data structures required are highly heterogeneous due to specific processes and differing perspectives on environmental aspects (e.g., water protection, emission control, hazardous substances). In addition to the heterogeneity of the data, heterogeneous databases also play a role, as environmental data is often obtained and presented in an interdisciplinary manner. Obligations change frequently as a result of new legislation, whether regional (e.g. state regulations on water protection), national (e.g. federal emission control regulations) or international (e.g. Registration, Evaluation, Authorisation and Restriction of Chemicals|REACH). The objects represented are often multidimensional and, therefore, require complex geometric representation using curves or polygons. It is often necessary to process uncertain, imprecise or incomplete data, which is, for example, the result of extrapolations or forecasts. A new "knowledge paradigm" has emerged to meet the requirements of environmental management. Environmental informatics produces its own concepts, methods and techniques and is not merely the result of using information and communication technology methods and tools to meet environmental requirements. The development of environmental informatics since the 1990s has been significantly influenced by the newly established conferences EnviroInfo, ISESS and ITEE and is documented in the respective proceedings. Aspects of sustainability and sustainable development were increasingly integrated into environmental informatics after 2000, thereby expanding the field. In 2004, the Working Group on Sustainable Information Society of the Gesellschaft für Informatik e. V. (German Informatics Society, GI) published the Memorandum on a Sustainable Information Society, which formulates recommendations for an information society that is compatible with human, social and natural needs. Since 2007, environmental informatics has often been described in more detail as informatics for environmental protection, sustainable development and risk management. The increased focus on sustainability has also contributed to the formation of the research focus Information and Communications Technology for Sustainability (ICT4S) and to the emergence of the international conference ICT4S in 2013. ICT-ENSURE, the European Commission's funding measure for the establishment of a European research area on "ICT for Environmental Sustainability Research" (2008–2010), has also contributed to the structuring of environmental informatics. == Environmental informatics and sustainable development == Efforts to place environmental informatics within the context of sustainable development have been growing since 2000 and were significantly influenced by the Memorandum on a Sustainable Information Society. According to this Memorandum, the information society offers great but unevenly distributed opportunities for education, participation and intercultural understanding. In addition, the Memorandum highlighted the material and energy consumption of inf

    Read more →
  • SQL programming tool

    SQL programming tool

    In the field of software, SQL programming tools provide platforms for database administrators (DBAs) and application developers to perform daily tasks efficiently and accurately. Database administrators and application developers often face constantly changing environments which they rarely completely control. Many changes result from new development projects or from modifications to existing code, which, when deployed to production, do not always produce the expected result. For organizations to better manage development projects and the teams that develop code, suppliers of SQL programming tools normally provide more than facility to the database administrator or application developer to aid in database management and in quality code-deployment practices. == Features == SQL programming tools may include the following features: === SQL editing === SQL editors allow users to edit and execute SQL statements. They may support the following features: cut, copy, paste, undo, redo, find (and replace), bookmarks block indent, print, save file, uppercase/lowercase keyword highlighting auto-completion access to frequently used files output of query result editing query-results committing and rolling-back transactions inside cut paper === Object browsing === Tools may display information about database objects relevant to developers or to database administrators. Users may: view object descriptions view object definitions (DDL) create database objects enable and disable triggers and constraints recompile valid or invalid objects query or edit tables and views Some tools also provide features to display dependencies among objects, and allow users to expand these dependent objects recursively (for example: packages may reference views, views generally reference tables, super/subtypes, and so on). === Session browsing === Database administrators and application developers can use session browsing tools to view the current activities of each user in the database. They can check the resource-usage of individual users, statistics information, locked objects and the current running SQL of each individual session. === User-security management === DBAs can create, edit, delete, disable or enable user-accounts in the database using security-management tools. DBAs can also assign roles, system privileges, object privileges, and storage-quotas to users. === Debugging === Some tools offer features for the debugging of stored procedures: step in, step over, step out, run until exception, breakpoints, view & set variables, view call stack, and so on. Users can debug any program-unit without making any modification to it, including triggers and object types. === Performance monitoring === Monitoring tools may show the database resources — usage summary, service time summary, recent activities, top sessions, session history or top SQL — in easy-to-read graphs. Database administrators can easily monitor the health of various components in the monitoring instance. Application developers may also make use of such tools to diagnose and correct application-performance problems as well as improve SQL server performance. === Test data === Test data generation tools can populate the database by realistic test data for server or client side testing purposes. Also, this kind of software can upload sample blob files to database.

    Read more →
  • XOR swap algorithm

    XOR swap algorithm

    In computer programming, the exclusive or swap (sometimes shortened to XOR swap) is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required. The algorithm is primarily a novelty and a way of demonstrating properties of the exclusive or operation. It is sometimes discussed as a program optimization, but there are almost no cases where swapping via exclusive or provides benefit over the standard, obvious technique. == The algorithm == Conventional swapping requires the use of a temporary storage variable. Using the XOR swap algorithm, however, no temporary storage is needed. The algorithm is as follows: Since XOR is a commutative operation, either X XOR Y or Y XOR X can be used interchangeably in any of the foregoing three lines. Note that on some architectures the first operand of the XOR instruction specifies the target location at which the result of the operation is stored, preventing this interchangeability. The algorithm typically corresponds to three machine-code instructions, represented by corresponding pseudocode and assembly instructions in the three rows of the following table: In the above System/370 assembly code sample, R1 and R2 are distinct registers, and each XR operation leaves its result in the register named in the first argument. Using x86 assembly, values X and Y are in registers eax and ebx (respectively), and xor places the result of the operation in the first register (Note: x86 supports XCHG instruction so using triple XOR do not make sense on this architecture). In RISC-V assembly, value X and Y are in registers x10 and x11, and xor places the result of the operation in the first operand. However, in the pseudocode or high-level language version or implementation, the algorithm fails if x and y use the same storage location, since the value stored in that location will be zeroed out by the first XOR instruction, and then remain zero; it will not be "swapped with itself". This is not the same as if x and y have the same values. The trouble only comes when x and y use the same storage location, in which case their values must already be equal. That is, if x and y use the same storage location, then the line: sets x to zero (because x = y so X XOR Y is zero) and sets y to zero (since it uses the same storage location), causing x and y to lose their original values. == Proof of correctness == The binary operation XOR over bit strings of length N {\displaystyle N} exhibits the following properties (where ⊕ {\displaystyle \oplus } denotes XOR): L1. Commutativity: A ⊕ B = B ⊕ A {\displaystyle A\oplus B=B\oplus A} L2. Associativity: ( A ⊕ B ) ⊕ C = A ⊕ ( B ⊕ C ) {\displaystyle (A\oplus B)\oplus C=A\oplus (B\oplus C)} L3. Identity exists: there is a bit string, 0, (of length N) such that A ⊕ 0 = A {\displaystyle A\oplus 0=A} for any A {\displaystyle A} L4. Each element is its own inverse: for each A {\displaystyle A} , A ⊕ A = 0 {\displaystyle A\oplus A=0} . Suppose that we have two distinct registers R1 and R2 as in the table below, with initial values A and B respectively. We perform the operations below in sequence, and reduce our results using the properties listed above. === Linear algebra interpretation === As XOR can be interpreted as binary addition and a pair of bits can be interpreted as a vector in a two-dimensional vector space over the field with two elements, the steps in the algorithm can be interpreted as multiplication by 2×2 matrices over the field with two elements. For simplicity, assume initially that x and y are each single bits, not bit vectors. For example, the step: which also has the implicit: corresponds to the matrix ( 1 1 0 1 ) {\displaystyle \left({\begin{smallmatrix}1&1\\0&1\end{smallmatrix}}\right)} as ( 1 1 0 1 ) ( x y ) = ( x + y y ) . {\displaystyle {\begin{pmatrix}1&1\\0&1\end{pmatrix}}{\begin{pmatrix}x\\y\end{pmatrix}}={\begin{pmatrix}x+y\\y\end{pmatrix}}.} The sequence of operations is then expressed as: ( 1 1 0 1 ) ( 1 0 1 1 ) ( 1 1 0 1 ) = ( 0 1 1 0 ) {\displaystyle {\begin{pmatrix}1&1\\0&1\end{pmatrix}}{\begin{pmatrix}1&0\\1&1\end{pmatrix}}{\begin{pmatrix}1&1\\0&1\end{pmatrix}}={\begin{pmatrix}0&1\\1&0\end{pmatrix}}} (working with binary values, so 1 + 1 = 0 {\displaystyle 1+1=0} ), which expresses the elementary matrix of switching two rows (or columns) in terms of the transvections (shears) of adding one element to the other. To generalize to where X and Y are not single bits, but instead bit vectors of length n, these 2×2 matrices are replaced by 2n×2n block matrices such as ( I n I n 0 I n ) . {\displaystyle \left({\begin{smallmatrix}I_{n}&I_{n}\\0&I_{n}\end{smallmatrix}}\right).} These matrices are operating on values, not on variables (with storage locations), hence this interpretation abstracts away from issues of storage location and the problem of both variables sharing the same storage location. == Code example == A C function that implements the XOR swap algorithm: The code first checks if the addresses are distinct and uses a guard clause to exit the function early if they are equal. Without that check, if they were equal, the algorithm would fold to a triple x ^= x resulting in zero. == Reasons for avoidance in practice == On modern CPU architectures, the XOR technique can be slower than using a temporary variable to do swapping. At least on recent x86 CPUs, both by AMD and Intel, moving between registers regularly incurs zero latency. (This is called MOV-elimination.) Even if there is not any architectural register available to use, the XCHG instruction will be at least as fast as the three XORs taken together. Another reason is that modern CPUs strive to execute instructions in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so they must be executed in strictly sequential order, negating any benefits of instruction-level parallelism. === Aliasing === The XOR swap is also complicated in practice by aliasing. If an attempt is made to XOR-swap the contents of some location with itself, the result is that the location is zeroed out and its value lost. Therefore, XOR swapping must not be used blindly in a high-level language if aliasing is possible. This issue does not apply if the technique is used in assembly to swap the contents of two registers. Similar problems occur with call by name, as in Jensen's Device, where swapping i and A[i] via a temporary variable yields incorrect results due to the arguments being related: swapping via temp = i; i = A[i]; A[i] = temp changes the value for i in the second statement, which then results in the incorrect i value for A[i] in the third statement. == Variations == The underlying principle of the XOR swap algorithm can be applied to any operation meeting criteria L1 through L4 above. Replacing XOR by addition and subtraction gives various slightly different, but largely equivalent, formulations. For example: Unlike the XOR swap, this variation requires that the underlying processor or programming language uses a method such as modular arithmetic or bignums to guarantee that the computation of X + Y cannot cause an error due to integer overflow. Therefore, it is seen even more rarely in practice than the XOR swap. However, the implementation of AddSwap above in the C programming language always works even in case of integer overflow, since, according to the C standard, addition and subtraction of unsigned integers follow the rules of modular arithmetic, i. e. are done in the cyclic group Z / 2 s Z {\displaystyle \mathbb {Z} /2^{s}\mathbb {Z} } where s {\displaystyle s} is the number of bits of unsigned int. Indeed, the correctness of the algorithm follows from the fact that the formulas ( x + y ) − y = x {\displaystyle (x+y)-y=x} and ( x + y ) − ( ( x + y ) − y ) = y {\displaystyle (x+y)-((x+y)-y)=y} hold in any abelian group. This generalizes the proof for the XOR swap algorithm: XOR is both the addition and subtraction in the abelian group ( Z / 2 Z ) s {\displaystyle (\mathbb {Z} /2\mathbb {Z} )^{s}} (which is the direct sum of s copies of Z / 2 Z {\displaystyle \mathbb {Z} /2\mathbb {Z} } ). This doesn't hold when dealing with the signed int type (the default for int). Signed integer overflow is an undefined behavior in C and thus modular arithmetic is not guaranteed by the standard, which may lead to incorrect results. The sequence of operations in AddSwap can be expressed via matrix multiplication as: ( 1 − 1 0 1 ) ( 1 0 1 − 1 ) ( 1 1 0 1 ) = ( 0 1 1 0 ) {\displaystyle {\begin{pmatrix}1&-1\\0&1\end{pmatrix}}{\begin{pmatrix}1&0\\1&-1\end{pmatrix}}{\begin{pmatrix}1&1\\0&1\end{pmatrix}}={\begin{pmatrix}0&1\\1&0\end{pmatrix}}} == Application to register allocation == On architectures lacking a dedicated swap instruction, because it avoids the extra temporary register, the XOR swap algorithm is required for optimal register allocatio

    Read more →
  • Latent semantic mapping

    Latent semantic mapping

    Latent semantic mapping (LSM) is a data-driven framework to model globally meaningful relationships implicit in large volumes of (often textual) data. It is a generalization of latent semantic analysis. In information retrieval, LSA enables retrieval on the basis of conceptual content, instead of merely matching words between queries and documents. LSM was derived from earlier work on latent semantic analysis. There are 3 main characteristics of latent semantic analysis: Discrete entities, usually in the form of words and documents, are mapped onto continuous vectors, the mapping involves a form of global correlation pattern, and dimensionality reduction is an important aspect of the analysis process. These constitute generic properties, and have been identified as potentially useful in a variety of different contexts. This usefulness has encouraged great interest in LSM. The intended product of latent semantic mapping, is a data-driven framework for modeling relationships in large volumes of data. Mac OS X v10.5 and later includes a framework implementing latent semantic mapping.

    Read more →
  • Wearable technology

    Wearable technology

    Wearable technology is a category of small electronic and mobile devices with wireless communications capability designed to be worn on the human body and are incorporated into gadgets, accessories, or clothes. Common types of wearable technology include smartwatches, fitness trackers, and smartglasses. Wearable electronic devices are often close to or on the surface of the skin, where they detect, analyze, and transmit information such as vital signs, and/or ambient data and which allow in some cases immediate biofeedback to the wearer. Wearable devices collect vast amounts of data from users making use of different behavioral and physiological sensors, which monitor their health status and activity levels. Wrist-worn devices include smartwatches with a touchscreen display, while wristbands are mainly used for fitness tracking but do not contain a touchscreen display. Wearable devices such as activity trackers are an example of the Internet of things, since "things" such as electronics, software, sensors, and connectivity are effectors that enable objects to exchange data (including data quality) through the internet with a manufacturer, operator, and/or other connected devices, without requiring human intervention. Wearable technology offers a wide range of possible uses, from communication and entertainment to improving health and fitness, however, there are worries about privacy and security because wearable devices have the ability to collect personal data. Wearable technology has a variety of use cases which is growing as the technology is developed and the market expands. It can be used to encourage individuals to be more active and improve their lifestyle choices. Healthy behavior is encouraged by tracking activity levels and providing useful feedback to enable goal setting. This can be shared with interested stakeholders such as healthcare providers. Wearables are popular in consumer electronics, most commonly in the form factors of smartwatches, smart rings, and implants. Apart from commercial uses, wearable technology is being incorporated into navigation systems, advanced textiles (e-textiles), and healthcare. As wearable technology is being proposed for use in critical applications, like other technology, it is vetted for its reliability and security properties. == History == In the 1500s, German inventor Peter Henlein (1485–1542) created small watches that were worn as necklaces. A century later, pocket watches grew in popularity as waistcoats became fashionable for men. Wristwatches were created in the late 1600s but were worn mostly by women as bracelets. Pedometers were developed around the same time as pocket watches. The concept of a pedometer was described by Leonardo da Vinci around 1500, and the Germanic National Museum in Nuremberg has a pedometer in its collection from 1590. In the late 1800s, the first wearable hearing aids were introduced. In 1904, aviator Alberto Santos-Dumont pioneered the modern use of the wristwatch. In 1949, American biophysicist Norman Holter invented the very first health monitoring device. His invention, the Holter monitor, was groundbreaking as one of the first wearable devices capable of tracking vital health data outside of a clinical setting. In the 1970s, calculator watches became available, reaching the peak of their popularity in the 1980s. From the early 2000s, wearable cameras were being used as part of a growing sousveillance movement. Expectations, operations, usage and concerns about wearable technology was floated on the first International Conference on Wearable Computing. In 2008, Ilya Fridman incorporated a hidden Bluetooth microphone into a pair of earrings. Big tech companies such as Apple, Samsung, and Fitbit have expanded on this idea by interfacing with smartphones and personal computer software to collect a wide variety of data. Wearable devices include dedicated health monitors, fitness bands, and smartwatches. In 2010, Fitbit released its first step counter. Wearable technology which tracks information such as walking and heart rate is part of the quantified self movement. In 2013, McLear, also known as NFC Ring, released a "smart ring". The smart ring could make bitcoin payments, unlock other devices, and transfer personally identifying information, and also had other features. In 2013, one of the first widely available smartwatches was the Samsung Galaxy Gear. Apple followed in 2015 with the Apple Watch. === Prototypes === From 1991 to 1997, Rosalind Picard and her students, Steve Mann and Jennifer Healey, at the MIT Media Lab designed, built, and demonstrated data collection and decision making from "Smart Clothes" that monitored continuous physiological data from the wearer. These "smart clothes", "smart underwear", "smart shoes", and smart jewellery collected data that related to affective state and contained or controlled physiological sensors and environmental sensors like cameras and other devices. At the same time, also at the MIT Media Lab, Thad Starner and Alex "Sandy" Pentland develop augmented reality. In 1997, their smartglass prototype is featured on 60 Minutes and enables rapid web search and instant messaging. Though the prototype's glasses are nearly as streamlined as modern smartglasses, the processor was a computer worn in a backpack – the most lightweight solution available at the time. In 2009, Sony Ericsson teamed up with the London College of Fashion for a contest to design digital clothing. The winner was a cocktail dress with Bluetooth technology making it light up when a call is received. Zach "Hoeken" Smith of MakerBot fame made keyboard pants during a "Fashion Hacking" workshop at a New York City creative collective. The Tyndall National Institute in Ireland developed a "remote non-intrusive patient monitoring" platform which was used to evaluate the quality of the data generated by the patient sensors and how the end users may adopt to the technology. More recently, London-based fashion company CuteCircuit created costumes for singer Katy Perry featuring LED lighting so that the outfits would change color both during stage shows and appearances on the red carpet such as the dress Katy Perry wore in 2010 at the MET Gala in NYC. In 2012, CuteCircuit created the world's first dress to feature Tweets, as worn by singer Nicole Scherzinger. In 2010, McLear, also known as NFC Ring, developed prototypes of its "smart ring" devices, before a Kickstarter fundraising in 2013. In 2014, graduate students from the Tisch School of Arts in New York designed a hoodie that sent pre-programmed text messages triggered by gesture movements. Around the same time, prototypes for digital eyewear with heads up display (HUD) began to appear. The US military employs headgear with displays for soldiers using a technology called holographic optics. In 2010, Google started developing prototypes of its optical head-mounted display Google Glass, which went into customer beta in March 2013. == Usage == In the consumer space, sales of smart wristbands (aka activity trackers such as the Jawbone UP and Fitbit Flex) started accelerating in 2013. One in five American adults have a wearable device, according to the 2014 PriceWaterhouseCoopers Wearable Future Report. As of 2009, decreasing cost of processing power and other components was facilitating widespread adoption and availability. In professional sports, wearable technology has applications in monitoring and real-time feedback for athletes. Examples of wearable technology in sport include accelerometers, pedometers, and GPS's which can be used to measure an athlete's energy expenditure and movement pattern. In cybersecurity and financial technology, secure wearable devices have captured part of the physical security key market. McLear, also known as NFC Ring, and VivoKey developed products with one-time pass secure access control. In health informatics, wearable devices have enabled better capturing of human health statistics for data driven analysis. This has facilitated data-driven machine learning algorithms to analyse the health condition of users. In business, wearable technology helps managers easily supervise employees by knowing their locations and what they are currently doing. Employees working in a warehouse also have increased safety when working around chemicals or lifting something. Smart helmets are employee safety wearables that have vibration sensors that can alert employees of possible danger in their environment. == Wearable technology and health == Wearable technology is often used to monitor a user's health. Given that such a device is in close contact with the user, it can easily collect data. It started as soon as 1980 where first wireless ECG was invented. In the last decades, there has been substantial growth in research of e.g. textile-based, tattoo, patch, and contact lenses as well as circulation of a notion of "quantified self", transhumanism-related ideas, and growth of life ex

    Read more →
  • Data drilling

    Data drilling

    Data drilling (also drilldown) refers to any of various operations and transformations on tabular, relational, and multidimensional data. The term has widespread use in various contexts, but is primarily associated with specialized software designed specifically for data analysis. == Common data drilling operations == There are certain operations that are common to applications that allow data drilling. Among them are: Query operations: tabular query pivot query === Tabular query === Tabular query operations consist of standard operations on data tables. Among these operations are: search sort filter (by value) filter (by extended function or condition) transform (e.g., by adding or removing columns) Consider the following example: Fred and Wilma table (Fig 001): gender, fname, lname, home male, fred, chopin, Poland male, fred, flintstone, bedrock male, fred, durst, usa female, wilma, flintstone, bedrock female, wilma, rudolph, usa female, wilma, webb, usa male, fred, johnson, usa The preceding is an example of a simple flat file table formatted as comma-separated values. The table includes first name, last name, gender and home country for various people named fred or wilma. Although the example is formatted this way, it is important to emphasize that tabular query operations (as well as all data drilling operations) can be applied to any conceivable data type, regardless of the underlying formatting. The only requirement is that the data be readable by the software application in use. === Pivot query === A pivot query allows multiple representations of data according to different dimensions. This query type is similar to tabular query, except it also allows data to be represented in summary format, according to a flexible user-selected hierarchy. This class of data drilling operation is formally, (and loosely) known by different names, including crosstab query, pivot table, data pilot, selective hierarchy, intertwingularity and others. To illustrate the basics of pivot query operations, consider the Fred and Wilma table (Fig 001). A quick scan of the data reveals that the table has redundant information. This redundancy could be consolidated using an outline or a tree structure or in some other way. Moreover, once consolidated, the data could have many different alternate layouts. Using a simple text outline as output, the following alternate layouts are all possible with a pivot query: Summarize by gender (Fig 001): female flintstone, wilma rudolph, wilma webb, wilma male chopin, fred flintstone, fred durst, fred johnson, fred (Dimensions = gender; Tabular fields = lname, fname;) Summarize by home, lname (Fig 001): bedrock flintstone fred wilma Poland chopin fred usa ... (Dimensions = home, lname; Tabular fields = fname;) ==== Uses ==== Pivot query operations are useful for summarizing a corpus of data in multiple ways, thereby illustrating different representations of the same basic information. Although this type of operation appears prominently in spreadsheets and desktop database software, its flexibility is arguably under-utilized. There are many applications that allow only a 'fixed' hierarchy for representing data, and this represents a substantial limitation. == Drillup == Drillup is the opposite of drilldown. For example, if you drilldown to see the revenue of one product, then you might want to drillup to see the revenue of all products.

    Read more →
  • Bibliographic database

    Bibliographic database

    A bibliographic database is a database of bibliographic records. This is an organised online collection of references to published written works like journal and newspaper articles, conference proceedings, reports, government and legal publications, patents and books. In contrast to library catalogue entries, a majority of the records in bibliographic databases describe articles and conference papers rather than complete monographs, and they generally contain very rich subject descriptions in the form of keywords, subject classification terms, or abstracts. A bibliographic database may cover a wide range of topics or one academic field like computer science. A significant number of bibliographic databases are marketed under a trade name by licensing agreement from vendors, or directly from their makers: the indexing and abstracting services. Many bibliographic databases have evolved into digital libraries, providing the full text of the organised contents:for instance CORE also organises and mirrors scholarly articles and OurResearch develops a search engine for open access content in Unpaywall. Others merge with non-bibliographic and scholarly databases to create more complete disciplinary search engine systems, such as Chemical Abstracts or Entrez. == History == Prior to the mid-20th century, individuals searching for published literature had to rely on printed bibliographic indexes, generated manually from index cards. During the early 1960s computers were used to digitize text for the first time; the purpose was to reduce the cost and time required to publish two American abstracting journals, the Index Medicus of the National Library of Medicine and the Scientific and Technical Aerospace Reports of the National Aeronautics and Space Administration (NASA). By the late 1960s, such bodies of digitized alphanumeric information, known as bibliographic and numeric databases, constituted a new type of information resource. Online interactive retrieval became commercially viable in the early 1970s over private telecommunications networks. The first services offered a few databases of indexes and abstracts of scholarly literature. These databases contained bibliographic descriptions of journal articles that were searchable by keywords in author and title, and sometimes by journal name or subject heading. The user interfaces were crude, the access was expensive, and searching was done by librarians on behalf of "end users".

    Read more →
  • Tabletopia

    Tabletopia

    Tabletopia is an online portal for users to play and create virtual tabletop games. The platform is developed by Tabletopia Inc and initially was released as a web browser based service after a successful crowdfunding campaign in August 2015. In December 2016 Tabletopia was released on Steam, and later in 2018 became available in AppStore and Google Play. == Gameplay == Tabletopia is a sandbox system for running any game. That means no AI or rules enforcement. Participating players will have to know how to play the game. Nevertheless, the platform has some automated actions available, like card-shuffling and dealing, dice-rolling, magnetic placement of components in special zones, hand management, and some others. Tabletopia also features ready game setups for various player numbers to facilitate gameplay. It also has customisable camera controls which let players save camera positions and switch between them using hot keys. People can use the Game Designer mode to design and create their own board games using the component library. They can then monetise the games with a 70/30 split to the game designer. == Development == Tabletopia was created in early 2014, by Tim Bokarev and his partners Artem Zinoviev and Dmitry Sergeev. These co-founders already had experience in the video and board games industry. Their other projects include Promo Interactive, an internet advertising agency, Playtox, a mobile MMORPG, Igrology, a game studio, and Tesera.ru, the main Russian-speaking board gaming portal. By Spring 2014, Artem, Dmitry and Tim created Tabletopia Inc. USA and started development. Tabletopia is a multinational crew that includes professionals from USA, Ukraine, Australia, Ireland, and Germany. The Kickstarter campaign in August 2015 earned $133,721 by 2,545 backers. Tabletopia received Green Light on Steam in September 2015 and was released on Steam in March 2016. The platform remained in Early Access until December 2016, when it was officially released on Steam and on the web. In February 2018 it was released as a stand-alone app for iOS tablets, and in September 2018 for Android tablets.

    Read more →
  • Algorithm engineering

    Algorithm engineering

    Algorithm engineering focuses on the design, analysis, implementation, optimization, profiling and experimental evaluation of computer algorithms, bridging the gap between algorithmics theory and practical applications of algorithms in software engineering. It is a general methodology for algorithmic research. == Origins == In 1995, a report from an NSF-sponsored workshop "with the purpose of assessing the current goals and directions of the Theory of Computing (TOC) community" identified the slow speed of adoption of theoretical insights by practitioners as an important issue and suggested measures to reduce the uncertainty by practitioners whether a certain theoretical breakthrough will translate into practical gains in their field of work, and tackle the lack of ready-to-use algorithm libraries, which provide stable, bug-free and well-tested implementations for algorithmic problems and expose an easy-to-use interface for library consumers. But also, promising algorithmic approaches have been neglected due to difficulties in mathematical analysis. The term "algorithm engineering" was first used with specificity in 1997, with the first Workshop on Algorithm Engineering (WAE97), organized by Giuseppe F. Italiano. == Difference from algorithm theory == Algorithm engineering does not intend to replace or compete with algorithm theory, but tries to enrich, refine and reinforce its formal approaches with experimental algorithmics (also called empirical algorithmics). This way it can provide new insights into the efficiency and performance of algorithms in cases where the algorithm at hand is less amenable to algorithm theoretic analysis, formal analysis pessimistically suggests bounds which are unlikely to appear on inputs of practical interest, the algorithm relies on the intricacies of modern hardware architectures like data locality, branch prediction, instruction stalls, instruction latencies which the machine model used in Algorithm Theory is unable to capture in the required detail, the crossover between competing algorithms with different constant costs and asymptotic behaviors needs to be determined. == Methodology == Some researchers describe algorithm engineering's methodology as a cycle consisting of algorithm design, analysis, implementation and experimental evaluation, joined by further aspects like machine models or realistic inputs. They argue that equating algorithm engineering with experimental algorithmics is too limited, because viewing design and analysis, implementation and experimentation as separate activities ignores the crucial feedback loop between those elements of algorithm engineering. === Realistic models and real inputs === While specific applications are outside the methodology of algorithm engineering, they play an important role in shaping realistic models of the problem and the underlying machine, and supply real inputs and other design parameters for experiments. === Design === Compared to algorithm theory, which usually focuses on the asymptotic behavior of algorithms, algorithm engineers need to keep further requirements in mind: Simplicity of the algorithm, implementability in programming languages on real hardware, and allowing code reuse. Additionally, constant factors of algorithms have such a considerable impact on real-world inputs that sometimes an algorithm with worse asymptotic behavior performs better in practice due to lower constant factors. === Analysis === Some problems can be solved with heuristics and randomized algorithms in a simpler and more efficient fashion than with deterministic algorithms. Unfortunately, this makes even simple randomized algorithms difficult to analyze because there are subtle dependencies to be taken into account. === Implementation === Huge semantic gaps between theoretical insights, formulated algorithms, programming languages and hardware pose a challenge to efficient implementations of even simple algorithms, because small implementation details can have rippling effects on execution behavior. The only reliable way to compare several implementations of an algorithm is to spend an considerable amount of time on tuning and profiling, running those algorithms on multiple architectures, and looking at the generated machine code. === Experiments === See: Experimental algorithmics === Application engineering === Implementations of algorithms used for experiments differ in significant ways from code usable in applications. While the former prioritizes fast prototyping, performance and instrumentation for measurements during experiments, the latter requires thorough testing, maintainability, simplicity, and tuning for particular classes of inputs. === Algorithm libraries === Stable, well-tested algorithm libraries like LEDA play an important role in technology transfer by speeding up the adoption of new algorithms in applications. Such libraries reduce the required investment and risk for practitioners, because it removes the burden of understanding and implementing the results of academic research. == Conferences == Two main conferences on Algorithm Engineering are organized annually, namely: Symposium on Experimental Algorithms (SEA), established in 1997 (formerly known as WEA). SIAM Meeting on Algorithm Engineering and Experiments (ALENEX), established in 1999. The 1997 Workshop on Algorithm Engineering (WAE'97) was held in Venice (Italy) on September 11–13, 1997. The Third International Workshop on Algorithm Engineering (WAE'99) was held in London, UK in July 1999. The first Workshop on Algorithm Engineering and Experimentation (ALENEX99) was held in Baltimore, Maryland on January 15–16, 1999. It was sponsored by DIMACS, the Center for Discrete Mathematics and Theoretical Computer Science (at Rutgers University), with additional support from SIGACT, the ACM Special Interest Group on Algorithms and Computation Theory, and SIAM, the Society for Industrial and Applied Mathematics.

    Read more →
  • Visual Peer Review

    Visual Peer Review

    == Development and history == Visual Peer Review was first described in a 2017 classroom study by Friedman and Rosen, which examined how students evaluate peer-produced data visualizations using structured rubrics. Developed within the broader fields of data visualization, information visualization, and educational technology, the system emphasized clear labeling, visual integrity, and reduction of chartjunk. Students assigned rubric scores and provided written explanations, aligning the activity with established principles of peer review. Follow-up research expanded both the methodological and analytic dimensions of the framework. Friedman and colleagues applied natural language processing (NLP) to peer-review text to analyze part-of-speech patterns, sentence complexity, and comment length. These analyses offered insight into how students expressed critique and engaged with core design principles. Later studies incorporated advanced statistical modeling to evaluate system-level behavior, including peer review networks and reviewer typologies. Between 2021 and 2024, the framework underwent iterative refinement through a series of studies that explored interface design, behavioral nudges, reviewer engagement, and social network dynamics. The system was influenced by earlier work in computer-supported peer review—particularly My Reviewers, a rubric-based writing assessment platform developed by Joe Moxley at the University of South Florida. While Moxley's platform focused on text-based feedback, Visual Peer Review adapted its core structure to support critique of DataVis and visual analytics. To guide structured analysis and feedback, Friedman and Rosen also drew on the “what, why, and how” framework introduced by Liu and Stasko (2010), which emphasizes understanding a visualization's purpose, task alignment, and encoding strategy. == Framework and components == Visual Peer Review is designed to support critique, reflection, and learning in courses focusing on data visualization, visual analytics, and related fields in educational technology. The system consists of interconnected component. Core components include: Visual Artifacts: Students generate original visualizations using software such as R (e.g., ggplot2), Tableau, Python, or Adobe Illustrator. These artifacts may include statistical graphics, dashboards, or design-oriented infographics. Rubric-Based Assessment: Peer reviewers evaluate submitted visualizations using structured rubrics grounded in visualization theory and design heuristics. Rubric dimensions typically include: Use of labeling and axis scales Minimalization of chartjunk and clutter (following Tufte's principles) Optimization of the data–ink ratio Preservation of visual integrity through accurate representation (lie factor) Written Peer Comments: In addition to scoring, reviewers provide narrative feedback explaining their reasoning. These comments aim to improve design literacy, strengthen visual reasoning, and support the learning process common to peer review across educational contexts. Instructor Analytics Dashboard: Instructors access an analytics dashboard that displays peer-review activity across the course. Metrics include comment length, rubric coverage, participation patterns, and potential indicators of disengagement. These features position the framework within the domain of learning analytics, where visualized data helps instructors monitor student progress and identify support needs. == Ongoing development == Current work focuses on enhancing rubric structure, integrating principles from human–computer interaction, DataVis and expanding learning-analytics capabilities. Ongoing studies investigate how interface design, reviewer behavior, and classroom context influence the quality of feedback and overall engagement. Continuing development positions Visual Peer Review at the intersection of data visualization education, peer assessment, and educational technology.

    Read more →