AI Generator Canva

AI Generator Canva — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AirPair

    AirPair

    AirPair is a service and eponymous company that connects people who need help with programming issues (usually, programmers at small technology companies or at finance companies that use technology products) and people who can help them. Unlike services such as oDesk and Elance, AirPair is not a service for outsourcing programming tasks, but rather a service that facilitates one-off knowledge transfers from people with highly specialized knowledge of particular technology stacks or programming issues to people who are in need of specialized help. == History == AirPair launched in March 2013, with founder Jonathon Kresner, who hails from Australia, working full-time, and it soon hired three other part-time developers to work alongside him. Kresner had previously founded two other startups: Preparty, a social invitation and event-booking service based in Australia, and ClimbFind, an online rock-climbing community that reached a million users. Kresner was inspired to work on AirPair because he saw the need for outside expert assistance with programming issues arise regularly at these startups. In November 2013, founder Kresner describes the company's initial success at bootstrapping itself to "Ramen profitability" in a blog post. In December 2013, AirPair was accepted into the Winter 2014 Y Combinator batch. In March 2014, AirPair announced it would launch partnerships with Stripe, Twilio, and other companies that had their own application programming interfaces, allowing developers having trouble with the APIs to seek help over AirPair from experts on the APIs. AirPair presented at the Y Combinator Winter 2014 Demo Day on March 25, 2014, and successfully raised over $1 million within the next 48 hours. == Reception == A review of AirPair by Will Lam stressed that because payment was based on time rather than results, it was important to use it for clearly thought-out questions where one had high confidence that the session would help. Dennis Beatty, who met AirPair founder Jonathon Kresner in March 2014, wrote in April 2014 a glowing review of AirPair's vision of connecting people and its business success. AirPair has been compared with other peer-to-peer coding help sites such as Codementor and HackHands.

    Read more →
  • Digital artifact

    Digital artifact

    Digital artifact in information science, is any undesired or unintended alteration in data introduced in a digital process by an involved technique and/or technology. Digital artifact can be of any content types including text, audio, video, image, animation or a combination. == Information science == In information science, digital artifacts result from: Hardware malfunction: In computer graphics, visual artifacts may be generated whenever a hardware component such as the processor, memory chip, cabling malfunctions, etc., corrupts data. Examples of malfunctions include physical damage, overheating, insufficient voltage and GPU overclocking. Common types of hardware artifacts are texture corruption and T-vertices in 3D graphics, and pixelization in MPEG compressed video. Software malfunction: Artifacts may be caused by algorithm flaws such as decoding/encoding audio or video, or a poor pseudo-random number generator that would introduce artifacts distinguishable from the desired noise into statistical models. Compression: Controlled amounts of unwanted information may be generated as a result of the use of lossy compression techniques. One example is the artifacts seen in JPEG and MPEG compression algorithms that produce compression artifacts. Quantization: Digital imprecision generated in the process of converting analog information into digital space, is due to the limited granularity of digital numbering space. In computer graphics, quantization is seen as pixelation. Aliasing: As a consequence of sampling or sample-rate conversion, energy from frequencies outside of the signal frequency band of interest are folded across multiples of the Nyquist frequency. This is typically mitigated by using an anti-aliasing filter. Filtering: The process of filtering a signal, such as using an anti-aliasing filter, causes undesired alterations to the signal due to imperfections in the frequency response magnitude and phase, and due to the time domain impulse response. Rolling shutter, the line scanning of an object that is moving too fast for the image sensor to capture a unitary image. Error diffusion: poorly-weighted kernel coefficients result in undesirable visual artifacts.

    Read more →
  • Relational data stream management system

    Relational data stream management system

    A relational data stream management system (RDSMS) is a distributed, in-memory data stream management system (DSMS) that is designed to use standards-compliant SQL queries to process unstructured and structured data streams in real-time. Unlike SQL queries executed in a traditional RDBMS, which return a result and exit, SQL queries executed in a RDSMS do not exit, generating results continuously as new data become available. Continuous SQL queries in a RDSMS use the SQL Window function to analyze, join and aggregate data streams over fixed or sliding windows. Windows can be specified as time-based or row-based. == RDSMS SQL Query Examples == Continuous SQL queries in a RDSMS conform to the ANSI SQL standards. The most common RDSMS SQL query is performed with the declarative SELECT statement. A continuous SQL SELECT operates on data across one or more data streams, with optional keywords and clauses that include FROM with an optional JOIN subclause to specify the rules for joining multiple data streams, the WHERE clause and comparison predicate to restrict the records returned by the query, GROUP BY to project streams with common values into a smaller set, HAVING to filter records resulting from a GROUP BY, and ORDER BY to sort the results. The following is an example of a continuous data stream aggregation using a SELECT query that aggregates a sensor stream from a weather monitoring station. The SELECTquery aggregates the minimum, maximum and average temperature values over a one-second time period, returning a continuous stream of aggregated results at one second intervals. RDSMS SQL queries also operate on data streams over time or row-based windows. The following example shows a second continuous SQL query using the WINDOW clause with a one-second duration. The WINDOW clause changes the behavior of the query, to output a result for each new record as it arrives. Hence the output is a stream of incrementally updated results with zero result latency.

    Read more →
  • Terminology extraction

    Terminology extraction

    Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is also essential to the language industry. One of the first steps to model a knowledge domain is to collect a vocabulary of domain-relevant terms, constituting the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature. Typically, approaches to automatic term extraction make use of linguistic processors (part of speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases. Noun phrases include compounds (e.g. "credit card"), adjective noun phrases (e.g. "local tourist information office"), and prepositional noun phrases (e.g. "board of directors"). In English, the first two (compounds and adjective noun phrases) are the most frequent. Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, because of their low ambiguity and high specificity, these terms are particularly useful for conceptualizing a knowledge domain or for supporting the creation of a domain ontology or a terminology base. Furthermore, terminology extraction is a very useful starting point for semantic similarity, knowledge management, human translation and machine translation, etc. == Bilingual terminology extraction == The methods for terminology extraction can be applied to parallel corpora. Combined with e.g. co-occurrence statistics, candidates for term translations can be obtained. Bilingual terminology can be extracted also from comparable corpora (corpora containing texts within the same text type, domain but not translations of documents between each other).

    Read more →
  • ACL Data Collection Initiative

    ACL Data Collection Initiative

    The ACL Data Collection Initiative (ACL/DCI) was a project established in 1989 by the Association for Computational Linguistics (ACL) to create and distribute large text and speech corpora for computational linguistics research. The initiative aimed to address the growing need for substantial text databases that could support research in areas such as natural language processing, speech recognition, and computational linguistics. By 1993, the initiative’s activities had effectively ceased, with its functions and datasets absorbed by the Linguistic Data Consortium (LDC), which was founded in 1992. == Objectives == The ACL/DCI had several key objectives: To acquire a large and diverse text corpus from various sources To transform the collected texts into a common format based on the Standard Generalized Markup Language (SGML) To make the corpus available for scientific research at low cost with minimal restrictions To provide a common database that would allow researchers to replicate or extend published results To reduce duplication of effort among researchers in obtaining and preparing text data These objectives were designed to address the growing demand for very large amounts of text arising from applications in recognition and analysis of text and speech. Its core objective was to "oversee the acquisition and preparation of a large text corpus to be made available for scientific research at cost and without royalties". == History == By the late 1980s, researchers in computational linguistics and speech recognition faced a significant problem: the lack of large-scale, accessible text corpora for developing statistical models and testing algorithms. Existing generally available text databases were too small to meet the needs of developing applications in text and speech recognition. The initiative was formed to meet this need by collecting, standardizing, and distributing large quantities of text data with minimal restrictions for scientific research. As stated by Liberman (1990), "research workers have been severely hampered by the lack of appropriate materials, and specially by the lack of a large enough body of text on which published results can be replicated or extended by others." The ACL/DCI committee was established in February 1989. The committee included members from academic and industrial research laboratories in the United States and Europe. The initiative was chaired by Mark Liberman from the University of Pennsylvania (formerly of AT&T Bell Laboratories). Other committee members included representatives from organizations such as Bellcore, IBM T.J. Watson Research Center, Cambridge University, Virginia Polytechnic Institute & State University, Northeastern University, University of Pennsylvania, SRI International, MCC, Xerox PARC, ISSCO, and University of Pisa. The project operated initially without dedicated funding, relying on volunteer efforts from committee members and their affiliated institutions. Key supporters included AT&T Bell Labs, Bellcore, IBM, Xerox, and the University of Pennsylvania, which allowed the use of their computing facilities for ACL/DCI-related work. Previously running on volunteer effort pro bono, in 1991, it obtained funding from General Electric and the National Science Foundation (IRI-9113530). == Data == As of 1990, the ACL/DCI had collected hundreds of millions of words of diverse text. The collection included: Wall Street Journal articles (25 to 50 million words); Canadian Hansard (parliamentary records) in parallel English and French versions: cleaned-up English Hansard donated by the IBM alignment models group (100 million words), and original Bilingual Hansard (from a different time period) obtained directly (200 million words). Collins English Dictionary (1979 edition), both as fulltext (3 million words) and as various "database" versions, constructed using "typographers' tape" donated by Collins, which were computer tapes containing the structured digital data used to typeset and print the 1979 edition of the dictionary; Emails from ARPANET newsletters for the ACM Special Interest Group on Information Retrieval Forum (IRLIST) and AIList Digest issues distributed over the ARPANET (AILIST) (5 million words), both collected by Edward A. Fox at VIPSU; Articles on networking (2 million words); U.S. Department of Agriculture Extension Service Fact Sheets (>1 million words); 200,000 scientific abstracts of about 1,500 words each from the Department of Energy (25 million words); Archives of the Challenger Investigation Commission, including transcripts of depositions and hearings (2.5 million words); Books from the Library of America, including works by Mark Twain, Eugene O'Neill, Ralph Waldo Emerson, Herman Melville, W.E.B. DuBois, Willa Cather, and Benjamin Franklin (130 books, 20 million words); Public domain books like the King James Bible, Tristram Shandy, The Federalist Papers; Several million words of transcribed radiologists' reports, donated by Francis Ganong at Kurzweil Applied Intelligence Inc (about 5 million words); The Child Language Data Exchange corpus of child language acquisition transcripts; U.S. Department of Justice Justice Retrieval and Inquiry System (JURIS) materials; The Swiss Civil Code in parallel German, French and Italian; Economic reports from the Union Bank of Switzerland, in parallel English, German, French and Italian; About 12K words of administrative policy manuals and 14K words of administrative memos, contributed by Geoff Pullum of U.C.S.C.; Material from various ACM journals and the ACL journal Computational Linguistics; The CSLI publications series: 50-100 reports (8K words each) and 5-10 books (80K words each). The initiative started with North American English text but expanded to include Canadian French and planned to include Japanese, Chinese, and other Asian languages. At least 5 million words from the collection were tagged under the Penn Treebank project, and those tags were distributed by DCI as well. After DCI was absorbed by the LDC, the datasets were curated under LDC. == Format == The ACL/DCI corpus was coded in a standard form based on SGML (Standard Generalized Markup Language, ISO 8879), consistent with the recommendations of the Text Encoding Initiative (TEI), of which the DCI was an affiliated project. The TEI was a joint project of the ACL, the Association for Computers and the Humanities, and the Association for Literary and Linguistic Computing, aiming to provide a common interchange format for literary and linguistic data. The initiative planned to add annotations reflecting consensually approved linguistic features like part of speech and various aspects of syntactic and semantic structure over time. == Examples == As an example of the use of ACL/DCI, consider the Wall Street Journal (WSJ) corpus for speech recognition research. The WSJ corpus was used as the basis for the DARPA Spoken Language System (SLS) community's Continuous Speech Recognition (CSR) Corpus. The WSJ corpus became a standard benchmark for evaluating speech recognition systems and has been used in numerous research papers. The WSJ CSR Corpus provided DARPA with its first general-purpose English, large vocabulary, natural language, high perplexity corpus containing speech (400 hours) and text (47 million words) during 1987–89. The text corpus was 313 MB in size. The text was preprocessed to remove ambiguity in the word sequence that a reader might choose, ensuring that the unread text used to train language models was representative of the spoken test material. The preprocessing included converting numbers into orthographics, expanding abbreviations, resolving apostrophes and quotation marks, and marking punctuation. As another example, the Yarowsky algorithm used bitext data from DCI to train a simple word-sense disambiguation model that was competitive with advanced models trained on smaller datasets. == Distribution == Materials from the ACL/DCI collection were distributed to research groups on a non-commercial basis. By 1990, about 25 research groups and individual researchers had received tapes containing various portions of the collected material. To obtain the data, researchers had to sign an agreement not to redistribute the data or make direct commercial use of it. However, commercial application of "analytical materials" derived from the text, such as statistical tables or grammar rules, was explicitly permitted. The initiative first distributed data via 12-inch reels of 9-track tape, then via CD-ROMs. Each such tape could contain 30 million words compressed via the Lempel-Ziv algorithms. The first CD-ROM distribution was in 1991, funded by Dragon Systems Inc. It contained Collins English Dictionary, WSJ, scientific abstracts provided by the U.S. Department of Energy, and the Penn Treebank.

    Read more →
  • SQL/PSM

    SQL/PSM

    SQL/PSM (SQL/Persistent Stored Modules) is an ISO standard mainly defining an extension of SQL with a procedural language for use in stored procedures. Initially published in 1996 as an extension of SQL-92 (ISO/IEC 9075-4:1996, a version sometimes called PSM-96 or even SQL-92/PSM), SQL/PSM was later incorporated into the multi-part SQL:1999 standard, and has been part 4 of that standard since then, most recently in SQL:2023. The SQL:1999 part 4 covered less than the original PSM-96 because the SQL statements for defining, managing, and invoking routines were actually incorporated into part 2 SQL/Foundation, leaving only the procedural language itself as SQL/PSM. The SQL/PSM facilities are still optional as far as the SQL standard is concerned; most of them are grouped in Features P001-P008. SQL/PSM standardizes syntax and semantics for control flow, exception handling (called "condition handling" in SQL/PSM), local variables, assignment of expressions to variables and parameters, and (procedural) use of cursors. It also defines an information schema (metadata) for stored procedures. SQL/PSM is one language in which methods for the SQL:1999 structured types can be defined. The other is Java, via SQL/JRT. SQL/PSM is derived, seemingly directly, from Oracle's PL/SQL. Oracle developed PL/SQL and released it in 1991, basing the language on the US Department of Defense's Ada programming language. However, Oracle has maintained a distance from the standard in its documentation. IBM's SQL PL (used in DB2) and Mimer SQL's PSM were the first two products officially implementing SQL/PSM. It is commonly thought that these two languages, and perhaps also MySQL/MariaDB's procedural language, are closest to the SQL/PSM standard. However, a PostgreSQL addon implements SQL/PSM (alongside its other procedural languages like the PL/SQL-derived plpgsql), although it is not part of the core product. RDF functionality in OpenLink Virtuoso was developed entirely through SQL/PSM, combined with custom datatypes (e.g., ANY for handling URI and Literal relation objects), sophisticated indexing, and flexible physical storage choices (column-wise or row-wise).

    Read more →
  • Information explosion

    Information explosion

    Information explosion is the rapid increase in the amount of published information or data and the effects of this abundance. As the amount of available data grows, the problem of managing the information becomes more difficult, which can lead to information overload. The Online Oxford English Dictionary indicates use of the phrase in a March 1964 New Statesman article. The New York Times first used the phrase in its editorial content in an article by Walter Sullivan on June 7, 1964, in which he described the phrase as "much discussed". The earliest known use of the phrase was in a speech about television by NBC president Pat Weaver at the Institute of Practitioners of Advertising in London on September 27, 1955. The speech was rebroadcast on radio station WSUI in Iowa City and excerpted in the Daily Iowan newspaper two months later. Many sectors are seeing this rapid increase in the amount of information available such as healthcare, supermarkets, and governments. Another sector that is being affected by this phenomenon is journalism. Such a profession, which in the past was responsible for the dissemination of information, may be suppressed by the overabundance of information today. Techniques to gather knowledge from an overabundance of electronic information (e.g., data fusion may help in data mining) have existed since the 1970s. Another common technique to deal with such amount of information is qualitative research. Such approaches aim to organize the information, synthesizing, categorizing and systematizing in order to be more usable and easier to search. == Growth patterns == The world's technological capacity to store information grew from, optimally compressed, 2.6 exabytes in 1986 to 15.7 in 1993, over 54.5 in 2000, and to 295 exabytes in 2007. The world's technological capacity to receive information through one-way broadcast networks was 432 exabytes of (optimally compressed) information in 1986, 715 (optimally compressed) exabytes in 1993, 1,200 (optimally compressed) exabytes in 2000, and 1,900 in 2007. The world's effective capacity to exchange information through two-way telecommunications networks was 0.281 exabytes of (optimally compressed) information in 1986, 0.471 in 1993, 2.2 in 2000, and 65 (optimally compressed) exabytes in 2007. A new metric that is being used in an attempt to characterize the growth in person-specific information, is the disk storage per person (DSP), which is measured in megabytes/person (where megabytes is 106 bytes and is abbreviated MB). Global DSP (GDSP) is the total rigid disk drive space (in MB) of new units sold in a year divided by the world population in that year. The GDSP metric is a crude measure of how much disk storage could possibly be used to collect person-specific data on the world population. In 1983, one million fixed drives with an estimated total of 90 terabytes were sold worldwide; 30MB drives had the largest market segment. In 1996, 105 million drives, totaling 160,623 terabytes were sold with 1 and 2 gigabyte drives leading the industry. By the year 2000, with 20GB drive leading the industry, rigid drives sold for the year are projected to total 2,829,288 terabytes Rigid disk drive sales to top $34 billion in 1997. According to Latanya Sweeney, there are three trends in data gathering today: Type 1. Expansion of the number of fields being collected, known as the “collect more” trend. Type 2. Replace an existing aggregate data collection with a person-specific one, known as the “collect specifically” trend. Type 3. Gather information by starting a new person-specific data collection, known as the “collect it if you can” trend. == Related terms == Since "information" in electronic media is often used synonymously with "data", the term information explosion is closely related to the concept of data flood (also dubbed data deluge). Sometimes the term information flood is used as well. All of those basically boil down to the ever-increasing amount of electronic data exchanged per time unit. A term that covers the potential negative effects of information explosion is information inflation. The awareness about non-manageable amounts of data grew along with the advent of ever more powerful data processing since the mid-1960s. == Challenges == Even though the abundance of information can be beneficial in several levels, some problems may be of concern such as privacy, legal and ethical guidelines, filtering and data accuracy. Filtering refers to finding useful information in the middle of so much data, which relates to the job of data scientists. A typical example of a necessity of data filtering (data mining) is in healthcare since in the next years is due to have EHRs (Electronic Health Records) of patients available. With so much information available, the doctors will need to be able to identify patterns and select important data for the diagnosis of the patient. On the other hand, according to some experts, having so much public data available makes it difficult to provide data that is actually anonymous. Another point to take into account is the legal and ethical guidelines, which relates to who will be the owner of the data and how frequently he/she is obliged to the release this and for how long. With so many sources of data, another problem will be accuracy of such. An untrusted source may be challenged by others, by ordering a new set of data, causing a repetition in the information. According to Edward Huth, another concern is the accessibility and cost of such information. The accessibility rate could be improved by either reducing the costs or increasing the utility of the information. The reduction of costs according to the author, could be done by associations, which should assess which information was relevant and gather it in a more organized fashion. == Web servers == As of August 2005, there were over 70 million web servers. As of September 2007 there were over 135 million web servers. == Blogs == According to Technorati, the number of blogs doubles about every 6 months with a total of 35.3 million blogs as of April 2006. This is an example of the early stages of logistic growth, where growth is approximately exponential, since blogs are a recent innovation. As the number of blogs approaches the number of possible producers (humans), saturation occurs, growth declines, and the number of blogs eventually stabilizes.

    Read more →
  • Magic Quadrant

    Magic Quadrant

    Magic Quadrant (MQ) is a series of market research reports published by research and advisory firm Gartner that rely on proprietary qualitative data analysis methods to demonstrate market trends, such as direction, maturity, and participants. Their analyses are conducted for several specific technology industries and are updated every 1–2 years: once an updated report has been published, its predecessor is "retired". == Rating == Gartner rates vendors upon two criteria: completeness of vision and ability to execute. Completeness of vision – Reflects the vendor's innovation, and whether the vendor drives or follows the market. Ability to execute – Summarizes factors such as the vendor's financial viability, market responsiveness, product development, sales channels and customer base. The two component scores lead to a vendor position in one of four quadrants: === Leaders === Vendors in the "Leaders" quadrant have the highest composite scores for their completeness of vision and ability to execute. A vendor in the Leaders quadrant has the market share, credibility, and marketing & sales capabilities needed to drive the acceptance of new technologies. These vendors demonstrate a clear understanding of market needs, they are innovators and thought leaders, and they have well-articulated plans that customers and prospects can use when designing their infrastructures and strategies. In addition, they have a presence in the five major geographical regions, consistent financial performance, and broad platform support. === Challengers === Vendors in the "Challengers" quadrant have high scores mainly for their ability to execute. They both participate in the market and execute well enough to be a serious threat to vendors in the "Leaders" quadrant. They have strong products, as well as sufficiently credible market position and resources to sustain continued growth. Financial viability is not an issue for vendors in the "Challengers" quadrant, but they lack the size and influence of vendors in the "Leaders" quadrant due to their relative lack of vision. === Visionaries === Vendors in the "Visionaries" quadrant have high scores mainly for their completeness of vision. They deliver innovative products that address operationally or financially important end-user problems at a broad scale, but have not yet demonstrated the ability to capture market share or maintain sustainable levels of profitability. Visionary vendors are frequently privately held companies and acquisition targets for larger, established companies. The likelihood of acquisition often reduces the risks associated with installing their systems. === Niche Players === Vendors in the "Niche Players" quadrant have relatively low scores for both their ability to execute and their completeness of vision. They are often narrowly focused on specific market or vertical segments. This quadrant often also includes vendors that are adapting their existing products to enter the market under consideration, or larger vendors having difficulty developing and executing on their vision. == Gartner Critical Capabilities == Gartner Critical Capabilities complement Magic Quadrant analysis to offer deeper insight into the products and services offered by multiple vendors by a comparative analysis that scores competing products or services against a set of critical differentiators identified by Gartner. Gartner has periodically ended Magic Quadrant listings for IT Service Management, Web Content Management, and other industries as those markets have fully matured or other factors rendered the analytic framework inapplicable. == Criticism == The Magic Quadrant, and analysts in general, skew the market: according to research, by applying their methodologies to describe a market, they change that marketplace to fit their tools. Another criticism is that open source vendors are not considered sufficiently by analysts like Gartner, as has been published in an online discussion between a VP from Talend and a German Research VP from Gartner. On May 29, 2009 (2009-05-29), software vendor ZL Technologies filed a federal lawsuit against Gartner that challenged the "legitimacy" of Gartner's Magic Quadrant rating system. Gartner filed a motion to dismiss by claiming First Amendment protection since it contends that its MQ reports contain "pure opinion", which legally means opinions that are not based on fact. The court threw out the ZL case because it lacked a specific complaint. The decision was upheld on appeal.

    Read more →
  • Lexical choice

    Lexical choice

    Lexical choice is the subtask of Natural language generation that involves choosing the content words (nouns, non-auxiliary verbs, adjectives, and adverbs) in a generated text. Function words (determiners, for example) are usually chosen during realisation. == Examples == The simplest type of lexical choice involves mapping a domain concept (perhaps represented in an ontology) to a word. For example, the concept Finger might be mapped to the word finger. A more complex situation is when a domain concept is expressed using different words in different situations. For example, the domain concept Value-Change can be expressed in many ways: The temperature rose: the verb rose is used for a Value-Change in temperature which increases the value. The temperature fell: the verb fell is used for a Value-Change in temperature which decreases the value. The rain got heavier: the phrase got heavier is used for a Value-Change in precipitation amount when the precipitation is rain. Sometimes words can communicate additional contextual information, for example: The temperature plummeted: the verb plummeted is used for a Value-Change in temperature which decreases the value, when the change is rapid and large. Contextual information is especially significant for vague terms such as tall. For example, a 2m tall man is tall, but a 2m tall horse is small. == Linguistic perspective == Lexical choice modules must be informed by linguistic knowledge of how the system's input data maps onto words. This is a question of semantics, but it is also influenced by syntactic factors (such as collocation effects) and pragmatic factors (such as context). Hence NLG systems need linguistic models of how meaning is mapped to words in the target domain (genre) of the NLG system. Genre tends to be very important; for example the verb veer has a very specific meaning in weather forecasts (wind direction is changing in a clockwise direction) which it does not have in general English, and a weather-forecast generator must be aware of this genre-specific meaning. In some cases there are major differences in how different people use the same word; for example, some people use by evening to mean 6PM and others use it to mean midnight. Psycholinguists have shown that when people speak to each other, they agree on a common interpretation via lexical alignment; this is not something which NLG systems can yet do. Ultimately, lexical choice must deal with the fundamental issue of how language relates to the non-linguistic world. For example, a system which chose colour terms such as red to describe objects in a digital image would need to know which RGB pixel values could generally be described as red; how this was influenced by visual (lighting, other objects in the scene) and linguistic (other objects being discussed) context; what pragmatic connotations were associated with red (for example, when an apple is called red, it is assumed to be ripe as well as have the colour red); and so forth. == Algorithms and models == A number of algorithms and models have been developed for lexical choice in the research community, for example Edmonds developed a model for choosing between near-synonyms (words with similar core meanings but different connotations). However such algorithms and models have not been widely used in applied NLG systems; such systems have instead often used quite simple computational models, and invested development effort in linguistic analysis instead of algorithm development.

    Read more →
  • Relational data stream management system

    Relational data stream management system

    A relational data stream management system (RDSMS) is a distributed, in-memory data stream management system (DSMS) that is designed to use standards-compliant SQL queries to process unstructured and structured data streams in real-time. Unlike SQL queries executed in a traditional RDBMS, which return a result and exit, SQL queries executed in a RDSMS do not exit, generating results continuously as new data become available. Continuous SQL queries in a RDSMS use the SQL Window function to analyze, join and aggregate data streams over fixed or sliding windows. Windows can be specified as time-based or row-based. == RDSMS SQL Query Examples == Continuous SQL queries in a RDSMS conform to the ANSI SQL standards. The most common RDSMS SQL query is performed with the declarative SELECT statement. A continuous SQL SELECT operates on data across one or more data streams, with optional keywords and clauses that include FROM with an optional JOIN subclause to specify the rules for joining multiple data streams, the WHERE clause and comparison predicate to restrict the records returned by the query, GROUP BY to project streams with common values into a smaller set, HAVING to filter records resulting from a GROUP BY, and ORDER BY to sort the results. The following is an example of a continuous data stream aggregation using a SELECT query that aggregates a sensor stream from a weather monitoring station. The SELECTquery aggregates the minimum, maximum and average temperature values over a one-second time period, returning a continuous stream of aggregated results at one second intervals. RDSMS SQL queries also operate on data streams over time or row-based windows. The following example shows a second continuous SQL query using the WINDOW clause with a one-second duration. The WINDOW clause changes the behavior of the query, to output a result for each new record as it arrives. Hence the output is a stream of incrementally updated results with zero result latency.

    Read more →
  • Umbrella review

    Umbrella review

    In medical research, an umbrella review is a review of systematic reviews or meta-analyses. They may also be called overviews of reviews, reviews of reviews, summaries of systematic reviews, or syntheses of reviews. Umbrella reviews are among the highest levels of evidence currently available in medicine. By summarizing information from multiple overview articles, umbrella reviews make it easier to review the evidence and allow for comparison of results between each of the individual reviews. Umbrella reviews may address a broader question than a typical review, such as discussing multiple different treatment comparisons instead of only one. They are especially useful for developing guidelines and clinical practice, and when comparing competing interventions.

    Read more →
  • Social information architecture

    Social information architecture

    Social information architecture, also known as social iA, is a sub-domain of information architecture which deals with the social aspects of conceptualizing, modeling and organizing information. It has become more relevant because of the rise of social media and Web 2.0 in recent times. == Approach == There are different approaches to the explanation of social information architecture. === Architecture model (internal space) === Architects designing a physical community space, have to consider how the architecture will shape social interactions. A long hallway of offices creates an utterly different dynamic than desks with arranged in an open space. One might foster individuality, privacy, propriety; the other: collaboration, distraction, communalism. Still, physical spaces can be flexibly repurposed and worked around if the inhabitants desire a social dynamic not instantly afforded by the space. Office doors can be left open to invite easier interaction. Partitions can be raised between adjacent desks to limit distraction and increase privacy. That's physical architecture. The information architectures of online communities are far more deterministic and far less flexible. They literally define the social architecture by pre-specifying in immutable computer code what information you have access to, who you can talk to, where you can go. In the online world, information architecture = social architecture. === Social dialogue and information model (external space) === All major brands use information architecture to market their products online, it is then commonly wrapped under the umbrella phrase 'digital strategy'. Information architecture used for strategic purposes encompasses brand SEO, strategic placement of virals, social media presence etc. Charities, news outlets and social dialogue forums can make a much more specific use of the same tools for positive and important social purposes. Social Information Architecture is perceived as the socially conscious wing of commercial information architecture and function to exchange information and ideas between people and groups. Social iA can pick up on conflicting issues that are treated with misunderstanding between cultures and leaves individuals and societies vulnerable to exploitation and manipulation. Since the net has such a far reach it is obvious to use it for meaningful and coordinated social dialogue. Example of such issues are faith, environment, politics, climate change, war, injustice and other social challenges. Information architecture can help create frameworks in which sharing information brings people together, inspires and encourages them to participate in a forward thinking and unfragmented way. One of its core activities is to spread messages that bring people from opposite sites of social and cultural spectrums together and to confront uncomfortable subject head on. == How does social information architecture work? == Social iA utilizes a variety of Web2.0 applications to filter relevant or valuable information and weave them in appropriate information repository or provide feedback to interesting channels. Social iA makes strategic use of Search Engines, Social Media, Google Algorithms, as well as websites, video & news channels. It ‘reads’ or 'listens' to social conversations and search engine queries and engages with the net actively to gather clues about the world's pulse on the internet. It assesses data, social & political trends, and respond with targeted campaigns to give people ideas, as well as help people with making sense of information. == Principals == Dan Brown in his paper 8 Principals of Social Information Architecture enlists the following principals: 1. The principle of objects: Treat content as a living, breathing thing, with a lifecycle, behaviors and attributes. 2. The principle of choices: Create pages that offer meaningful choices to users, keeping the range of choices available focused on a particular task. 3. The principle of disclosure: Show only enough information to help people understand what kinds of information they'll find as they dig deeper. 4. The principle of exemplars: Describe the contents of categories by showing examples of the contents. 5. The principle of front doors: Assume at least half of the website's visitors will come through some page other than the home page. 6. The principle of multiple classification: Offer users several different classification schemes to browse the site's content. 7. The principle of focused navigation: Don't mix apples and oranges in your navigation scheme. 8. The principle of growth: Assume the content you have today is a small fraction of the content you will have tomorrow. == What can social information architecture achieve? == Social information architecture has many potentials in terms of fostering social connections and how information is shared in social spaces on the web.

    Read more →
  • Physical access

    Physical access

    Physical access is a term in computer security that refers to the ability of people to physically gain access to a computer system. According to Gregory White, "Given physical access to an office, the knowledgeable attacker will quickly be able to find the information needed to gain access to the organization's computer systems and network." == Attacks and countermeasures == === Attacks === Physical access opens up a variety of avenues for hacking. Michael Meyers notes that "the best network software security measures can be rendered useless if you fail to physically protect your systems," since an intruder could simply walk off with a server and crack the password at his leisure. Physical access also allows hardware keyloggers to be installed. An intruder may be able to boot from a CD or other external media and then read unencrypted data on the hard drive. They may also exploit a lack of access control in the boot loader; for instance, pressing F8 while certain versions of Microsoft Windows are booting, specifying 'init=/bin/sh' as a boot parameter to Linux (usually done by editing the command line in GRUB), etc. One could also use a rogue device to access a poorly secured wireless network; if the signal were sufficiently strong, one might not even need to breach the perimeter. === Countermeasures === IT security standards in the United States typically call for physical access to be limited by locked server rooms, sign-in sheets, etc. Physical access systems and IT security systems have historically been administered by separate departments of organizations, but are increasingly being seen as having interdependent functions needing a single, converged security policy. An IT department could, for instance, check security log entries for suspicious logons occurring after business hours, and then use keycard swipe records from a building access control system to narrow down the list of suspects to those who were in the building at that time. Surveillance cameras might also be used to deter or detect unauthorized access.

    Read more →
  • Block swap algorithms

    Block swap algorithms

    In computer algorithms, block swap algorithms swap two regions of elements of an array. It is simple to swap two non-overlapping regions of an array of equal size. However, it is not as simple to swap two contiguous regions of an array of unequal sizes (algorithms that perform such swapping are called rotation algorithms). A few well-known algorithms can accomplish this: Bentley's juggling (also known as the dolphin algorithm), Gries-Mills rotation, triple reversal algorithm, conjoined triple reversal algorithm (also known as the trinity rotation) and Successive rotation. == Triple reversal algorithm == The triple reversal algorithm is the simplest to explain, using rotations. A rotation is an in-place reversal of array elements. This method swaps two elements of an array from outside in within a range. The rotation works for an even or odd number of array elements. The reversal algorithm uses three in-place rotations to accomplish an in-place block swap: Rotate region A Rotate region B Rotate region AB Where A and B are adjacent regions of an array that together form the region AB. Gries-Mills and reversal algorithms perform better than Bentley's juggling, because of their cache-friendly memory access pattern behavior. The triple reversal algorithm parallelizes well, because rotations can be split into sub-regions, which can be rotated independently of others.

    Read more →
  • Storage area network

    Storage area network

    A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN). Although a SAN provides only block-level access, file systems built on top of SANs do provide file-level access and are known as shared-disk file systems. Newer SAN configurations enable hybrid SAN and allow traditional block storage that appears as local storage but also object storage for web services through APIs. == Storage architectures == Storage area networks (SANs) are sometimes referred to as network behind the servers and historically developed out of a centralized data storage model, but with its own data network. A SAN is, at its simplest, a dedicated network for data storage. In addition to storing data, SANs allow for the automatic backup of data, and the monitoring of the storage as well as the backup process. A SAN is a combination of hardware and software. It grew out of data-centric mainframe architectures, where clients in a network can connect to several servers that store different types of data. To scale storage capacities as the volumes of data grew, direct-attached storage (DAS) was developed, where disk arrays or just a bunch of disks (JBODs) were attached to servers. In this architecture, storage devices can be added to increase storage capacity. However, the server through which the storage devices are accessed is a single point of failure, and a large part of the LAN network bandwidth is used for accessing, storing and backing up data. To solve the single point of failure issue, a direct-attached shared storage architecture was implemented, where several servers could access the same storage device. DAS was the first network storage system and is still widely used where data storage requirements are not very high. Out of it developed the network-attached storage (NAS) architecture, where one or more dedicated file server or storage devices are made available in a LAN. Therefore, the transfer of data, particularly for backup, still takes place over the existing LAN. If more than a terabyte of data was stored at any one time, LAN bandwidth became a bottleneck. Therefore, SANs were developed, where a dedicated storage network was attached to the LAN, and terabytes of data are transferred over a dedicated high speed and bandwidth network. Within the SAN, storage devices are interconnected. Transfer of data between storage devices, such as for backup, happens behind the servers and is meant to be transparent. In a NAS architecture data is transferred using the TCP and IP protocols over Ethernet. Distinct protocols were developed for SANs, such as Fibre Channel, iSCSI, Infiniband. Therefore, SANs often have their own network and storage devices, which have to be bought, installed, and configured. This makes SANs inherently more expensive than NAS architectures. == Components == SANs have their own networking devices, such as SAN switches. To access the SAN, so-called SAN servers are used, which in turn connect to SAN host adapters. Within the SAN, a range of data storage devices may be interconnected, such as SAN-capable disk arrays, JBODs and tape libraries. === Host layer === Servers that allow access to the SAN and its storage devices are said to form the host layer of the SAN. Such servers have host adapters, which are cards that attach to slots on the server motherboard (usually PCI slots) and run with a corresponding firmware and device driver. Through the host adapters the operating system of the server can communicate with the storage devices in the SAN. In Fibre channel deployments, a cable connects to the host adapter through the gigabit interface converter (GBIC). GBICs are also used on switches and storage devices within the SAN, and they convert digital bits into light impulses that can then be transmitted over the Fibre Channel cables. Conversely, the GBIC converts incoming light impulses back into digital bits. The predecessor of the GBIC was called gigabit link module (GLM). === Fabric layer === The fabric layer consists of SAN networking devices that include SAN switches, routers, protocol bridges, gateway devices, and cables. SAN network devices move data within the SAN, or between an initiator, such as an HBA port of a server, and a target, such as the port of a storage device. When SANs were first built, hubs were the only devices that were Fibre Channel capable, but Fibre Channel switches were developed and hubs are now rarely found in SANs. Switches have the advantage over hubs that they allow all attached devices to communicate simultaneously, as a switch provides a dedicated link to connect all its ports with one another. When SANs were first built, Fibre Channel had to be implemented over copper cables, these days multimode optical fibre cables are used in SANs. SANs are usually built with redundancy, so SAN switches are connected with redundant links. SAN switches connect the servers with the storage devices and are typically non-blocking allowing transmission of data across all attached wires at the same time. SAN switches are for redundancy purposes set up in a meshed topology. A single SAN switch can have as few as 8 ports and up to 32 ports with modular extensions. So-called director-class switches can have as many as 128 ports. In switched SANs, the Fibre Channel switched fabric protocol FC-SW-6 is used under which every device in the SAN has a hardcoded World Wide Name (WWN) address in the host bus adapter (HBA). If a device is connected to the SAN its WWN is registered in the SAN switch name server. In place of a WWN, or worldwide port name (WWPN), SAN Fibre Channel storage device vendors may also hardcode a worldwide node name (WWNN). The ports of storage devices often have a WWN starting with 5, while the bus adapters of servers start with 10 or 21. === Storage layer === The serialized Small Computer Systems Interface (SCSI) protocol is often used on top of the Fibre Channel switched fabric protocol in servers and SAN storage devices. The Internet Small Computer Systems Interface (iSCSI) over Ethernet and the Infiniband protocols may also be found implemented in SANs, but are often bridged into the Fibre Channel SAN. However, Infiniband and iSCSI storage devices, in particular, disk arrays, are available. The various storage devices in a SAN are said to form the storage layer. It can include a variety of hard disk and magnetic tape devices that store data. In SANs, disk arrays are joined through a RAID which makes a lot of hard disks look and perform like one big storage device. Every storage device, or even partition on that storage device, has a logical unit number (LUN) assigned to it. This is a unique number within the SAN. Every node in the SAN, be it a server or another storage device, can access the storage by referencing the LUN. The LUNs allow for the storage capacity of a SAN to be segmented and for the implementation of access controls. A particular server, or a group of servers, may, for example, be only given access to a particular part of the SAN storage layer, in the form of LUNs. When a storage device receives a request to read or write data, it will check its access list to establish whether the node, identified by its LUN, is allowed to access the storage area, also identified by a LUN. LUN masking is a technique whereby the host bus adapter and the SAN software of a server restrict the LUNs for which commands are accepted. In doing so LUNs that should never be accessed by the server are masked. Another method to restrict server access to particular SAN storage devices is fabric-based access control, or zoning, which is enforced by the SAN networking devices and servers. Under zoning, server access is restricted to storage devices that are in a particular SAN zone. == Network protocols == A mapping layer to other protocols is used to form a network: ATA over Ethernet (AoE), mapping of AT Attachment (ATA) over Ethernet Fibre Channel Protocol (FCP), a mapping of SCSI over Fibre Channel Fibre Channel over Ethernet (FCoE) ESCON over Fibre Channel (FICON), used by mainframe computers HyperSCSI, mapping of SCSI over Ethernet iFCP or SANoIP mapping of FCP over IP iSCSI, mapping of SCSI over TCP/IP iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand Network block device, mapping device node requests on UNIX-like systems over stream sockets like TCP/IP SCSI RDMA Protocol (SRP), another SCSI implementation for remote direct memory access (RDMA) transports Storage networks may also be built using Serial Attached SCSI (SAS) and Serial ATA (SATA) technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from Para

    Read more →